icl-utk-edu / slate

SLATE is a distributed, GPU-accelerated, dense linear algebra library targeting current and upcoming high-performance computing (HPC) systems. It is developed as part of the U.S. Department of Energy Exascale Computing Project (ECP).
https://icl.utk.edu/slate/
BSD 3-Clause "New" or "Revised" License

LU info #111

Closed mgates3 closed 1 year ago

mgates3 commented 1 year ago

[Depends on #115]

Adds info error handling to LU and Aasen symmetric indefinite factorization and solves. Abbreviated output [outdated]:

slate/test> mpirun -np 4 ./tester --matrix rand,identity,one,zero,rand_zerocol0,rand_zerocol1.0,rand_zerocol1,rand_zerocol13,rand_zerocol0.5 --dim 137 --nb 32 --ib 8 getrf
% SLATE version 2023.08.25, id 61d7ba6c
% input: ./tester --matrix rand,identity,one,zero,rand_zerocol0,rand_zerocol1.0,rand_zerocol1,rand_zerocol13,rand_zerocol0.5 --dim 137 --nb 32 --ib 8 getrf
% 2023-09-16 22:00:32, 4 MPI ranks, CPU-only MPI, 4 OpenMP threads per MPI rank

A    m    n  nb  ib  p  q  la  pt     error  status  
1  137  137  32   8  2  2   1   2  5.10e-19  pass    
2  137  137  32   8  2  2   1   2  0.00e+00  pass    
3  137  137  32   8  2  2   1   2  7.25e-03  FAILED  info = 2  
4  137  137  32   8  2  2   1   2       inf  FAILED  info = 1  
5  137  137  32   8  2  2   1   2  6.14e-03  FAILED  info = 1  
6  137  137  32   8  2  2   1   2  6.16e-03  FAILED  info = 137  
7  137  137  32   8  2  2   1   2  6.24e-03  FAILED  info = 2  
8  137  137  32   8  2  2   1   2  6.48e-03  FAILED  info = 14  
9  137  137  32   8  2  2   1   2  6.38e-03  FAILED  info = 69  

% Matrix kinds:
%  1: rand, cond unknown
%  2: identity, cond = 1
%  3: one, cond = inf
%  4: zero, cond = inf
%  5: rand_zerocol0, cond = inf
%  6: rand_zerocol1.0, cond = inf
%  7: rand_zerocol1, cond = inf
%  8: rand_zerocol13, cond = inf
%  9: rand_zerocol0.5, cond = inf

% 7 tests FAILED: getrf

One current inconsistency: zerocolN takes a 0-based column index N in [ 0, n-1 ], while the returned info is a 1-based index in [ 1, n ], because info = 0 is generally considered to mean "no error". For instance, in the output above, rand_zerocol13 returns info = 14.

I guess if info != 0 for these singular matrices, the test should be marked as "pass", since the routine is correctly catching the singularity. Also, the tester should inspect U and verify that U( i, i ) == 0, where i is the returned info. [Updated]
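To make the index convention and the proposed tester check concrete, here is a minimal sketch in plain C++. It is not SLATE code: `lu_info`, `zero_pivot_confirmed`, and `matrix_with_zero_column` are hypothetical helpers, and the sketch assumes a small row-major matrix factored by unblocked LU with partial pivoting, with info following the LAPACK convention of reporting the first exactly-zero pivot as a 1-based index.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not SLATE's API: in-place LU with partial pivoting on a
// row-major n-by-n matrix. Returns LAPACK-style info: 0 if every pivot is
// nonzero, otherwise the 1-based index of the first exactly-zero pivot U(i, i).
int lu_info(std::vector<double>& a, int n)
{
    assert(a.size() == static_cast<std::size_t>(n) * n);
    int info = 0;
    for (int k = 0; k < n; ++k) {
        // Find the largest-magnitude entry in column k, on or below the diagonal.
        int piv = k;
        for (int i = k + 1; i < n; ++i)
            if (std::abs(a[i*n + k]) > std::abs(a[piv*n + k]))
                piv = i;
        if (a[piv*n + k] == 0.0) {
            if (info == 0)
                info = k + 1;   // 0-based column k is reported as 1-based k + 1
            continue;           // nothing to eliminate in this column
        }
        if (piv != k)
            for (int j = 0; j < n; ++j)
                std::swap(a[k*n + j], a[piv*n + j]);
        for (int i = k + 1; i < n; ++i) {
            a[i*n + k] /= a[k*n + k];
            for (int j = k + 1; j < n; ++j)
                a[i*n + j] -= a[i*n + k] * a[k*n + j];
        }
    }
    return info;
}

// Hypothetical tester check: a nonzero info counts as a pass only if the
// factored matrix really has U(info-1, info-1) == 0 (0-based storage).
bool zero_pivot_confirmed(const std::vector<double>& lu, int n, int info)
{
    return info >= 1 && info <= n && lu[(info - 1)*n + (info - 1)] == 0.0;
}

// Diagonally dominant test matrix whose 0-based column `zerocol` is zeroed out.
std::vector<double> matrix_with_zero_column(int n, int zerocol)
{
    std::vector<double> a(static_cast<std::size_t>(n) * n);
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            a[i*n + j] = (j == zerocol) ? 0.0 : (i == j ? n + 1.0 : 1.0);
    return a;
}
```

In this sketch a matrix with 0-based column N zeroed yields info = N + 1, an all-ones matrix yields info = 2, and a zero matrix yields info = 1, which is the same pattern as the rand_zerocol13, one, and zero rows in the output above.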

mgates3 commented 1 year ago

One bug that was discovered and fixed: after pivot_threshold was added as an argument to internal::getrf_panel, gbtrf and hetrf were still calling it with the old argument list, so every value shifted by one position: pivot_threshold = max_panel_threads, max_panel_threads = priority_1, and priority = 0 (its default value). The default values were removed to avoid this. Code change in gbtrf:

                 internal::getrf_panel<Target::HostTask>(
-                    A.sub(k, i_end-1, k, k), diag_len, ib, pivots.at(k),
-                    max_panel_threads, priority_1 );
+                    A.sub(k, i_end-1, k, k), diag_len, ib, pivots.at(k),
+                    pivot_threshold, max_panel_threads, priority_1, tag_0, &info );

(I moved pivots.at(k) up a line for clarity here.)
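The argument shift can be reproduced in isolation. The following is a minimal sketch, not SLATE's actual signature: `PanelArgs`, `panel_with_defaults`, `panel_no_defaults`, and `stale_call` are hypothetical names, and the parameter types (a `double` threshold followed by two `int`s) are assumptions chosen so that the implicit conversion compiles silently.

```cpp
// Hypothetical stand-in for a panel routine; names and types are illustrative,
// not SLATE's actual signature.
struct PanelArgs {
    double pivot_threshold;
    int    max_panel_threads;
    int    priority;
};

// Signature after a new leading parameter (pivot_threshold) was added while
// the trailing parameters kept their default values.
PanelArgs panel_with_defaults(
    double pivot_threshold, int max_panel_threads = 1, int priority = 0 )
{
    return { pivot_threshold, max_panel_threads, priority };
}

// Same signature with the defaults removed: every caller must pass all three.
PanelArgs panel_no_defaults(
    double pivot_threshold, int max_panel_threads, int priority )
{
    return { pivot_threshold, max_panel_threads, priority };
}

// A call site written for the old (max_panel_threads, priority) argument list.
PanelArgs stale_call()
{
    const int max_panel_threads = 4;
    const int priority_1 = 1;
    // Still compiles: 4 converts to pivot_threshold = 4.0, priority_1 lands in
    // max_panel_threads, and priority silently falls back to 0.
    return panel_with_defaults( max_panel_threads, priority_1 );
    // panel_no_defaults( max_panel_threads, priority_1 );  // would not compile
}
```

With the defaults removed, the stale two-argument call becomes a compile error instead of a silent misconfiguration, which is the reasoning behind dropping the default values.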

neil-lindquist commented 1 year ago

I was able to un-stick the deadlock by calling internal::getrf_panel in hetrf with max_panel_threads=1. It looks like the variable shift issue with the pivot threshold made it so that the master branch is setting max_panel_threads=priority_1=1. (See this minor change which results in a successful CI run)

But, I'm not sure why using multiple threads in hetrf's panel causes a deadlock. Presumably, it didn't before threshold pivoting was added and messed up what was being passed in for max_panel_threads.

mgates3 commented 1 year ago

gpu_nvidia error was out of memory, in function stream_create, most likely an error on the CI machine due to a user allocating the whole GPU. Needs rerun. ... Passing now.