icl-utk-edu / lapackpp

LAPACK++ is a C++ wrapper around CPU and GPU LAPACK and LAPACK-like linear algebra libraries, developed as part of the SLATE project.
https://icl.utk.edu/slate/
BSD 3-Clause "New" or "Revised" License

Add Device Implementation of `HEEVD` #42

Closed wavefunction91 closed 1 year ago

wavefunction91 commented 1 year ago

This change was required for tighter integration with NWChemEx.

Adds the following:

Opening early to coordinate if necessary. I have access to OLCF to add rocSOLVER, but I'd have to update my credentials with ALCF to get access to Intel/oneMKL hardware. Also, I do not have access to a pre-11 CUDA installation, so the manual-dispatch path is untested. Unit tests on NVIDIA pass locally and at NERSC:

$ nvidia-smi
Fri Aug 25 16:27:53 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.105.01   Driver Version: 515.105.01   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:81:00.0 Off |                  Off |
| 30%   34C    P8    20W / 230W |     73MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
$ ./tester  --type s,d,c,z --align 32 --dim 100:500:100 --jobz n,v --uplo l,u dev-heevd
LAPACK++ version 2023.06.00, id 082daf3
input: ./tester --type 's,d,c,z' --align 32 --dim '100:500:100' --jobz 'n,v' --uplo 'l,u' dev-heevd

type    uplo   jobz       n  align  device     error    error2   time (s)  ref time (s)  status  
test matrix A: rand, cond(S) = NA
   s   lower  novec     100     32       0        NA  3.11e-07   0.000832       0.00338  pass    
   s   lower  novec     200     32       0        NA  5.21e-07    0.00163       0.00887  pass    
   s   lower  novec     300     32       0        NA  5.03e-07    0.00258        0.0185  pass    
   s   lower  novec     400     32       0        NA  6.54e-07    0.00343        0.0172  pass    
   s   lower  novec     500     32       0        NA  5.67e-07    0.00433        0.0236  pass    
   s   lower    vec     100     32       0  5.33e-09  5.56e-07    0.00135        0.0114  pass    
   s   lower    vec     200     32       0  1.66e-09  4.06e-07    0.00196        0.0825  pass    
   s   lower    vec     300     32       0  2.07e-09  1.15e-06    0.00308         0.182  pass    
   s   lower    vec     400     32       0  1.64e-09  4.22e-07    0.00414         0.364  pass    
   s   lower    vec     500     32       0  1.33e-09  9.87e-07    0.00522         0.668  pass    
   s   upper  novec     100     32       0        NA  3.53e-07   0.000774      0.000851  pass    
   s   upper  novec     200     32       0        NA  6.53e-07    0.00164       0.00292  pass    
   s   upper  novec     300     32       0        NA  5.33e-07    0.00263        0.0265  pass    
   s   upper  novec     400     32       0        NA  5.33e-07    0.00348        0.0160  pass    
   s   upper  novec     500     32       0        NA  4.97e-07    0.00432        0.0255  pass    
   s   upper    vec     100     32       0  3.33e-09  3.11e-07    0.00698        0.0140  pass    
   s   upper    vec     200     32       0  2.07e-09  5.25e-07    0.00199        0.0695  pass    
   s   upper    vec     300     32       0  1.80e-09  1.15e-06    0.00323         0.179  pass    
   s   upper    vec     400     32       0  1.67e-09  4.51e-07    0.00418         0.364  pass    
   s   upper    vec     500     32       0  1.36e-09  5.80e-07    0.00526         0.666  pass    

   d   lower  novec     100     32       0        NA  7.81e-16    0.00378       0.00115  pass    
   d   lower  novec     200     32       0        NA  1.93e-15    0.00637       0.00603  pass    
   d   lower  novec     300     32       0        NA  1.17e-15    0.00941        0.0205  pass    
   d   lower  novec     400     32       0        NA  1.23e-15     0.0118        0.0377  pass    
   d   lower  novec     500     32       0        NA  1.52e-15     0.0162        0.0329  pass    
   d   lower    vec     100     32       0  9.08e-18  1.09e-15     0.0102        0.0209  pass    
   d   lower    vec     200     32       0  3.55e-18  1.70e-15    0.00709         0.106  pass    
   d   lower    vec     300     32       0  3.24e-18  1.09e-15     0.0106         0.219  pass    
   d   lower    vec     400     32       0  3.29e-18  9.95e-16     0.0139         0.420  pass    
   d   lower    vec     500     32       0  9.21e-19  1.04e-15     0.0173         0.771  pass    
   d   upper  novec     100     32       0        NA  8.21e-16    0.00376      0.000935  pass    
   d   upper  novec     200     32       0        NA  8.95e-16    0.00645       0.00356  pass    
   d   upper  novec     300     32       0        NA  1.03e-15    0.00944        0.0270  pass    
   d   upper  novec     400     32       0        NA  1.59e-15     0.0122        0.0217  pass    
   d   upper  novec     500     32       0        NA  9.22e-16     0.0143        0.0307  pass    
   d   upper    vec     100     32       0  2.23e-18  6.53e-16    0.00844        0.0312  pass    
   d   upper    vec     200     32       0  2.18e-18  9.50e-16    0.00719        0.0688  pass    
   d   upper    vec     300     32       0  3.46e-18  1.00e-15     0.0111         0.202  pass    
   d   upper    vec     400     32       0  1.28e-18  1.43e-15     0.0142         0.447  pass    
   d   upper    vec     500     32       0  8.42e-19  1.20e-15     0.0173         0.800  pass    

   c   lower  novec     100     32       0        NA  2.55e-07   0.000909       0.00151  pass    
   c   lower  novec     200     32       0        NA  4.40e-07    0.00182       0.00554  pass    
   c   lower  novec     300     32       0        NA  6.52e-07    0.00707        0.0362  pass    
   c   lower  novec     400     32       0        NA  7.08e-07    0.00380        0.0503  pass    
   c   lower  novec     500     32       0        NA  5.64e-07    0.00478        0.0610  pass    
   c   lower    vec     100     32       0  4.91e-09  3.61e-07    0.00152        0.0237  pass    
   c   lower    vec     200     32       0  2.86e-09  4.47e-07    0.00220         0.123  pass    
   c   lower    vec     300     32       0  1.46e-09  1.26e-06    0.00347         0.293  pass    
   c   lower    vec     400     32       0  1.67e-09  6.48e-07    0.00460         0.621  pass    
   c   lower    vec     500     32       0  1.31e-09  7.96e-07    0.00586         1.159  pass    
   c   upper  novec     100     32       0        NA  3.17e-07   0.000943       0.00125  pass    
   c   upper  novec     200     32       0        NA  4.96e-07    0.00187        0.0242  pass    
   c   upper  novec     300     32       0        NA  7.21e-07    0.00292        0.0158  pass    
   c   upper  novec     400     32       0        NA  5.97e-07    0.00387        0.0801  pass    
   c   upper  novec     500     32       0        NA  6.24e-07    0.00479        0.0693  pass    
   c   upper    vec     100     32       0  1.28e-09  2.53e-07    0.00154        0.0376  pass    
   c   upper    vec     200     32       0  1.10e-09  6.28e-07    0.00221         0.134  pass    
   c   upper    vec     300     32       0  1.64e-09  4.91e-07    0.00355         0.247  pass    
   c   upper    vec     400     32       0  1.05e-09  5.14e-07    0.00466         0.576  pass    
   c   upper    vec     500     32       0  1.30e-09  4.67e-07    0.00587         1.084  pass    

   z   lower  novec     100     32       0        NA  1.07e-15    0.00409       0.00165  pass    
   z   lower  novec     200     32       0        NA  1.23e-15    0.00801        0.0240  pass    
   z   lower  novec     300     32       0        NA  1.29e-15     0.0126        0.0391  pass    
   z   lower  novec     400     32       0        NA  1.36e-15     0.0170        0.0479  pass    
   z   lower  novec     500     32       0        NA  1.47e-15     0.0218        0.0553  pass    
   z   lower    vec     100     32       0  1.04e-17  5.77e-16    0.00560        0.0407  pass    
   z   lower    vec     200     32       0  4.95e-18  9.64e-16     0.0127         0.154  pass    
   z   lower    vec     300     32       0  3.53e-18  1.02e-15     0.0157         0.365  pass    
   z   lower    vec     400     32       0  2.22e-18  1.04e-15     0.0219         0.786  pass    
   z   lower    vec     500     32       0  1.53e-18  1.15e-15     0.0290         1.451  pass    
   z   upper  novec     100     32       0        NA  4.35e-16    0.00406       0.00148  pass    
   z   upper  novec     200     32       0        NA  1.16e-15    0.00805       0.00836  pass    
   z   upper  novec     300     32       0        NA  1.18e-15     0.0128        0.0249  pass    
   z   upper  novec     400     32       0        NA  8.89e-16     0.0172        0.0389  pass    
   z   upper  novec     500     32       0        NA  1.28e-15     0.0219        0.0540  pass    
   z   upper    vec     100     32       0  3.08e-18  7.00e-16    0.00569        0.0353  pass    
   z   upper    vec     200     32       0  5.35e-18  1.00e-15    0.00974         0.161  pass    
   z   upper    vec     300     32       0  2.13e-18  1.90e-15     0.0161         0.389  pass    
   z   upper    vec     400     32       0  8.77e-19  8.53e-16     0.0224         0.778  pass    
   z   upper    vec     500     32       0  2.03e-18  1.18e-15     0.0288         1.445  pass    
All tests passed for dev-heevd.
mgates3 commented 1 year ago

Did you add src/cuda/cuda_heevd.cc and test/test_heevd_device.cc?

wavefunction91 commented 1 year ago

Whoops, that would have been helpful. Fixed.


mgates3 commented 1 year ago

@wavefunction91 Do you want us to add the ROCm and oneMKL implementations? That should be straightforward for us. In that case: @dsukkari Can you add ROCm? @ayarkhan Can you add SYCL? I'm not sure of the best route to collaborate on a branch. Would it be easiest to pull the branch into the lapackpp repo and add commits there, with a new PR?

wavefunction91 commented 1 year ago

@mgates3 It would be better if you could add them. This branch should allow edits by maintainers, i.e., you, @dsukkari, and @ayarkhan should all be able to push to it.

ayarkhan commented 1 year ago

For oneMKL/SYCL, the CMake build was tested and works. On Sunspot, the quick tester run gives:

./test/tester dev-heevd
LAPACK++ version 2023.06.00, id 260e406
input: ./test/tester dev-heevd

type    uplo   jobz       n  device     error    error2   time (s)  ref time (s)  status
test matrix A: rand, cond(S) = NA
   d   lower  novec     100       0        NA  0.00e+00      0.672         0.117  pass
   d   lower  novec     200       0        NA  0.00e+00     0.0100       0.00377  pass
   d   lower  novec     300       0        NA  0.00e+00    0.00889       0.00999  pass
   d   lower  novec     400       0        NA  0.00e+00     0.0222        0.0196  pass
   d   lower  novec     500       0        NA  0.00e+00     0.0376        0.0348  pass
All tests passed for dev-heevd.

More detailed testing:

$ ./tester  --type s,d,c,z --align 32 --dim 100:500:100 --jobz n,v --uplo l,u dev-heevd
LAPACK++ version 2023.06.00, id 260e406
input: ./tester --type 's,d,c,z' --align 32 --dim '100:500:100' --jobz 'n,v' --uplo 'l,u' dev-heevd

type    uplo   jobz       n  align  device     error    error2   time (s)  ref time (s)  status
test matrix A: rand, cond(S) = NA
   s   lower  novec     100     32       0        NA  0.00e+00      0.487         0.121  pass
   s   lower  novec     200     32       0        NA  0.00e+00     0.0307        0.0170  pass
   s   lower  novec     300     32       0        NA  0.00e+00     0.0371        0.0285  pass
   s   lower  novec     400     32       0        NA  0.00e+00     0.0297        0.0285  pass
   s   lower  novec     500     32       0        NA  0.00e+00     0.0470        0.0295  pass
   s   lower    vec     100     32       0  4.66e-09  7.72e-07      0.244         0.147  pass
   s   lower    vec     200     32       0  4.09e-09  9.99e-07     0.0928        0.0219  pass
   s   lower    vec     300     32       0  9.41e-10  7.01e-07    0.00948        0.0264  pass
   s   lower    vec     400     32       0  1.19e-09  7.10e-07     0.0163        0.0444  pass
   s   lower    vec     500     32       0  6.51e-10  2.39e-06     0.0256         0.115  pass
   s   upper  novec     100     32       0        NA  0.00e+00   0.000984      0.000439  pass
   s   upper  novec     200     32       0        NA  0.00e+00    0.00278       0.00268  pass
   s   upper  novec     300     32       0        NA  0.00e+00    0.00722       0.00558  pass
   s   upper  novec     400     32       0        NA  0.00e+00     0.0162        0.0162  pass
   s   upper  novec     500     32       0        NA  0.00e+00     0.0691        0.0104  pass
   s   upper    vec     100     32       0  2.88e-09  3.42e-07    0.00465        0.0600  pass
   s   upper    vec     200     32       0  3.50e-09  5.41e-07    0.00523       0.00967  pass
   s   upper    vec     300     32       0  8.87e-10  4.82e-07    0.00613        0.0181  pass
   s   upper    vec     400     32       0  6.99e-10  6.10e-07     0.0102        0.0327  pass
   s   upper    vec     500     32       0  1.40e-09  5.20e-07     0.0192        0.0512  pass

   d   lower  novec     100     32       0        NA  0.00e+00      0.288        0.0813  pass
   d   lower  novec     200     32       0        NA  0.00e+00    0.00558       0.00323  pass
   d   lower  novec     300     32       0        NA  0.00e+00    0.00779       0.00613  pass
   d   lower  novec     400     32       0        NA  0.00e+00     0.0150        0.0102  pass
   d   lower  novec     500     32       0        NA  0.00e+00     0.0548        0.0180  pass
   d   lower    vec     100     32       0  7.38e-18  5.72e-16      0.204         0.172  pass
   d   lower    vec     200     32       0  2.36e-18  1.14e-15    0.00926        0.0225  pass
   d   lower    vec     300     32       0  3.36e-18  2.95e-15     0.0102        0.0281  pass
   d   lower    vec     400     32       0  1.26e-18  9.20e-16     0.0220        0.0433  pass
   d   lower    vec     500     32       0  1.48e-18  1.17e-15     0.0317        0.0566  pass
   d   upper  novec     100     32       0        NA  0.00e+00    0.00151      0.000966  pass
   d   upper  novec     200     32       0        NA  0.00e+00    0.00332       0.00277  pass
   d   upper  novec     300     32       0        NA  0.00e+00     0.0119        0.0109  pass
   d   upper  novec     400     32       0        NA  0.00e+00    0.00899        0.0175  pass
   d   upper  novec     500     32       0        NA  0.00e+00     0.0163        0.0138  pass
   d   upper    vec     100     32       0  7.28e-18  6.82e-16    0.00216        0.0314  pass
   d   upper    vec     200     32       0  4.84e-18  9.96e-16    0.00549       0.00961  pass
   d   upper    vec     300     32       0  1.89e-18  9.14e-16     0.0124        0.0187  pass
   d   upper    vec     400     32       0  1.56e-18  2.34e-15     0.0173        0.0342  pass
   d   upper    vec     500     32       0  1.01e-18  1.02e-15     0.0242        0.0494  pass

   c   lower  novec     100     32       0        NA  0.00e+00      0.142        0.0369  pass
   c   lower  novec     200     32       0        NA  0.00e+00    0.00794       0.00317  pass
   c   lower  novec     300     32       0        NA  0.00e+00    0.00648       0.00621  pass
   c   lower  novec     400     32       0        NA  0.00e+00     0.0761        0.0190  pass
   c   lower  novec     500     32       0        NA  0.00e+00     0.0340        0.0333  pass
   c   lower    vec     100     32       0  2.37e-09  3.03e-07      0.208        0.0141  pass
   c   lower    vec     200     32       0  2.84e-09  3.63e-07     0.0136        0.0200  pass
   c   lower    vec     300     32       0  8.78e-10  5.51e-07     0.0245        0.0359  pass
   c   lower    vec     400     32       0  2.38e-09  6.68e-07     0.0385        0.0562  pass
   c   lower    vec     500     32       0  1.30e-09  1.40e-06     0.0249        0.0831  pass
   c   upper  novec     100     32       0        NA  0.00e+00    0.00182       0.00113  pass
   c   upper  novec     200     32       0        NA  0.00e+00    0.00348       0.00281  pass
   c   upper  novec     300     32       0        NA  0.00e+00    0.00703       0.00614  pass
   c   upper  novec     400     32       0        NA  0.00e+00     0.0202        0.0182  pass
   c   upper  novec     500     32       0        NA  0.00e+00     0.0552        0.0302  pass
   c   upper    vec     100     32       0  5.38e-09  6.36e-07     0.0214       0.00371  pass
   c   upper    vec     200     32       0  1.72e-09  1.21e-06    0.00477        0.0108  pass
   c   upper    vec     300     32       0  1.26e-09  4.94e-07    0.00985        0.0232  pass
   c   upper    vec     400     32       0  5.43e-10  5.24e-07     0.0244        0.0442  pass
   c   upper    vec     500     32       0  5.74e-10  5.44e-07     0.0486        0.0515  pass

   z   lower  novec     100     32       0        NA  0.00e+00     0.0561       0.00741  pass
   z   lower  novec     200     32       0        NA  0.00e+00     0.0385       0.00951  pass
   z   lower  novec     300     32       0        NA  0.00e+00     0.0226        0.0223  pass
   z   lower  novec     400     32       0        NA  0.00e+00     0.0401        0.0367  pass
   z   lower  novec     500     32       0        NA  0.00e+00     0.0584        0.0550  pass
   z   lower    vec     100     32       0  8.49e-18  7.05e-16     0.0310        0.0263  pass
   z   lower    vec     200     32       0  2.45e-18  7.16e-16     0.0195        0.0181  pass
   z   lower    vec     300     32       0  1.82e-18  9.30e-16     0.0296        0.0397  pass
   z   lower    vec     400     32       0  2.06e-18  1.02e-15     0.0486        0.0640  pass
   z   lower    vec     500     32       0  1.11e-18  1.26e-15     0.0613        0.0834  pass
   z   upper  novec     100     32       0        NA  0.00e+00    0.00191       0.00130  pass
   z   upper  novec     200     32       0        NA  0.00e+00    0.00597       0.00463  pass
   z   upper  novec     300     32       0        NA  0.00e+00     0.0107        0.0112  pass
   z   upper  novec     400     32       0        NA  0.00e+00     0.0200        0.0176  pass
   z   upper  novec     500     32       0        NA  0.00e+00     0.0286        0.0258  pass
   z   upper    vec     100     32       0  6.07e-18  7.47e-16    0.00551       0.00865  pass
   z   upper    vec     200     32       0  4.04e-18  1.10e-15     0.0130        0.0193  pass
   z   upper    vec     300     32       0  2.55e-18  2.29e-15     0.0293        0.0410  pass
   z   upper    vec     400     32       0  1.29e-18  9.66e-16     0.0522        0.0395  pass
   z   upper    vec     500     32       0  1.22e-18  1.14e-15     0.0391        0.0656  pass
All tests passed for dev-heevd.