wavefunction91 closed 1 year ago
Did you add src/cuda/cuda_heevd.cc and test/test_heevd_device.cc?
Whoops, that would have been helpful. Fixed.
On Fri, Aug 25, 2023 at 6:34 PM Mark Gates wrote:
Did you add src/cuda/cuda_heevd.cc and test/test_heevd_device.cc?
@wavefunction91 Do you want us to add the ROCm and oneMKL implementations? That should be straightforward for us. In that case: @dsukkari, can you add ROCm? @ayarkhan, can you add SYCL?
I'm not sure of the best route to collaborate on a branch. Would it be easiest to pull the branch into the lapackpp repo and add commits there, with a new PR?
@mgates3 It would be better if you could add them. This branch should allow edits by maintainers, i.e. you, @dsukkari, and @ayarkhan should all be able to push to it.
For oneMKL/SYCL, the CMake build was tested and works. Output of the quick tester on Sunspot:
./test/tester dev-heevd
LAPACK++ version 2023.06.00, id 260e406
input: ./test/tester dev-heevd
type uplo jobz n device error error2 time (s) ref time (s) status
test matrix A: rand, cond(S) = NA
d lower novec 100 0 NA 0.00e+00 0.672 0.117 pass
d lower novec 200 0 NA 0.00e+00 0.0100 0.00377 pass
d lower novec 300 0 NA 0.00e+00 0.00889 0.00999 pass
d lower novec 400 0 NA 0.00e+00 0.0222 0.0196 pass
d lower novec 500 0 NA 0.00e+00 0.0376 0.0348 pass
All tests passed for dev-heevd.
More detailed testing:
$ ./tester --type s,d,c,z --align 32 --dim 100:500:100 --jobz n,v --uplo l,u dev-heevd
LAPACK++ version 2023.06.00, id 260e406
input: ./tester --type 's,d,c,z' --align 32 --dim '100:500:100' --jobz 'n,v' --uplo 'l,u' dev-heevd
type uplo jobz n align device error error2 time (s) ref time (s) status
test matrix A: rand, cond(S) = NA
s lower novec 100 32 0 NA 0.00e+00 0.487 0.121 pass
s lower novec 200 32 0 NA 0.00e+00 0.0307 0.0170 pass
s lower novec 300 32 0 NA 0.00e+00 0.0371 0.0285 pass
s lower novec 400 32 0 NA 0.00e+00 0.0297 0.0285 pass
s lower novec 500 32 0 NA 0.00e+00 0.0470 0.0295 pass
s lower vec 100 32 0 4.66e-09 7.72e-07 0.244 0.147 pass
s lower vec 200 32 0 4.09e-09 9.99e-07 0.0928 0.0219 pass
s lower vec 300 32 0 9.41e-10 7.01e-07 0.00948 0.0264 pass
s lower vec 400 32 0 1.19e-09 7.10e-07 0.0163 0.0444 pass
s lower vec 500 32 0 6.51e-10 2.39e-06 0.0256 0.115 pass
s upper novec 100 32 0 NA 0.00e+00 0.000984 0.000439 pass
s upper novec 200 32 0 NA 0.00e+00 0.00278 0.00268 pass
s upper novec 300 32 0 NA 0.00e+00 0.00722 0.00558 pass
s upper novec 400 32 0 NA 0.00e+00 0.0162 0.0162 pass
s upper novec 500 32 0 NA 0.00e+00 0.0691 0.0104 pass
s upper vec 100 32 0 2.88e-09 3.42e-07 0.00465 0.0600 pass
s upper vec 200 32 0 3.50e-09 5.41e-07 0.00523 0.00967 pass
s upper vec 300 32 0 8.87e-10 4.82e-07 0.00613 0.0181 pass
s upper vec 400 32 0 6.99e-10 6.10e-07 0.0102 0.0327 pass
s upper vec 500 32 0 1.40e-09 5.20e-07 0.0192 0.0512 pass
d lower novec 100 32 0 NA 0.00e+00 0.288 0.0813 pass
d lower novec 200 32 0 NA 0.00e+00 0.00558 0.00323 pass
d lower novec 300 32 0 NA 0.00e+00 0.00779 0.00613 pass
d lower novec 400 32 0 NA 0.00e+00 0.0150 0.0102 pass
d lower novec 500 32 0 NA 0.00e+00 0.0548 0.0180 pass
d lower vec 100 32 0 7.38e-18 5.72e-16 0.204 0.172 pass
d lower vec 200 32 0 2.36e-18 1.14e-15 0.00926 0.0225 pass
d lower vec 300 32 0 3.36e-18 2.95e-15 0.0102 0.0281 pass
d lower vec 400 32 0 1.26e-18 9.20e-16 0.0220 0.0433 pass
d lower vec 500 32 0 1.48e-18 1.17e-15 0.0317 0.0566 pass
d upper novec 100 32 0 NA 0.00e+00 0.00151 0.000966 pass
d upper novec 200 32 0 NA 0.00e+00 0.00332 0.00277 pass
d upper novec 300 32 0 NA 0.00e+00 0.0119 0.0109 pass
d upper novec 400 32 0 NA 0.00e+00 0.00899 0.0175 pass
d upper novec 500 32 0 NA 0.00e+00 0.0163 0.0138 pass
d upper vec 100 32 0 7.28e-18 6.82e-16 0.00216 0.0314 pass
d upper vec 200 32 0 4.84e-18 9.96e-16 0.00549 0.00961 pass
d upper vec 300 32 0 1.89e-18 9.14e-16 0.0124 0.0187 pass
d upper vec 400 32 0 1.56e-18 2.34e-15 0.0173 0.0342 pass
d upper vec 500 32 0 1.01e-18 1.02e-15 0.0242 0.0494 pass
c lower novec 100 32 0 NA 0.00e+00 0.142 0.0369 pass
c lower novec 200 32 0 NA 0.00e+00 0.00794 0.00317 pass
c lower novec 300 32 0 NA 0.00e+00 0.00648 0.00621 pass
c lower novec 400 32 0 NA 0.00e+00 0.0761 0.0190 pass
c lower novec 500 32 0 NA 0.00e+00 0.0340 0.0333 pass
c lower vec 100 32 0 2.37e-09 3.03e-07 0.208 0.0141 pass
c lower vec 200 32 0 2.84e-09 3.63e-07 0.0136 0.0200 pass
c lower vec 300 32 0 8.78e-10 5.51e-07 0.0245 0.0359 pass
c lower vec 400 32 0 2.38e-09 6.68e-07 0.0385 0.0562 pass
c lower vec 500 32 0 1.30e-09 1.40e-06 0.0249 0.0831 pass
c upper novec 100 32 0 NA 0.00e+00 0.00182 0.00113 pass
c upper novec 200 32 0 NA 0.00e+00 0.00348 0.00281 pass
c upper novec 300 32 0 NA 0.00e+00 0.00703 0.00614 pass
c upper novec 400 32 0 NA 0.00e+00 0.0202 0.0182 pass
c upper novec 500 32 0 NA 0.00e+00 0.0552 0.0302 pass
c upper vec 100 32 0 5.38e-09 6.36e-07 0.0214 0.00371 pass
c upper vec 200 32 0 1.72e-09 1.21e-06 0.00477 0.0108 pass
c upper vec 300 32 0 1.26e-09 4.94e-07 0.00985 0.0232 pass
c upper vec 400 32 0 5.43e-10 5.24e-07 0.0244 0.0442 pass
c upper vec 500 32 0 5.74e-10 5.44e-07 0.0486 0.0515 pass
z lower novec 100 32 0 NA 0.00e+00 0.0561 0.00741 pass
z lower novec 200 32 0 NA 0.00e+00 0.0385 0.00951 pass
z lower novec 300 32 0 NA 0.00e+00 0.0226 0.0223 pass
z lower novec 400 32 0 NA 0.00e+00 0.0401 0.0367 pass
z lower novec 500 32 0 NA 0.00e+00 0.0584 0.0550 pass
z lower vec 100 32 0 8.49e-18 7.05e-16 0.0310 0.0263 pass
z lower vec 200 32 0 2.45e-18 7.16e-16 0.0195 0.0181 pass
z lower vec 300 32 0 1.82e-18 9.30e-16 0.0296 0.0397 pass
z lower vec 400 32 0 2.06e-18 1.02e-15 0.0486 0.0640 pass
z lower vec 500 32 0 1.11e-18 1.26e-15 0.0613 0.0834 pass
z upper novec 100 32 0 NA 0.00e+00 0.00191 0.00130 pass
z upper novec 200 32 0 NA 0.00e+00 0.00597 0.00463 pass
z upper novec 300 32 0 NA 0.00e+00 0.0107 0.0112 pass
z upper novec 400 32 0 NA 0.00e+00 0.0200 0.0176 pass
z upper novec 500 32 0 NA 0.00e+00 0.0286 0.0258 pass
z upper vec 100 32 0 6.07e-18 7.47e-16 0.00551 0.00865 pass
z upper vec 200 32 0 4.04e-18 1.10e-15 0.0130 0.0193 pass
z upper vec 300 32 0 2.55e-18 2.29e-15 0.0293 0.0410 pass
z upper vec 400 32 0 1.29e-18 9.66e-16 0.0522 0.0395 pass
z upper vec 500 32 0 1.22e-18 1.14e-15 0.0391 0.0656 pass
All tests passed for dev-heevd.
This change was required for tighter integration with NWChemEx.
Adds the following:
heevd
cuSolver implementation of heevd
rocSolver implementation of heevd
heevd
heevd
Opening early to coordinate if necessary. I have access to OLCF to add rocSolver, but I'd have to update my credentials with ALCF to get access to Intel/oneMKL hardware. Also, I do not have access to a pre-11 CUDA installation, so the path for manual dispatch is untested. Unit tests on NVIDIA work locally and at NERSC.
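For readers unfamiliar with the term, "manual dispatch" here presumably means picking a precision-specific cuSOLVER routine based on the scalar type. The following is a minimal sketch of that idea, not the code in this PR. The function names `heevd_backend` and `dispatch_heevd` are hypothetical, and the stubs only return the name of the cuSOLVER routine a CUDA build would call, so the example compiles and runs without a GPU or the CUDA toolkit.

```cpp
#include <cassert>
#include <complex>
#include <string>

// Hypothetical stand-ins for the typed cuSOLVER entry points.
// Each stub only reports which routine would be selected; no CUDA call is made.
inline std::string heevd_backend(float)                { return "cusolverDnSsyevd"; }
inline std::string heevd_backend(double)               { return "cusolverDnDsyevd"; }
inline std::string heevd_backend(std::complex<float>)  { return "cusolverDnCheevd"; }
inline std::string heevd_backend(std::complex<double>) { return "cusolverDnZheevd"; }

// Manual dispatch (assumed design): overload resolution on a value-initialized
// scalar selects the matching precision-specific routine at compile time.
template <typename scalar_t>
std::string dispatch_heevd()
{
    return heevd_backend(scalar_t{});
}
```

Calling `dispatch_heevd<double>()` selects the double-precision real stub, and the complex types map to the Hermitian (`heevd`) variants. A real wrapper would forward the handle, matrix, and workspace arguments instead of returning a name.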