Closed jeffhammond closed 9 months ago
@jeffhammond the code is not ready for the case when USE_F90_ALLOCATABLE
is not set
https://github.com/nwchemgit/nwchem/actions/runs/6431953108/job/17465829018?pr=882
icsd_t2.F:9222:46:
9222 | & 0.5d0*alpha,f_a,h21d,
| 1
Error: Symbol ‘f_a’ at (1) has no IMPLICIT type; did you mean ‘d_a’?
icsd_t2.F:9223:46:
okay, sorry, i thought i tested without but apparently i didn't clean the environment. i'll fix it.
okay, sorry, i thought i tested without but apparently i didn't clean the environment. i'll fix it.
No problem. My only suggestion is to run github actions in your own fork first, so that you avoid burning github actions' cycyles that could be used by other pulls and/or schedule tests.
@jeffhammond Merge is dangerous. This is what has cause the repository corruption in the past. You should have rebased, instead.
Could you try the following?
git reset --mixed 6660bc8048c9fe2e084a2afc50d10d8a46995e05
git fetch upstream
git pull upstream master --rebase
git push -f origin ccsd_t2_dgemm
As an alternative, I could do the git push -f
myself, if this is OK for you.
I though GitHub update would do rebase. I'll fix it.
Actually feel free to fix it. It's late here.
Actually feel free to fix it. It's late here.
Done.
Do you agree this is ready for review? It seems to pass CI on my end.
Do you agree this is ready for review? It seems to pass CI on my end.
I have added ccsd_kernels.F
to USES_BLAS
since it calls dgemm.
What about threading control of the threaded BLAS now used by these kernels? Is the number of threads set anywhere?
https://github.com/nwchemgit/nwchem/blob/7b06d34f91065671a4c12230a09d72c08a480f88/src/nwchem.F#L180
Do you agree this is ready for review? It seems to pass CI on my end.
I have added
ccsd_kernels.F
toUSES_BLAS
since it calls dgemm.What about threading control of the threaded BLAS now used by these kernels? Is the number of threads set anywhere?
No, I haven't made that change yet, because nobody will ever run that code. I wrote it as a prototype at Intel, and with Intel OpenMP and MKL will figure out the threading thing properly.
I will clean it up eventually, probably to remove the code in ccsd_kernels.F altogether.
CCSD_T2_8 should never have used transpose/sort and I should have done this 15 years ago, but here we are.
Karol wrote the transpose-free loop version, but it's equivalent to DGEMM. This change uses DGEMM.
For the ICSD/NTS version, I just pulled in the whole routine from CCSD because it's much cleaner.
In single-node testing, this appears to be 20-30% faster (0.4s versus 0.6s for H2O cc-pVQZ).