psi4 / psi4numpy

Combining Psi4 and Numpy for education and development.
BSD 3-Clause "New" or "Revised" License
329 stars 151 forks source link

not optimal use of einsum #1

Closed Konjkov closed 9 years ago

Konjkov commented 9 years ago

Good day Daniel, I just found in your project amazing python realization of CCSD. I immediately copied it in my project https://github.com/Konjkov/pyquante2/blob/master/pyquante2/cc/ccsd.py and carefully profiling with https://github.com/rkern/line_profiler.

I found that most of the time is spent executing the following line of code: https://github.com/dgasmith/psi4numpy/blob/master/Coupled-Cluster/CCSD.dat#L201 which is not surprising.

If einsum indices over which the summation is performed standing in a beginig of the list: np.einsum('mnab,mnef->abef', tmp_tau, MO[o, o, v, v]), it is becomes slower than when in the end: np.einsum('abij,cdij->abcd', np.transpose(tmp_tau, axes=(2,3,0,1)), MO[v, v, o, o]).

Testing of both showed, that correct sequence of indices leads to a twofold speed up in the tests. https://github.com/Konjkov/pyquante2/blob/master/tests/test_ccsd.py

Another possible cause of slow computing - the multiplication of three tensors simultaneously. https://github.com/dgasmith/psi4numpy/blob/master/Coupled-Cluster/CCSD.dat#L300 sequential multiplication is always performed faster.

Despite these shortcomings , I am very glad that there are people like you who use Python in quantum chemistry. I'm waiting for the appearance of implementation of CASSCF and especially state-specific multireference coupled-cluster approach of Mukherjee and co-workers (Mk-MRCC) in python.

Good luck in python programming. Vladimir.

Konjkov commented 9 years ago

Update. I just remove transpose (same speed): Wabef += 0.25 * np.einsum('mnab,efmn->abef', tmp_tau, MO[v, v, o, o])

dgasmith commented 9 years ago

Hello, Thank you for your wonderful comments!

There is an interesting break point between clarity and speed and I am not quite sure I have a clear idea of what that is in psi4numpy. Transposing the MO tensor so that the outermost for loop iterates over virtual indices definitely qualifies, I will change this. However, the multiple tensor issue is interesting as einsum was never optimized for more than two. At the same time writing the code this way makes the code much simpler and occasionally avoids very large intermediates. I think its best to leave as is for now. As the numpy version will probably never meet the speed criterion (compared to C or Fortran) I am trying to shoot for "good enough" in the speed department and focus on clarity.

I have thought about the implementation of FCI (and therefore determinate based CASSCF) in psi4numpy for awhile now and have not come up with a good way to do it. In the psi4 development branch I have added CASSCF and am now working on combining the DETCI and DETCAS codes so that we can do many interesting things and ultimately export them back out to the python layer. In addition, I am adding a new helper module that, among other things, will greatly facilitate python/numpy development by exporting such things as three-index density fitting tensors, various Kohn-Sham matrices, and transformed integrals. So there is a lot going on behind the scenes, but not much here at the moment.

If you are interested in contributing to this project or collaborating on other python quantum chemistry projects let me know.