Only use C++ for interfacing with Libint

matt-chan commented 5 years ago

The C++ part is a bit beefier than it needs to be right now, especially if we transition to using PySCF for vanilla integrals and only want to keep GBasis for the exotic integrals. It's a little difficult to make modifications to the code right now because of it.

There are two parts I think we can simplify: the one-electron integrals, and the two-electron integrals.

For the 1e ints, we can probably replace the C++ implementation entirely with Python (and little snippets of Cython). I think this shouldn't be too much slower.

For 2e ints, one way we might be able to decrease the C++ load is to only use C++ to ask libint for integrals of specific shells, 4 shells (ie <ss|ss>) at a time. It would effectively be replacing the gbw code with Python and using it for generating the integrals, and also perform the cart_to_pure in Python. It won't save us from having to write code to interface with libint when we want new kernels, but at least it we won't have to dig through that much C++ to do it. Also we could (finally) switch over to the standard sub-shell ordering everyone else is using with less C++ hacking.

An added benefit of writing code this way is that we can ask for integrals 1 set of shells at a time. This is useful for direct-SCF/CI algorithms and makes our memory issues largely moot (there still is probably some use for Cholesky/DF etc).

One (fairly obvious?) thing to note about asking libint for specific sets of shells: We should probably initialize the libint C++ object only once when the python class is instantiated. The initialization costs of C++ would be too heavy otherwise.

Another possibility to simplify the 2e integrals might be to look into the new libint C++11 interface. The documentation says you should be able to implement new operators/kernels, but I haven't seen any code yet...

tovrstra commented 5 years ago

@matt-chan Can you clarify the following please:

Do you also intend to implement contractions at the Python level? Any plans for generalized contractions and related optimization techniques, like PRISM? These would require contractions to be done at the C++ level.
How stable is the integral code in PySCF? Stable API? Released as separate tarball? What about validation? Is it formally tested?

I'm generally in favor of moving part of the C++ code to Python, if it does not completely kill performance. We need to have an idea of the Python overhead:

Loops over shells (and contractions?) would move to Python code. For large basis sets, this could be too much looping in Python to be acceptably efficient.
In some cases, GBasis also needs loops over grid points, e.g. to construct a Fock matrix from a potential specified on an integration grid. I guess the loop over grid points needs to stay at the C++ level?

If we would encounter serious degradations in performance, there may still be a need to move some loops back to C++ code. In any case, the C++ code is too beefy, but this can also be solved by simplifying it. it is definitely not my best C++.

P.S. There is no such thing as standard integral ordering. PySCF and PSI4 are consistent with wikipedia but when you start looking at other codes, there is no general agreement on ordering. Not only order varies, but also sign conventions can differ. dzz in one code can be -dzz in another code, etc. That said, I'm fine with switching to wikipedia conventions. It is the defacto standard for millenials and we need to look ahead. Another option is to edit wikipedia, of course. :p

tovrstra commented 5 years ago

@PaulWAyers @FarnazH and I have been discussing this point. Our conclusion was that we should split the project into one pure gbasis and one C++ with python wrapper cgbasis:

The purpose of gbasis is to have pure python implementations of some exotic integrals, not available in existing integral libraries.
The purpose of the cgbasis should initially just be "evaluating density and related properties on many grid points". Wrappers to other integral libraries can be added later.

PaulWAyers commented 5 years ago

To elaborate, on what @tovrstra said, the idea is that cgbasis is "performant" and gbasis is "flexible". So some sacrifice of readability is acceptable (and inevitable, though of course should be avoided as much as possible) for cgbasis but readability/flexibility is imperative in gbasis and some performance compromises are inevitable (but should be avoided as much as possible, of course).

theochem / cgbasis

Only use C++ for interfacing with Libint #33