theochem / gbasis

Python library for analytical evaluation and integration of Gaussian-type basis functions and related quantities.
http://gbasis.qcdevs.org/
GNU Lesser General Public License v3.0

Fix memory usage issue in `evaluate_density` (#121) #191

Open msricher opened 5 months ago

msricher commented 5 months ago

Fix memory usage issue in `evaluate_density` (#121) by using an optimized `np.einsum`.
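
For context, a minimal sketch of the kind of contraction involved, assuming `dm` is the one-electron density matrix (`n_basis` × `n_basis`) and `orb_eval` holds the basis functions evaluated on the grid (`n_basis` × `n_points`); the names are illustrative, not the actual gbasis internals:

```python
import numpy as np

def density_from_orbitals(dm, orb_eval):
    """rho(r_p) = sum_ij dm[i, j] * phi_i(r_p) * phi_j(r_p) for every grid point p."""
    # optimize=True lets NumPy choose a contraction path that avoids building
    # the full (n_basis, n_basis, n_points) intermediate in memory.
    return np.einsum("ij,ip,jp->p", dm, orb_eval, orb_eval, optimize=True)
```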

Checklist

Type of Changes

| Type | ✓ |
| :--- | :---: |
| :bug: Bug fix | ✓ |
| :sparkles: New feature | |
| :hammer: Refactoring | |
| :scroll: Docs | |

Related

msricher commented 5 months ago

> Was it this simple? If so, we should look for other places in the code where the same trick can be used.
>
> How did the performance change? I don't see any reason why this would be slower....

I think so! I didn't notice a performance hit. I wouldn't be surprised if it's faster.

msricher commented 5 months ago

See https://github.com/theochem/gbasis/issues/121#issuecomment-2192621323. This doesn't make things better, as I found out when I did a real-time profile.

PaulWAyers commented 5 months ago

It seemed too good to be true. The memory problem is that we evaluate all the orbitals on a grid first, and then form the density. This is not nearly as good an idea as evaluating directly from the density matrix. If the density matrix is defined in terms of basis functions or localized orbitals, we can screen the evaluation.

Realistically, this means changing everything. You will not evaluate the basis first; instead, for each grid point, you will determine whether a given product of basis functions is significant. Once that is known, you will evaluate only the significant products of basis functions. The issue is going to be how to make this efficient, given that a "for loop" over grid points is going to be terribly inefficient in Python. I can write some pseudocode for it, but the core issue is going to be making the operation efficient. The good news is that with this done, the cost will be proportional to the number of grid points (independent of the basis size) and the memory requirement will also scale with the number of grid points.

It is unlikely that this can be done efficiently in the MO basis, so one can really only do this when `transform=None`. (It would work OK for localized MOs, but the logic would be different and more complicated.)

PaulWAyers commented 5 months ago

With `transform=None`, you can screen. The key idea is to keep track of which atomic orbitals are needed for a given grid point.

For a grid point that is $d$ units away from the center of the basis function, the value of the (contracted) basis function is greater than

$$ c_{\min} n_{\min} d^\ell e^{-\alpha_{\min} d^2} $$

$$ n_{\min} = \sqrt{\frac{\sqrt{2 \alpha_{\min}}\,(4\alpha_{\min})^\ell}{\sqrt{\pi}\,(2 \ell + 1)!!}} $$

where $c_{\min}$, $n_{\min}$, and $\alpha_{\min}$ denote the contraction coefficient, normalization constant, and exponent of the most diffuse Gaussian in a contracted basis function. If the cutoff tolerance is $\epsilon$, you do not need to include this basis function for grid points that are further away than the distance $d$ satisfying

$$ \epsilon = c_{\min} n_{\min} d^\ell e^{-\alpha_{\min} d^2} $$

This equation can be solved exactly for $\ell=0$ (in terms of logs) and $\ell=2$ (in terms of the Lambert W function). Otherwise one needs to solve it numerically. For $\ell = 0$,

$$ d = \left(-\frac{\ln \frac{\epsilon}{c_{\min} n_{\min}}}{\alpha_{\min}}\right)^{\frac{1}{2}} $$
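
As a hedged sketch (not existing gbasis API), the $n_{\min}$ formula and the $\ell = 0$ cutoff distance translate directly into Python; `c_min`, `n_min`, `alpha_min`, and `eps` mirror the symbols used above:

```python
import numpy as np
from scipy.special import factorial2

def norm_constant_min(alpha_min, ell):
    """n_min of the most diffuse primitive, per the formula above."""
    return np.sqrt(np.sqrt(2 * alpha_min) * (4 * alpha_min) ** ell
                   / (np.sqrt(np.pi) * factorial2(2 * ell + 1)))

def cutoff_radius_l0(c_min, n_min, alpha_min, eps):
    """Distance beyond which an l = 0 contracted function drops below eps."""
    # Valid when eps < c_min * n_min, so that the logarithm is negative.
    return np.sqrt(-np.log(eps / (c_min * n_min)) / alpha_min)
```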

For $\ell = 2$ one has the equation

$$ \frac{-\alpha_{\min} \epsilon}{c_{\min} n_{\min}} = -\alpha_{\min} d^2 e^{-\alpha_{\min} d^2} $$

There are two roots, one near the nucleus (where $d^2$ is small) and one far out. The second root is the one we want, and it has the value

$$ d = \sqrt{\frac{W_{-1}\left(\frac{-\alpha_{\min} \epsilon}{c_{\min} n_{\min}}\right)}{-\alpha_{\min}}} $$

(This is the same formula as for $\ell = 0$, but with the Lambert W function replacing the logarithm.)

We could solve for other values of $\ell$ using a rootfinder, but I don't know that it is worth it. I'd use the logarithm for $\ell = 0$ and the Lambert W otherwise; as long as we aren't too aggressive with $\epsilon$ (and we shouldn't be) it will be fine.
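
A corresponding sketch for the Lambert W branch, using `scipy.special.lambertw` (again illustrative, not existing gbasis code):

```python
import numpy as np
from scipy.special import lambertw

def cutoff_radius_l2(c_min, n_min, alpha_min, eps):
    """Outer root of eps = c_min * n_min * d**2 * exp(-alpha_min * d**2)."""
    # With x = -alpha_min * d**2 the equation becomes
    # x * exp(x) = -alpha_min * eps / (c_min * n_min); the far-out root is the
    # k = -1 branch of the Lambert W function (requires the argument > -1/e,
    # i.e. eps small enough, which holds for any reasonable cutoff tolerance).
    arg = -alpha_min * eps / (c_min * n_min)
    return np.sqrt(lambertw(arg, k=-1).real / -alpha_min)
```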

The procedure then is to:

  1. Group grid points according to the memory available.
  2. For each group of grid points, compute a mask array where each shell is either True (included) or False (excluded) for each grid point, based on whether the grid point lies within the shell's cutoff distance $d$. This mask array has size $n_{\text{grid}} \times n_{\text{shells}}$.
  3. Evaluate the density but include only contributions from the included shells. This should be possible with a masked-linear-algebra operation.
  4. Continue to other groups of grid points.

In the case where no screening is done, this amounts to a "chunked" version of the old algorithm, I think.
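
A rough sketch of how steps 1–4 could look, assuming a hypothetical `eval_basis_on_points(basis, points)` helper plus per-basis-function centers and cutoff radii (none of these are existing gbasis names); a real implementation would also skip evaluating the fully screened shells rather than just masking them:

```python
import numpy as np

def screened_density(dm, basis, points, bf_centers, bf_cutoffs,
                     eval_basis_on_points, chunk_size=10000):
    """Chunked density evaluation with distance-based screening (sketch)."""
    density = np.zeros(len(points))
    for start in range(0, len(points), chunk_size):            # 1. group grid points
        chunk = points[start:start + chunk_size]
        # 2. mask[p, i]: grid point p lies within the cutoff radius of basis function i
        dists = np.linalg.norm(chunk[:, None, :] - bf_centers[None, :, :], axis=-1)
        mask = dists <= bf_cutoffs[None, :]
        # 3. evaluate the basis on this chunk and zero out screened values so the
        #    contraction only includes significant products of basis functions
        orb = eval_basis_on_points(basis, chunk)                # (n_basis, n_chunk)
        orb = np.where(mask.T, orb, 0.0)
        density[start:start + chunk_size] = np.einsum(
            "ij,ip,jp->p", dm, orb, orb, optimize=True)
    return density                                              # 4. after all chunks
```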