ifilot / hfcxx

Hartree-Fock C++ code
GNU General Public License v3.0

Processor cycles are wasted in calling Fgamma with identical parameters #8

Closed Floweynt closed 1 month ago

Floweynt commented 12 months ago

In coulomb_repulsion and nuclear, there are calls like Fgamma(i + j + k, ...) within loops.

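This is bad for performance. With loop bounds Imax, Jmax and Kmax (all inclusive), Fgamma is called (Imax + 1) * (Jmax + 1) * (Kmax + 1) times, but its first argument i + j + k takes only Imax + Jmax + Kmax + 1 distinct values, and its second argument does not change inside the loops.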
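
To make the counting argument concrete, here is a minimal sketch that walks the same triple loop and counts how many iterations there are versus how many distinct values of i + j + k occur. The names `CallCounts` and `count_fgamma_calls` are made up for this illustration and are not part of hfcxx.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical helper type for this illustration only.
struct CallCounts {
    std::size_t total;     // iterations of the triple loop (one Fgamma call each)
    std::size_t distinct;  // distinct values of i + j + k seen in the loop
};

// Walks loops with inclusive upper bounds, as in the code discussed here.
CallCounts count_fgamma_calls(int imax, int jmax, int kmax) {
    assert(imax >= 0 && jmax >= 0 && kmax >= 0);
    std::size_t total = 0;
    std::set<int> seen;
    for (int i = 0; i <= imax; i++) {
        for (int j = 0; j <= jmax; j++) {
            for (int k = 0; k <= kmax; k++) {
                ++total;
                seen.insert(i + j + k);
            }
        }
    }
    return {total, seen.size()};
}
```

For bounds of 2, 2 and 2 this gives 27 loop iterations but only 7 distinct orders (0 through 6), so 20 of the 27 calls would repeat earlier work.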

This can be fixed by memoizing the results, i.e. precomputing them into a lookup table:

// Fills fgamma_lookup_table[m] = Fgamma(m, expr) for m = 0 .. _size.
#define FGAMMA_CACHE_FOR(_size, expr)                           \
    std::vector<double> fgamma_lookup_table((_size) + 1);       \
    for (size_t i = 0; i < fgamma_lookup_table.size(); i++)     \
    {                                                           \
        fgamma_lookup_table[i] = Fgamma(i, (expr));             \
    }

...
        FGAMMA_CACHE_FOR(l1 + l2 + m1 + m2 + n1 + n2, rcp2 * gamma);

        for (int i = 0; i <= l1 + l2; i++)
        {
            for (int j = 0; j <= m1 + m2; j++)
            {
                for (int k = 0; k <= n1 + n2; k++)
                {
                    sum += ax[i] * ay[j] * az[k] * fgamma_lookup_table[i + j + k];
                }
            }
        }
...
        FGAMMA_CACHE_FOR(la + lb + lc + ld + ma + mb + mc + md + na + nb + nc + nd, 0.25 * rpq2 / delta);

        for (int i = 0; i <= (la + lb + lc + ld); i++)
        {
            for (int j = 0; j <= (ma + mb + mc + md); j++)
            {
                for (int k = 0; k <= (na + nb + nc + nd); k++)
                {
                    sum += bx[i] * by[j] * bz[k] * fgamma_lookup_table[i + j + k];
                }
            }
        }
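
A macro is not the only way to express this. As a rough sketch, the same lookup table can be built by an ordinary function, which evaluates the second argument once and does not inject a variable name into the caller's scope. Here `make_fgamma_table` and `contract` are hypothetical names, and the callable `f` stands in for Fgamma, whose exact signature in hfcxx is assumed rather than checked.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Builds table[m] = f(m, x) for m = 0 .. max_order.
// `f` is a stand-in for Fgamma (assumed to take an order and a double).
template <typename F>
std::vector<double> make_fgamma_table(int max_order, double x, F f) {
    assert(max_order >= 0);
    std::vector<double> table(static_cast<std::size_t>(max_order) + 1);
    for (std::size_t m = 0; m < table.size(); ++m) {
        table[m] = f(static_cast<int>(m), x);
    }
    return table;
}

// Same triple sum as in the loops above, reading the table instead of
// calling f. The coefficient vectors play the role of ax/ay/az or bx/by/bz.
double contract(const std::vector<double>& cx,
                const std::vector<double>& cy,
                const std::vector<double>& cz,
                const std::vector<double>& table) {
    assert(!cx.empty() && !cy.empty() && !cz.empty());
    // The largest index used is (|cx| - 1) + (|cy| - 1) + (|cz| - 1).
    assert(table.size() + 2 >= cx.size() + cy.size() + cz.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < cx.size(); ++i) {
        for (std::size_t j = 0; j < cy.size(); ++j) {
            for (std::size_t k = 0; k < cz.size(); ++k) {
                sum += cx[i] * cy[j] * cz[k] * table[i + j + k];
            }
        }
    }
    return sum;
}
```

A call site would then look something like `auto table = make_fgamma_table(l1 + l2 + m1 + m2 + n1 + n2, rcp2 * gamma, Fgamma);` followed by `sum = contract(ax, ay, az, table);`, assuming ax, ay and az are (or are copied into) vectors of the matching lengths.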

ifilot commented 1 month ago

Thank you for this issue. Appreciated!

The suggested implementation has been integrated, but it has a negligible impact on overall performance because the evaluation of these functions is not a time-critical routine. Closing this issue as resolved.