psi4 / psi4

Open-Source Quantum Chemistry – an electronic structure package in C++ driven by Python
http://psicode.org
GNU Lesser General Public License v3.0
979 stars 448 forks source link

psi4 crashes when building SAD Guess independently #2897

Open linminhtoo opened 1 year ago

linminhtoo commented 1 year ago

hello, I wish to build the SAD Guess independently in psi4 without running any SCF calculation. is this possible?

I came across this page in the docs: https://psicode.org/psi4manual/master/api/psi4.core.SADGuess.html?highlight=print#psi4.core.SADGuess

but I can't seem to run the function compute_guess() - it causes my kernel/terminal to crash.

here's a code snippet:

import psi4
from psi4.core import SADGuess

psi4_geo = psi4.geometry(
"""
N 4.772900 -0.101700 0.597500
C 3.609000 0.210800 -0.023400
N 2.441000 -0.094000 0.403800
C 1.450300 0.286400 -0.465100
C 0.060400 0.012200 -0.121600
C -0.194400 -0.845100 0.972200
C -1.466300 -1.163000 1.375400
C -2.519900 -0.599700 0.659100
N -3.868100 -0.691400 0.784500
N -4.506700 0.042900 -0.138700
C -3.594600 0.620800 -0.875800
C -2.289700 0.266400 -0.439100
C -0.987700 0.572700 -0.830400
C 1.902300 0.904400 -1.593400
S 3.625400 1.027300 -1.594500
H 4.694900 -0.362900 1.566900
H 5.594000 0.417600 0.335500
H 0.659700 -1.250100 1.491400
H -1.645700 -1.820500 2.212500
H -4.398300 -1.217800 1.458900
H -3.875900 1.263700 -1.685800
H -0.814500 1.239900 -1.659800
H 1.351700 1.280600 -2.430000
symmetry C1
"""
)
psi4.set_options({"scf__reference": "rhf"})

basis = psi4.core.BasisSet.build(psi4_geo, key="BASIS", target='def2-svp', return_atomlist=False)
basis_atoms = psi4.core.BasisSet.build(psi4_geo, key="BASIS", target='def2-svp', return_atomlist=True)

mol_sadguess = SADGuess.build_SAD(basis, basis_atoms)
SADGuess.set_print(mol_sadguess, 2)
# kernel/terminal crashes here!!! :(
SADGuess.compute_guess(mol_sadguess)

the other alternative is to run a "dummy" SCF calculation, but set maxiter = 0 and fail_on_maxiter = False. but i don't know if this will give me the actual initial guess or it would have undergone some further transformations. and it is also not as fast as i'd like, it takes 7 secs on 8 threads for the above mol with 23 atoms. i'd suspect doing the full SCF calculation has a lot of overhead in setting up different parts of the SCF procedure, which I'd like to avoid. (i have millions of molecules I need to get the SADGuess for)

thank you!

susilehtola commented 1 year ago

Note that the SAD guess in Psi4 is not the best one possible, as it is not symmetry aware. Such a SAD guess is available in PySCF, and it is also directly accessible from Python.

I have planned to rectify the situation in Psi4 in the future, but I have some other projects to finish before that.

linminhtoo commented 1 year ago

i have an update, i found this open PR (which hasn't been merged since 2018) on psi4numpy https://github.com/psi4/psi4numpy/pull/36/files

and adapted the code, and it works, though I couldn't specify the dft_functional = "WB97X-D" parameter, but I believe it doesn't matter for the initial guess.

mol = psi4_geo

# cant specify 'dft_functional': "WB97X-D", not valid
psi4.set_options({'basis': 'def2-svp',
                  'scf__reference': 'rhf',
#                   'scf__dft_functional': "WB97X-D",
                  'e_convergence': 1e-8})

# Integral generation from Psi4's MintsHelper
wfn = psi4.core.Wavefunction.build(mol, psi4.core.get_global_option('BASIS'))
# t = time.time()
mints = psi4.core.MintsHelper(wfn.basisset())
S = np.asarray(mints.ao_overlap())

# Get nbf and ndocc for closed shell molecules
nbf = S.shape[0]
ndocc = wfn.nalpha()

print('\nNumber of occupied orbitals: %d' % ndocc)
print('Number of basis functions: %d' % nbf)

# Set SAD basis sets
nbeta = wfn.nbeta()
psi4.core.prepare_options_for_module("SCF")
sad_basis_list = psi4.core.BasisSet.build(wfn.molecule(), "ORBITAL",
    psi4.core.get_global_option("BASIS"), puream=wfn.basisset().has_puream(),
                                     return_atomlist=True)

sad_fitting_list = psi4.core.BasisSet.build(wfn.molecule(), "DF_BASIS_SAD",
    psi4.core.get_option("SCF", "DF_BASIS_SAD"), puream=wfn.basisset().has_puream(),
                                       return_atomlist=True)

# Use Psi4 SADGuess object to build the SAD Guess
SAD = psi4.core.SADGuess.build_SAD(wfn.basisset(), sad_basis_list)  # , ndocc, nbeta
SAD.set_atomic_fit_bases(sad_fitting_list)
SAD.compute_guess();
D = SAD.Da()
sad_guess_manual = D.to_array()

however, when I compare this sad_guess_manual with the density matrix from the full SCF with maxiter = 0, they are not close :(

# run full SCF but limit maxiter to 0
psi4.set_options(
    {
        "scf__reference": "rhf",
        "scf__maxiter": 0,
        "scf__fail_on_maxiter": False
    }
)
energy_sad, wfn_sad = psi4.energy('scf/def2-svp', dft_functional="WB97X-D", molecule=psi4_geo, return_wfn=True)
density_mat_0iters = wfn_sad.Da().to_array()

np.isclose(sad_guess_manual, density_mat_0iters, atol=1e-5).sum() / (density_mat_0iters.shape[0] ** 2)
>> 0.08549818  # should be close to 1.00 but no :/ 

i think this must mean that even setting maxiter = 0 already evolves the initial guess

linminhtoo commented 1 year ago

Note that the SAD guess in Psi4 is not the best one possible, as it is not symmetry aware. Such a SAD guess is available in PySCF, and it is also directly accessible from Python.

I have planned to rectify the situation in Psi4 in the future, but I have some other projects to finish before that.

thanks for this very useful pointer! I will then give PySCF a try. however, my main concern with PySCF is, after getting the SAD Guess from it, I have to reorder the rows and columns of the density matrix, so that it aligns with the ordering in psi4 ? I believe these 2 programs do not have the same ordering (but I'm not certain). I need to do this as most of my workflow is centred in psi4, and a large amount of calculations of density matrices have already been done with psi4

susilehtola commented 1 year ago

If you call Psi4 with maxiter=0 it builds the Fock matrix from the SAD density and diagonalizes it to give you orbitals and then builds a new density from these orbitals

linminhtoo commented 1 year ago

If you call Psi4 with maxiter=0 it builds the Fock matrix from the SAD density and diagonalizes it to give you orbitals and then builds a new density from these orbitals

ah that makes sense then, this would explain why the density matrices are no longer the same. i did some visualization and figured this was the case as well (it already had non-zero off-diagonal values, which don't exist in the SAD guess), thanks.

would you have any thoughts/concerns on doing the SAD guessing in PySCF and then doing the row/col re-ordering?

for starter's, I know how psi4 orders the different magnetic quantum numbers for each angular momentum, so I need to check PySCF's ordering. if they're different, I'll need to reorder the rows & columns. similarly, i don't know if the two programs have identical ordering of specific basis functions in each basis set (e.g. in def2-svp you have multiple sets of s and p orbitals for an element and there could exist different ways of ordering them)

susilehtola commented 1 year ago

would you have any thoughts/concerns on doing the SAD guessing in PySCF and then doing the row/col re-ordering?

What do you need the guesses for? PySCF can also be used to run similar calculations as Psi4.

I don't know if there are differences between the basis function conventions between Psi4 and PySCF. Unfortunately, quantum chemistry programs are not interoperable.

linminhtoo commented 1 year ago

would you have any thoughts/concerns on doing the SAD guessing in PySCF and then doing the row/col re-ordering?

What do you need the guesses for? PySCF can also be used to run similar calculations as Psi4.

I don't know if there are differences between the basis function conventions between Psi4 and PySCF. Unfortunately, quantum chemistry programs are not interoperable.

I'm trying to build a ML model that can predict the converged density matrix. To verify whether the model is of any value, I wish to plug the predictions into a quantum chemistry program.

The problem is that the dataset I'm using (QMugs) has used psi4 to calculate "groundtruth" energies & density matrices at the DFT level. So, my ML model is learning to output density matrices with the ordering convention used by psi4. If I wish to plug it into a different software, like PySCF, I believe I'll have to do some re-ordering or transformations...

Would simply re-ordering the rows/columns not work? (my understanding was that if the basis set was identical, but just that one program uses say px py pz vs another using pz py px a reordering would suffice, but I'm not exactly a quantum chemistry expert...)

Similarly, I wish to compare the convergence rates of my ML model's predicted density matrices against default initial guesses, and also just look at the matrices themselves to compare how they look like (for my own understanding/analysis)

susilehtola commented 1 year ago

I'm trying to build a ML model that can predict the converged density matrix. To verify whether the model is of any value, I wish to plug the predictions into a quantum chemistry program.

Well, I've worked on initial guesses in J. Chem. Theory Comput. 15, 1593 (2019) and J. Chem. Phys. 152, 144105 (2020); I hope you are aware of these works, the first one being especially topical for what you want to do.

If you need quantum chemistry expertise, feel free to reach out. I honestly don't know if it would just be a question of reordering px, py, and pz, or whether there are also differences in the basis functions' normalization and phase.

TiborGY commented 1 year ago

If you are willing to read a bunch of Fortran code, MOKIT by Jingxiang Zou is the closest thing there is to a wavefunction converter between quantum chemistry programs, and it does support both PySCF and Psi4. So in theory you could glean all of the necessary information about the conventions used by them and write your own translator.

You could try to use MOKIT itself, but unfortunately it is still lacking direct conversion between most programs and has the bizarre limitation of having to run Gaussian first to generate an FCHK file even if you just want to convert between two different programs.