Dice required tremendous amount of RAM

vvp-nsk commented 8 months ago

Hej!

Please correct me if I am wrong, but the SHCI method in Dice does not appear to be shared-memory parallelized. I am struggling to get it to work for a moderate-size problem (74e, 179o). It requires so much memory per process that even 64 GB of RAM is not sufficient to run it with epsilon_1=1.0e-4. My idea now is to reduce the number of virtual orbitals. Could you please explain the purpose of the 'nvirt' keyword implemented in Dice?

Thank you in advance!

With best regards, Victor

P.S. Neither the 'diskdavidson' nor the 'directdavidson' algorithm helps.

sanshar commented 8 months ago

I would not call (74e,179o) moderate-size. Remember that HCI is attempting to approximate FCI, so the Hilbert space of this problem is very large. With 179 orbitals, the cost of storing the integrals themselves, which scales roughly as n^4, will be tremendous. HCI is certainly optimized to deal with such large active spaces. We almost always assume that one can store the integrals in memory, etc. I am not 100% sure what nvirt is at the moment; I will have to look in the code. Does anyone else remember off the top of their head?
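To give a rough sense of how large this Hilbert space is, the sketch below counts the determinants in a full CI expansion for (74e,179o). It assumes a closed-shell split of 37 alpha and 37 beta electrons and ignores spatial symmetry; the function name is illustrative and not part of Dice.

```python
from math import comb, log10

def fci_dimension(n_alpha, n_beta, n_orb):
    """Number of Slater determinants: C(n_orb, n_alpha) * C(n_orb, n_beta)."""
    return comb(n_orb, n_alpha) * comb(n_orb, n_beta)

# Assumption: 74 electrons split evenly into 37 alpha and 37 beta.
dim = fci_dimension(37, 37, 179)
print(f"FCI dimension is roughly 10^{log10(dim):.0f}")
```

HCI only ever selects a tiny fraction of these determinants, but the count shows why the memory demand grows so quickly as epsilon is tightened.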

Sandeep.

vvp-nsk commented 8 months ago

Hej!

I agree that HCI approximates FCI at the cost of exponential scaling. In other words, smaller epsilon thresholds lead to an extremely large CI space. That is still OK.

> With 179 orbital the cost of storing integrals themselves which roughly scale as n^4 will be tremendous.

Actually, it is about 8 GB, which is not much.
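As a quick check of that figure, the sketch below estimates the storage for the two-electron integrals of 179 orbitals in double precision, both without symmetry (n^4 elements) and with the 8-fold permutational symmetry of real integrals. The helper names are illustrative; how Dice actually stores the integrals is not covered here.

```python
def eri_bytes_full(n_orb, bytes_per_element=8):
    """Storage for all n^4 two-electron integrals, no symmetry used."""
    return n_orb ** 4 * bytes_per_element

def eri_bytes_8fold(n_orb, bytes_per_element=8):
    """Storage when only symmetry-unique (ij|kl) with i>=j, k>=l, ij>=kl are kept."""
    n_pair = n_orb * (n_orb + 1) // 2
    return n_pair * (n_pair + 1) // 2 * bytes_per_element

print(f"full:   {eri_bytes_full(179) / 1e9:.1f} GB")   # about 8.2 GB
print(f"8-fold: {eri_bytes_8fold(179) / 1e9:.1f} GB")  # about 1.0 GB
```

Either way, the integrals alone do not explain a footprint above 64 GB per process; the determinant list and Hamiltonian grow much faster as epsilon is reduced.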

I am puzzled about how to force DICE to treat only a certain subset of occupied and virtual orbitals as active, i.e., how to instruct Dice that:

first N doubly occupied orbitals are frozen
last K virtual orbitals are inactive

In this way, one can drastically reduce the dimensionality of the Hilbert space but still get a realistic CI expansion. For instance, performing HCI with very tight thresholds in a valence-only active space might still provide a reasonable MSD trial for subsequent AFQMC, IMHO.

Another question: if one has CI coefficients and the corresponding determinant list (produced by 'fci.addons.large_ci'), how can one dump this information into the 'dets.bin' file? Could you perhaps point to a dedicated function already implemented in QMCUtils.py?

Thank you in advance!

With best regards, Victor

xubwa commented 8 months ago

The nvirt keyword is no longer used in the HCI code. 64 GB of RAM is not large; you can try a larger epsilon, which will require fewer resources, or use more resources, either by providing more RAM or by distributing the calculation over several nodes. If you want to freeze several core and virtual orbitals, you can simply exclude them from the CI calculation.

vvp-nsk commented 8 months ago

Hej!

> If you want to freeze several core and virtual orbitals, you may just exclude them in the CI calculation.

Could you please provide a working example of how to do so?

With best regards, Victor

xubwa commented 8 months ago

Just shrink your (74e,179o) calculation to whatever space you actually want to correlate. If you want to freeze 20 doubly occupied orbitals and drop 50 empty orbitals, then do a calculation with (34e,109o).
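The bookkeeping behind those numbers can be sketched as follows. The function is a hypothetical helper, not part of Dice or PySCF: each frozen doubly occupied orbital removes two electrons and one orbital, and each dropped virtual removes one orbital.

```python
def shrink_active_space(n_elec, n_orb, n_frozen_core, n_dropped_virt):
    """Return (electrons, orbitals) left after freezing core and dropping virtuals."""
    new_elec = n_elec - 2 * n_frozen_core
    new_orb = n_orb - n_frozen_core - n_dropped_virt
    if new_elec < 0 or new_orb <= 0 or new_elec > 2 * new_orb:
        raise ValueError("requested frozen/dropped orbitals leave no valid active space")
    return new_elec, new_orb

print(shrink_active_space(74, 179, 20, 50))  # (34, 109)
```

The reduced (electrons, orbitals) pair is what would then be used when generating the integrals for the smaller calculation.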

vvp-nsk commented 8 months ago

Hej!

To the best of my knowledge, Dice is not capable of performing such shrinking. Please correct me if I am wrong. Do you mean shrinking the orbital space at the stage of generating the FCIDUMP?

Also, could you please shed light on the 'restricted' keyword? Sorry to ask so many questions, but the documentation is a bit scarce.

With best regards, Victor

xubwa commented 8 months ago

Yes, using a smaller active space would be appropriate for the purpose you described. The restrict keyword is only experimental (correct me if I'm wrong, @sanshar); it is used to mimic a restricted active space or generalized active space.

ankit76 commented 8 months ago

If you are looking for a script that does core averaging, you can make a dummy casscf or hciscf object e.g. see here https://github.com/sanshar/Dice/blob/master/examples/DQMC/hci_trials/n2.py#L27

For the dets file, I don't have a python function that writes dets from pyscf, but it should be easy to write based on this function https://github.com/sanshar/Dice/blob/master/scripts/QMCUtils.py#L271
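As a possible first step toward such a writer, the sketch below converts determinant strings of the kind returned by PySCF's fci.addons.large_ci (with return_strs=True, these are Python bin() strings such as '0b1011') into lists of occupied orbital indices. It assumes that bit i, counted from the least significant bit, marks orbital i as occupied. The sample data is made up for illustration, and the binary layout of dets.bin is not reproduced here; it has to be taken from the QMCUtils.py function linked above.

```python
def occupied_orbitals(bit_string):
    """Convert a bin()-style string like '0b1011' to occupied orbital indices."""
    bits = int(bit_string, 2)  # int() accepts the '0b' prefix when base is 2
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]

# Hypothetical (coefficient, alpha string, beta string) tuples for illustration only.
sample = [(0.95, "0b111", "0b111"), (-0.20, "0b1011", "0b111")]

for coeff, alpha, beta in sample:
    print(coeff, occupied_orbitals(alpha), occupied_orbitals(beta))
```

Whether any sign change is needed when passing such determinants to another code depends on each code's orbital-ordering convention, which this sketch does not address.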

vvp-nsk commented 8 months ago

Hej!

> For the dets file, I don't have a python function that writes dets from pyscf, but it should be easy to write based on this function https://github.com/sanshar/Dice/blob/master/scripts/QMCUtils.py#L271

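If I grab the CI vector from PySCF (or Arrow), do I need to worry about phase conversion https://github.com/JoonhoLee-Group/ipie/blob/deb76d91a65eb42787ea35388aa3220d59490408/tools/convert_phase.py#L32 as it is implemented in IPIE?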

Thank you in advance!

With best regards, Victor

sanshar commented 8 months ago

Inside Dice everything should be consistent. If you go from Dice-HCI to IPIE, then I am not sure; likewise, if you go from Arrow to Dice-AFQMC, I am not sure.

Sandeep.

ndattani commented 8 months ago

I agree with @sanshar that (74e,179o) is not a "moderate-sized problem" and I will even say that it's near (or beyond) the limit of what can be done in 2024 with state-of-the-art supercomputers, depending on the accuracy that you want (e.g. ±20 mH, or ±10 mH, or ±1 mH, or ±0.1 mH).

I also agree with @xubwa that 64 GB is not "so much RAM".

We did (113e,76o) for FeMoco in this paper but did not get any meaningful energies (the small calculation was just done to show the occupation numbers, and how nasty the problem is). Even for (54e,54o) in the same paper, the difference between E_var and E_tot was more than 10 mH for the smallest value of epsilon (which I believe required nodes with 1 TB of RAM). The final extrapolated SHCI energy was more than 2 mH different from the extrapolated DMRG energy, and based on how the extrapolated SHCI energy was changing as more and more values of epsilon were used and plotted, I would not "bet" that the final SHCI energy was within ±1 mH of the "true" ground-state eigenvalue for this Hamiltonian.

Therefore, for a system of your size, a reasonable target precision might be ±20 mH or ±10 mH, which is much larger than what is usually desired in quantum chemistry, so (74e,179o) is not a "moderate-sized" problem for SHCI, but rather quite a large one!

sanshar / Dice

Dice required tremendous amount of RAM #21