theochem / AtomDB

An Extended Periodic Table of Neutral and Charged Atomic Species
http://atomdb.qcdevs.org/
GNU General Public License v3.0
16 stars 14 forks source link

[Promolecule] Utility functions for creating promolecules #19

Closed gabrielasd closed 7 months ago

gabrielasd commented 9 months ago

This was originally issue 36 in the QuantumElephant repo. I'm replicating here to keep track of things, but there is already part of this initial request that has been implemented by @msricher in the module promolecule

Description:

We need utility functions for creating promolecules simply. These will involve creating the proper linear combinations of atomic species and then loading them into the Promolecule.

Two simple methods for coefficients.

  1. Given atoms with positions/centers and charges which are not integers.
    • duplicate atoms on each center.
    • for atom with ceiling(charge), choose coeff = charge - floor(charge)
    • for atom with floor(charge) electrons choose coeff = ceiling(charge) - charge
    • if atom with floor(charge) does not exist, choose coeff = (atomic_number - charge)/(atomic_number - ceiling(charge))

Example. O atom with charge -0.4 charge 0.0 has coefficient .6 charge -1 has coefficient .4

Example: O atom with charge -1.2 charge -1 should have weight 9.2/9.0

  1. Given atoms with positions/centers and charges AND multiplicities which are not integers
    • triplicate atoms on each center.
    • coefficients and species are chosen by the flat planes.

Originally posted by @PaulWAyers in https://github.com/QuantumElephant/atomdb/issues/2#issuecomment-1119991254

We should make a new class based on the Species class called MixedSpecies, which acts as a linear combination of different charge/multiplicity states of the same element at the same coordinates in space, to ensure that properties like energy and mass remain correct when accessed through the promolecule species.

Then, we should have different utility functions for creating the promolecules:

  1. Basic; just specify atoms and coordinates
  2. Basic manual; specify atoms, coordinates, and {charge, multiplicity}
  3. As in Paul's original comment, copied above
  4. As in Paul's other comment, which I'll copy into a comment below
gabrielasd commented 9 months ago

@msricher let me know if this issue can be closed and maybe open an updated version with what is left to do, which can be labelled as furture enhancement.

The issue I am aware of in the current implementation of the tool is that the hessian function has a bug. The blame for this is on me and I can look at it after the issue with the bug in the 1st and 2nd order derivatives of the splines is resolved, but I may need help eve then.

msricher commented 8 months ago

I'm going to start working on this now. I'll let you know in the next few days how long I think it'll take and what the remaining issues are.

msricher commented 8 months ago

Remaining issues

Arbitrary promolecules

I'm working on generating promolecules with arbitrary $N_\text{elec}$ (henceforth $N$) and $N_S$. This is my procedure so far:

  1. A) If both $N$ and $N_S$ are integers, attempt to load the corresponding species $(N,N_S)$ from the database, and append it to the promolecule with coefficient $1$. B) If the species does not exist, go to step 3 and emit a warning that the species must be generated as a linear combination.
  2. A) If $N$ is not an integer and $N_S$ is an integer, attempt to load the species $(\lfloor N \rfloor, N_S \mp \left(\lfloor N \rfloor \text{ mod } 2\right))$ and the species $(\lceil N \rceil, N_S \pm \left(\lceil N \rceil \text{ mod } 2\right))$. The $\{\pm,\mp\}$ signs are chosen by convention. B) If both species exist, go to step 2C. If the $\lfloor N \rfloor$ species does not exist but the $\lceil N \rceil$ species does exist, go to step 2D. Otherwise go to step 3. C) Append the $\lfloor N \rfloor$ species to the promolecule with coefficient $\lceil \text{charge} \rceil - \text{charge}$ and the $\lceil N \rceil$ species to the promolecule with coefficient $\text{charge} - \lfloor \text{charge} \rfloor$. D) Append the $\lceil N \rceil$ species to the promolecule with coefficient $(Z - \text{charge}) / (Z - \lceil \text{charge} \rceil$). Emit a warning about coefficient $c>1$ which may affect accuracy of intensive properties.
  3. A) Load all species which exist. Filter them to only the ones differing in $[N,N_S]$ from the target $[N_0,{N_S}_0]$ by less than or equal to a distance of $\sqrt{2}$ (i.e. the delta $(\pm1,\pm1)$, delta $(\pm2,0)$, and delta $(0,\pm2)$ species). B) For all combinations of 3 species (flat planes), compute the barycentric coordinates of the target species in that $[N,S]$ space. If there are combinations with all barycentric coordinates $0\leq\lambda_i\leq1$, continue with just these ones. Otherwise continue with all of them. C) Give each combination a "badness" score. I suggest $(1+\left|\sum_i{\lambda_i}\right|)(1+\left|\prod_i{\lambda_i}\right|)$ to make the coefficients as small as possible and to keep the target species "centered" in the triangle. D) Choose the lowest-scoring combination. Append the three species to the promolecule with coefficients $\lambda_i$. E) If any of these coefficients are $<0$ or $>1$, emit a warning as in step 2D.

@PaulWAyers, @gabrielasd, others, any feedback on this method?

PaulWAyers commented 8 months ago

@msricher what about using the algorithm from Appendix A of the paper? That would allow one to read in energies from a reference, determine the appropriate states/barycentric coordinates, and then linearly combine other properties. For cases where only N (not N_S) is specified, one could use just linear interpolation, but it is equivalent to this procedure when you only constrain the number of electrons, not the number of electrons and the spin (or the number of electrons of each spin), to be correct.

msricher commented 8 months ago

Oh, this could also be a good plan. I'll look into it. Thanks!!

msricher commented 8 months ago

https://github.com/msricher/AtomDB/blob/195824dd812ea6c95630edda7839b8bd82a1f5ee/atomdb/promolecule.py#L454

I've implemented the linear programming strategy for making Promolecules.

My only problem right now is that, for the available databases, there aren't very many states for each element, and so it's hard to think up combinations of charge/spin that will actually work with the limited species we have.

@gabrielasd do you have the HCI data files?

PaulWAyers commented 8 months ago

I think that @FarnazH might have cheap calculations for Carbon (though it was probably lost when wobbie died :.-( ).

It might be worth regenerating data for Carbon (which has interesting spins) at the RHF or CISD level just to have one good example.

msricher commented 8 months ago

Yeah, ok. I'll make a new dataset using PySCF RHF + small basis set just to test things out here.

On Tue., Feb. 27, 2024, 11:38 Paul W. Ayers @.***> wrote:

I think that @FarnazH https://github.com/FarnazH might have cheap calculations for Carbon (though it was probably lost when wobbie died :.-( ).

It might be worth regenerating data for Carbon (which has interesting spins) at the RHF or CISD level just to have one good example.

— Reply to this email directly, view it on GitHub https://github.com/theochem/AtomDB/issues/19#issuecomment-1967031834, or unsubscribe https://github.com/notifications/unsubscribe-auth/ADLODOWQ7GKY5SZYUZ4VTNDYVYDZJAVCNFSM6AAAAABCW7FMQGVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRXGAZTCOBTGQ . You are receiving this because you were mentioned.Message ID: @.***>

msricher commented 8 months ago

I got floating-point charge/multiplicity promolecules working. I did this using the Appendix A linear programming. I want to test this more thoroughly, but none of our datasets are complete enough to really do this. The dataset scripts are also kind of messy now.

My current branch promol_linprog on msricher has a cc-pVDZ UHF dataset that works well enough to give energies and test my promolecules, but otherwise it's also really broken.

@PaulWAyers @gabrielasd @maximilianvz maybe we should work on cleaning up the existing datasets and adding a simple PySCF UHF one? Then I can submit a PR for the promolecule generator.

I'm going to look at some of the spline errors now.