psi4 / psi4

Open-Source Quantum Chemistry – an electronic structure package in C++ driven by Python
http://psicode.org
GNU Lesser General Public License v3.0
958 stars 442 forks source link

Move finite difference code Python-side #884

Closed dgasmith closed 6 years ago

dgasmith commented 6 years ago

It would be great to move the finite difference code Python-side for a variety of reasons.

This would require the following steps:

Many folks might want to comment on this so please make a PR after/during step one for review.

JonathonMisiewicz commented 6 years ago

I'm tempted to take this one.

When exporting a class to Python, is Psi4 style to export all public methods or just the ones needed?

The code for frequencies by finite difference of energies has about 100 lines of commented code that @psi-rking marked "a to be finished later problem" six years ago. Do we want to finish that before, during, or after the finite difference code is moved to Python?

psi-rking commented 6 years ago

Moving the finite-difference code into Python seems like a good idea. I'm sure it could be much cleaner.

For geometry optimization purposes, one only ever really needs cartesian Hessian -> internal coordinate Hessian. This capability is in the C++ and also now in the Python optking - including the gradient.derivative B term (with the derivative-B matrix elements computed analytically).

I think that the challenge I ran into (long ago) in the referenced commented-out code in generating a cartesian Hessian using a finite-difference derivative B matrix for this term was in the Sayvetz/Eckart conditions. The rotations/translations are projected out (now by cdsalc) for the original, undisplaced geometry - but then will not be exactly so at the displaced geometries. I don't recall if my motivation for using finite-difference derivative-B elements at the time was for reduced computational expense, because the analytic derivative-B elements were not yet coded, or to try to avoid discontinuity problems in the derivative formulas. The f-d option might be a good capability to have; I'm not sure offhand if it is worth doing. On the coordinate transformation of course Wesley Allen is the guru.

jturney commented 6 years ago

When exporting a class to Python, I say export all public methods. You never know what may be needed in the future.

JonathonMisiewicz commented 6 years ago

@dgasmith The matrix factory input of CdSalcList is used right here. Am I missing something? Whether to keep the matrix factory input doesn't seem to be up for decision.

dgasmith commented 6 years ago

Yes, but that function is only ever called in a single spot AFAIK (deriv.cc). It could be better to shift that function signature to require the matrix factory on call. Most CdSalcList builds give a empty factory pointer to the constructor call which will cause some lovely seg faults anyway.

psi-rking commented 6 years ago

Ignore what I wrote above about the commented out code and derivative B matrices. I remembered what is going on.

The finite difference code uses rather strange coordinates for its displacements (symmetrized cartesians with rotations and translations numerically projected out). There is no ready way to determine the derivative of these coordinates (which would correspond to the derivative B matrix elements), as there are for an ordinary bend, for example, because they lack an analytic form. Thus, I tried using finite differences, but it won't perfectly work. To transform from the cdsalc-coordinate Hessian to a 3Nx3N cartesian coordinate hessian uniquely, the Sayvetz conditions should be explicitly enforced. This is described in WDA's transformation paper, but I've never coded that. Except for certain types of reaction path/dynamics calculations, I don't know why you would need it.

loriab commented 6 years ago

Is something like the attached relevant to what you're talking about, @psi-rking ? form A and form B are two geometries for a system. If useful, I've got Fortran code for finding angles and rotating Hessian accordingly.

screen shot 2017-12-22 at 4 29 41 pm

dgasmith commented 6 years ago

As a note: a great first step is to simply reproduce the C side procedure Python side. After that transformation we can add more complexities to the code.

-Daniel Smith Sent from my iPhone.

On Dec 22, 2017, at 17:31, Lori A. Burns notifications@github.com wrote:

Is something like the attached relevant to what you're talking about, @psi-rking ? form A and form B are two geometries for a system. If useful, I've got Fortran code for finding angles and rotating Hessian accordingly.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

psi-rking commented 6 years ago

I don't know. This looks like a scheme to go from 3Nx3N to the same. To go from (3N-6)x(3N-6) to Cartesian, I think one needs to solve a linear system including the rotational and vibrational degrees of freedom.

-Rollin

On Fri, Dec 22, 2017 at 4:31 PM, Lori A. Burns notifications@github.com wrote:

Is something like the attached relevant to what you're talking about, @psi-rking https://github.com/psi-rking ? form A and form B are two geometries for a system. If useful, I've got Fortran code for finding angles and rotating Hessian accordingly.

[image: screen shot 2017-12-22 at 4 29 41 pm] https://user-images.githubusercontent.com/2314730/34312980-e97ddf76-e735-11e7-8979-11d889cfeab9.png

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/psi4/psi4/issues/884#issuecomment-353684772, or mute the thread https://github.com/notifications/unsubscribe-auth/ADguTBeEjH_7Sin5fcU-SRE2vOSGnUVaks5tDC3KgaJpZM4REtHr .

loriab commented 6 years ago

@psi-rking, regarding your https://github.com/psi4/psi4/issues/884#issuecomment-353668888, does that mean that if I turn off trans and/or rot projection before generating the SALCs in findif, that the resulting Cartesian Hessian is wrong b/c even though the calc ran extra points to step along the trans/rot SALCs, the recomposition into Cartesian Hess is missing something for trans/rot?

That is, is my vib analysis plan flawed to (1) run a gradient at input geom, (2) if gradient non-zero turn off rot projection for SALC for findif H by G, (3) use Cartesian Hessian for vibration analysis, again not projecting rotations?

psi-rking commented 6 years ago

Lori, This is a problem that I have long wished to solve. Perhaps we can do it together. I've scribbled down some notes on the attached pdf. The primary problem is that the transformation to the cartesian hessian is not a linear one at non-stationary points. However, given the constraint that the energy is independent of rotations and translations, we don't have to do the extra displacements. The bad news is that it is not trivial. We need the 'derivative B matrix' or the second derivative of the external coordinates wrt cartesian displacements. Perhaps these are 0 for the COM coordinates, and your scheme will therefore work out of the box. But they are not for rotations. However, if we can find them or figure them out, then we can construct the Cartesian Hessian from a minimal number of displacements. Solving a 6x6 for each element is a good tradeoff I would say. The only real pain I forsee might be accommodating the linear molecules. In those cases one might be able to revert to not projecting the rotations at all as you planned. My notation is an inconsistent compromise between what I'm used to and the notation in WDA, JCP, 98, 1993.

cartesian hessians.pdf

loriab commented 6 years ago

I'd like to, @psi-rking, but I really don't dare take up anything else until after ACS. Actually seeking the minimal stuff to do before sending pyvib2 to general review.

Thanks for the notes. It's a very user-knowledge view, but I'm not understanding your "don't have to do the extra displacements, vs. the FD_PROJECT keyword from Cfour. Different strategies?

psi-rking commented 6 years ago

No rush. I've put off doing this for at least 10 years :), and I'm swamped. I think I've convinced myself your scheme will work after all (since the derivative B matrix for a COM displacement coordinate is zero). However, one doesn't need 3N-3 gradients, because there are 3 rotational degrees for which the energy is invariant. How many programs like C4 exploit this I couldn't say. Ultimately, might be nice to have it either way. 3 more gradients, or else N_atoms^2 * small matrix operation.

-- Dr. Rollin A. King Professor and Chair of Chemistry Bethel University rking@bethel.edu

On Thu, Jan 18, 2018 at 5:08 PM, Lori A. Burns notifications@github.com wrote:

I'd like to, @psi-rking https://github.com/psi-rking, but I really don't dare take up anything else until after ACS. Actually seeking the minimal stuff to do before sending pyvib2 to general review.

Thanks for the notes. It's a very user-knowledge view, but I'm not understanding your "don't have to do the extra displacements, vs. the FD_PROJECT keyword from Cfour http://slater.chemie.uni-mainz.de/cfour/index.php?n=Main.ListOfKeywordsInAlphabeticalOrder. Different strategies?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/psi4/psi4/issues/884#issuecomment-358812718, or mute the thread https://github.com/notifications/unsubscribe-auth/ADguTFqGHSAeT4cqjgXz_oF86JmR_ccMks5tL87bgaJpZM4REtHr .

loriab commented 6 years ago

@psi-rking, my concern is your insistence that the rot SALCs aren't needed at non-stationary geometries ("3 rotational degrees for which the energy is invariant", "don't have to do the extra displacements") vs. Stanton's insistence that they are ("At a stationary point on the potential energy surface, both options will give equivalent harmonic force fields, but OFF [retains the rotational degrees of freedom] should be used at non-stationary points." Since pyvib2 has to start from Cartesian Hessians (findif or anal.), I'm just trying to gauge what situations the former is valid for.

psi-rking commented 6 years ago

I'm sure Stanton knows what he is talking about here, but I am not sure there is a discrepancy between what he is saying and what I am saying. Perhaps he never coded the minimal displacement approach, as we have never coded it either. He may have decided the robustness and simplicity of +2/3 gradients was better than the complexity and complication of solving all these little equations. I don't see a problem with you going ahead as you are planning for pyvib2. We could revisit to try to reduce the computational cost slightly at a later time.

On Fri, Jan 19, 2018 at 12:43 PM, Lori A. Burns notifications@github.com wrote:

@psi-rking https://github.com/psi-rking, my concern is your insistence that the rot SALCs aren't needed at non-stationary geometries ("3 rotational degrees for which the energy is invariant", "don't have to do the extra displacements") vs. Stanton's insistence that they are ("At a stationary point on the potential energy surface, both options will give equivalent harmonic force fields, but OFF [retains the rotational degrees of freedom] should be used at non-stationary points." Since pyvib2 has to start from Cartesian Hessians (findif or anal.), I'm just trying to gauge what situations the former is valid for.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/psi4/psi4/issues/884#issuecomment-359053844, or mute the thread https://github.com/notifications/unsubscribe-auth/ADguTPrMzI8UDltN7yLjOYyEfycGPrnAks5tMOJIgaJpZM4REtHr .

dgasmith commented 6 years ago

First step complete! Thanks @JonathonMisiewicz.

JonathonMisiewicz commented 6 years ago

Yep, which brings us (well, me) to the actual moving.

I'm planning on making a PR very early into the development process for this, because the interfacing requirements are murky, and I want to make sure that what I'm preparing is suitable for this "variety of reasons" we want this code moved. I'm assuming the limitations of the current interface will be clearer once I start playing with the code.

Since I don't have to wrap my head around pointers and references for the rest of this issue, I'm hoping progress will be a bit faster.

JonathonMisiewicz commented 6 years ago

I currently plan to combine the three geometry generators into one because their logic is so similar. Could you be more specific about the interface changes and desired new features after the C++ code is moved over? If combining the geometry generators is obviously going to break something down the road, best to find out now.

loriab commented 6 years ago

No immediate problems come to mind. In fact, combining them sounds like a good idea. The ability to do a different stencil besides 3 & 5 comes to mind as a desired new feature after the code is ported.

dgasmith commented 6 years ago

Id like to be able to have the result of the FD builder create label-based indices so that a gradient FD can be reused for a Hessian and to help support "workflow" level operations. I think a label-based way of doing this rather than index-based could help in its generality.

One way to do this is to have the FD building create a dict of geometries and take in a dict of energies to form the FD grad/Hess. This would be like {"GRAD_0_X": ..., "GRAD_0_Y": ... } where 0 is for the first atom.

Actually, @leeping do you have any thoughts on labeling parts of a FD Hessian? I think this may relate to our ForceBalance and database work.

leeping commented 6 years ago

@dgasmith I think that's a suitable way to store the data that goes into computing a FD gradient or Hessian. It may be good to have some metadata that stores the finite difference method and step size.

dgasmith commented 6 years ago

I agree, a bit of metadata describing the FD method (such as step and stencil) could be quite useful as well. We can think of a small structure like the following:

{
    "step": 0.05,
    "stencil": 3,
    ...anything else?
    "geometries": {
        "GRAD_0_X": ...,
        "GRAD_0_Y": ...,
       ...
    }
}

@JonathonMisiewicz Does this make sense? Happy to chat about this in more detail.

I still think porting this straight to Python side exactly as it currently is would still be great, think of this more of a stretch goal if you wish. Once it's in Python (a huge step already) we can always go back and add this to your Python FD code.

loriab commented 6 years ago

I think the bit o'metadata is an excellent suggestion. It should include the minimum info to re-set-up that calc in another program (that uses completely different options). Besides step and stencil, the items that come to mind are (1) units for step, (2) identity of step/space (here, SALC, but some programs use a normal coordinate) symmetry, (3) any translation/rotation inclusion flags.

JonathonMisiewicz commented 6 years ago

@dgasmith I understand, though my intuition is that the values in our geometries dict should be dicts themselves that store properties at that geometry in addition to the geometry. Definitely agreed on porting the C-side code and worrying about the metadata implementation afterwards. I'll get up a PR when I have the geometry generator ported and tested. The coding isn't bad, but my schedule is.

jturney commented 6 years ago

Should probably support step size per coordinate.

On Sat, Jan 27, 2018 at 10:56 AM Lori A. Burns notifications@github.com wrote:

I think the bit o'metadata is an excellent suggestion. It should include the minimum info to re-set-up that calc in another program (that uses completely different options). Besides step and stencil, the items that come to mind are (1) units for step, (2) identity of step/space (here, SALC, but some programs use a normal coordinate) symmetry, (3) any translation/rotation inclusion flags.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/psi4/psi4/issues/884#issuecomment-360993909, or mute the thread https://github.com/notifications/unsubscribe-auth/AADQjO7t-mK4xMgD3qbfhK0NzL_AXvVDks5tO0cjgaJpZM4REtHr .

-- Justin Turney, Ph.D. Senior Research Scientist CCQC/UGA

andysim commented 6 years ago

I like @jturney's idea. For internal coordinates, you could gain a lot by bumping up the step size along a flat torsion, for example. Of course the harmonic approximation is iffy in those cases, but you'd still get a better force field that way.

dgasmith commented 6 years ago

@JonathonMisiewicz +1 on just shifting this out Python side as a first step.

We should have a serializable form of a molecule soon based on the MolSSI QC Schema. It would probably be best to have each element look like the above schema. If I understand @loriab we should be able to serialize and deserialize into this format soon.

loriab commented 6 years ago

Yes, Molecule looks roughly like this (roughly b/c I've since killed off fragment_types). Works roundtrip among qcdb/psi4.core Molecules except for those that have been extract_subsetsed which I'm still testing. By the time pyvib2 is in, this will be pervasive in psi4 courtesy of the BasisSet handover from py-->C++, though not in the actual input-file molecule {...}.

psi-rking commented 6 years ago

Agree. But one would have to look at the finite difference formulas very carefully to see if you could still achieve the minimum number of displacements (including off-diagonals) with different step sizes. The ones in findif are from WDA, and some of them have fewer points than you would expect.

On Sat, Jan 27, 2018 at 11:51 AM, Andy Simmonett notifications@github.com wrote:

I like @jturney https://github.com/jturney's idea. For internal coordinates, you could gain a lot by bumping up the step size along a flat torsion, for example. Of course the harmonic approximation is iffy in those cases, but you'd still get a better force field that way.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/psi4/psi4/issues/884#issuecomment-361001743, or mute the thread https://github.com/notifications/unsubscribe-auth/ADguTFowehu-iJTpVqgGSmIDGi_eHdVvks5tO2IigaJpZM4REtHr .

JonathonMisiewicz commented 6 years ago

For future reference: Under the pyvib2 changes, a gradient will be computed during the course of any hessian() call, to confirm the geometry is stationary. The findif code just might benefit from reusing that.

(I discovered this when trying to figure out why the fd-psimrcc test was getting a findif gradient in addition to a findif hessian. Now Mk-MRCC is just taunting me.)

JonathonMisiewicz commented 6 years ago

See PR here for the geometry generator phase of the FINDIF phase-out.

JonathonMisiewicz commented 6 years ago

Also for future reference: Although we'd like FINDIF to be compatible with arbitrary molecules, Psi or QCDB, the current state of QCDB makes that difficult. To paraphrase Lori,

I think a findif test of a qcdb.Mol at the moment would use a qcdb.Mol in driver_findif.py to get geometries and to combine the energies/gradients into a gradient/hessian, but would need to transform into a psi.4.core.Mol for the quantum computations. In 1.3dev-qcdb, the transformation to a Psi molecule won't be necessary.

As such, I'll be holding off on the QCDB molecule tests until then.

JonathonMisiewicz commented 6 years ago

We can check off 2 through 4 with the merging of #1024 last night. The next step would be the move from lists to metadata dictionaries. I suspect there will be a lot of additional discussion about what should be in the metadata, so I'll make a PR early.

This won't be at the top of my priority list, but I expect no difficulties getting this done before WWDC in Nov.

loriab commented 6 years ago

Closed by the mugworthy #1024.