openforcefield / openff-qcsubmit

Automated tools for submitting molecules to QCFractal
https://openff-qcsubmit.readthedocs.io/en/latest/index.html
MIT License
26 stars 4 forks source link

store_wavefunction not propagated #244

Open lilyminium opened 7 months ago

lilyminium commented 7 months ago

When creating an OptimizationDataset, I would expect store_wavefunction to get propagated into the generated QCSpecification. However, I don't think this is currently happening -- should it be?

Code

from openff.qcsubmit.datasets import OptimizationDataset, BasicDataset
from qcelemental.models.results import WavefunctionProtocolEnum

dataset = OptimizationDataset(
    dataset_name="OpenFF multi-Br ESP Fragment Conformers v1.0",
    dataset_tagline="HF/6-31G* conformers of diverse fragment molecules with multiple Br.",
    description=(
        "A dataset containing molecules from the "
        "`OpenFF ESP Fragment Conformers v1.0` with modifications. "
        "Molecules with multiple Cl atoms were selected from the "
        "dataset and the Cl atoms were replaced with Br.\n\n"
        "For each molecule, a set of up to 5 conformers were generated by:\n"
        "  * generating a set of up to 1000 conformers with a RMS cutoff of 0.5 Å "
        "using the OpenEye backend of the OpenFF toolkit\n"
        "  * applying ELF conformer selection (max 5 conformers) using OpenEye\n\n"
        "Each conformer will be converged according to the 'GAU_LOOSE' criteria."
    ),
)
dataset.clear_qcspecs()
dataset.add_qc_spec(
    program="psi4",
    method="hf",
    basis="6-31G*",
    spec_name="HF/6-31G*",
    spec_description="The standard HF/6-31G* basis used to derive RESP style charges.",
    store_wavefunction=WavefunctionProtocolEnum.orbitals_and_eigenvalues
)
dataset._get_specifications()
{'HF/6-31G*': OptimizationSpecification(program='geometric', qc_specification=QCSpecification(program='psi4', driver=<SinglepointDriver.deferred: 'deferred'>, method='hf', basis='6-31g*', keywords={'maxiter': 200, 'scf_properties': ['dipole', 'quadrupole', 'wiberg_lowdin_indices', 'mayer_indices']}, protocols=AtomicResultProtocols(wavefunction=<WavefunctionProtocolEnum.none: 'none'>, stdout=True, error_correction=ErrorCorrectionProtocol(default_policy=True, policies=None), native_files=<NativeFilesProtocolEnum.none: 'none'>)), keywords={'coordsys': 'dlc', 'enforce': 0.0, 'epsilon': 1e-05, 'reset': True, 'qccnv': False, 'molcnv': False, 'check': 0, 'trust': 0.1, 'tmax': 0.3, 'maxiter': 300, 'convergence_set': 'GAU'}, protocols=OptimizationProtocols(trajectory=<TrajectoryProtocolEnum.all: 'all'>))}

Looking at the qc_specifications they do seem to be there:

dataset.qc_specifications["HF/6-31G*"].store_wavefunction
<WavefunctionProtocolEnum.orbitals_and_eigenvalues: 'orbitals_and_eigenvalues'>

It also works for a BasicDataset:

dataset = BasicDataset(
    dataset_name="OpenFF multi-Br ESP Fragment Conformers v1.0",
    dataset_tagline="HF/6-31G* conformers of diverse fragment molecules with multiple Br.",
    description=(
        "A dataset containing molecules from the "
        "`OpenFF ESP Fragment Conformers v1.0` with modifications. "
        "Molecules with multiple Cl atoms were selected from the "
        "dataset and the Cl atoms were replaced with Br.\n\n"
        "For each molecule, a set of up to 5 conformers were generated by:\n"
        "  * generating a set of up to 1000 conformers with a RMS cutoff of 0.5 Å "
        "using the OpenEye backend of the OpenFF toolkit\n"
        "  * applying ELF conformer selection (max 5 conformers) using OpenEye\n\n"
        "Each conformer will be converged according to the 'GAU_LOOSE' criteria."
    ),
)
dataset.clear_qcspecs()
dataset.add_qc_spec(
    program="psi4",
    method="hf",
    basis="6-31G*",
    spec_name="HF/6-31G*",
    spec_description="The standard HF/6-31G* basis used to derive RESP style charges.",
    store_wavefunction=WavefunctionProtocolEnum.orbitals_and_eigenvalues
)
dataset._get_specifications()
{'HF/6-31G*': QCSpecification(program='psi4', driver=<SinglepointDriver.energy: 'energy'>, method='hf', basis='6-31g*', keywords={'maxiter': 200, 'scf_properties': ['dipole', 'quadrupole', 'wiberg_lowdin_indices', 'mayer_indices']}, protocols=AtomicResultProtocols(wavefunction=<WavefunctionProtocolEnum.orbitals_and_eigenvalues: 'orbitals_and_eigenvalues'>, stdout=True, error_correction=ErrorCorrectionProtocol(default_policy=True, policies=None), native_files=<NativeFilesProtocolEnum.none: 'none'>))}

(cc https://github.com/openforcefield/qca-dataset-submission/pull/335#issuecomment-1835378295)

Version: 0.50.1

jthorton commented 7 months ago

Hi @lilyminium good spot, I see that we try to propagate it to qcfractal here when making the OptimizationSpecification but it seems to get lost along the way. With some digging in qcfractal I found that the OptimizationSpecification forces the use of the default single-point protocol which is removing our wavefunction settings. My guess is to stop the wavefunction from being saved for an entire trajectory but we could have this restriction removed if we pass a protocol which only saves the final result from an optimization. cc @bennybp @dotsdl

bennybp commented 7 months ago

QCElemental right now doesn't support protocols for the gradient calculations. The structure in QCElemental is a bit different than QCFractal: QCFractal reuses a full QCSpecification object, but QCElemental has its own, which unfortunately doesn't have protocols: https://github.com/MolSSI/QCElemental/blob/e942b810f1681b7d38209d0fed55b49954e6e4b5/qcelemental/models/procedures.py#L56

So I fix the protocols to be the default to at least maintain consistency in the DB (I should probably add a comment there). This was a huge pain during the migration from the legacy version to the current version.

This is part of a QCElemental refactor which has been a long time coming, unfortunately.

j-wags commented 7 months ago

Since I'm taking over maintenance of QCSubmit, I took some time to wrap my head around this. Here are notes, maybe just for myself, though I'd really appreciate correction if I'm misunderstanding something:

So I think the right solution is that the force_qcspec validator should either warn or raise an error when a protocol is overwritten. Does that sound good to you @bennybp? If not, we can have QCSubmit raise an error in that situation instead.

bennybp commented 7 months ago

Largely correct, with a few more clarifications:

* So it IS possible to get wavefunctions for these evaluations currently, but we'd need to submit a new BasicDataset using the final molecules from the existing optimization dataset's results. (and now that I look, @lilyminium has done exactly that [here](https://github.com/openforcefield/qca-dataset-submission/pull/341))

I vaguely plan (with no timetable) to have this built in, since it will be very common: https://github.com/MolSSI/QCFractal/issues/748

Another option is to overhaul the qcelemental models to take in two specifications, but that would be a bit more disruptive.

* In new QCF, the `driver` value for  optimizations has changed from `gradient` to `deferred`. This change in wording makes me think that there are plans for datasets that say "here is a QC spec, but DEFER evaluating it until the gradient evaluations are done". I didn't dig far into this, but I'm optimistic that this is the meaning.

I wouldn't be so optimistic :)

That value pertains only to the driver (which is typically energy, gradient, hessian). There's an awkwardness when we allow that to be set by the user for an optimization, because it is largely ignored - whether an energy, gradient, or hessian is run is controlled by the optimization software, not the user. So what would it mean to submit an optimization with driver = energy when geometric is just going to do gradients anyway. So the (admittedly hacky) idea was to make this deferred value for the enum - the overall type of calculation is being deferred to the optimization software.

* @bennybp's post just above this one reiterates that it wasn't and isn't possible to request wavefunctions in optimizations. I'm not sure if "This is part of a QCElemental refactor which has been a long time coming, unfortunately." means "unfortunately it will continue NOT being possible after a refactor that has/will happen" or "it will be possible after the refactor, which unfortunately will take a very long time". Could you clarify?

It's been on the table for a good while now, along with some qcelemental/qcschema splitting ideas. They just end up taking a back seat to other pressing issues.

So I think the right solution is that the force_qcspec validator should either warn or raise an error when a protocol is overwritten. Does that sound good to you @bennybp? If not, we can have QCSubmit raise an error in that situation instead.

I agree. The current behavior is technically correct but hidden. So raising an exception is better than silently changing their input.

j-wags commented 7 months ago

Awesome. Thanks for the thorough response. This makes a lot of sense, and I think it'd be great to put an exception in QCFractal when protocols are overwritten :-)