openforcefield / openff-qcsubmit

Automated tools for submitting molecules to QCFractal
https://openff-qcsubmit.readthedocs.io/en/latest/index.html
MIT License

Split MMSpec and MLSpec from QCSpec #152

Open jthorton opened 3 years ago

jthorton commented 3 years ago

To allow for more specific validation of calculation keywords and settings, it would be good to split the current QCSpec class into three separate objects, each with its own validation: QCSpec for QM calculations, MMSpec for MM calculations, and MLSpec for ML calculations.
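A minimal sketch of what that split could look like, using stdlib dataclasses rather than the actual QCSubmit base classes; all field names and validation rules here are illustrative assumptions, not the real API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QCSpec:
    """QM calculations: a method is required, a basis is optional
    (semi-empirical methods may not take one)."""

    method: str
    basis: Optional[str] = None

    def __post_init__(self):
        # QM-specific validation lives only on the QM spec.
        if not self.method:
            raise ValueError("QM specs require a method")


@dataclass
class MMSpec:
    """MM calculations: a force field rather than a method/basis pair."""

    force_field: str

    def __post_init__(self):
        if not self.force_field:
            raise ValueError("MM specs require a force field")


@dataclass
class MLSpec:
    """ML calculations: a trained model name; no basis set applies."""

    model: str
```

Each class can then reject inputs that only make sense for one of the other calculation types, instead of one QCSpec trying to validate all three at once.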

jthorton commented 5 months ago

Thinking about this some more, I think it might be better to have a unique spec for each supported program, to allow for very specific validation of inputs. For example, psi4 supports calculating MBIS charges whereas other QM programs like pySCF or ORCA don't, and they each have different implicit solvation models. I am not sure how backwards compatible this would be with old datasets; this will need testing, or some converter may need to be implemented.

These models will also be needed in bespokefit. One issue is that bespokefit does not need the spec_name and spec_description fields which qcarchive does, which might complicate creating a shared model that can be used by both. It might work to split the specs into a base spec which handles the validation of the program inputs, and then have qcsubmit wrap this in a new class composed of the name, description, and the program spec, something like

from typing import List, Literal, Optional, Union

# Psi4 base spec with validation
class Psi4Spec(_BaseSpec):
    program: Literal["psi4"]
    method: str
    basis: Optional[str] = None
    implicit_solvent: Optional[Union[PCMSettings, DDXSettings]] = None
    scf_properties: List[str]
    ...

class NamedSpec(_BaseSpec):
    spec_name: str
    spec_description: str
    spec: Union[Psi4Spec, ANISpec, GFNSpec, SmirnoffSpec, ...]
    ...

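To make the wrapping pattern concrete, here is a self-contained sketch using stdlib dataclasses in place of the real base classes; the class names follow the snippet above but every field and default is a hypothetical stand-in:

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Psi4Spec:
    """Program-level spec: carries only the calculation inputs."""

    method: str
    basis: str
    program: str = "psi4"


@dataclass
class ANISpec:
    """Program-level spec for a TorchANI calculation."""

    method: str
    program: str = "torchani"


@dataclass
class NamedSpec:
    """QCSubmit-side wrapper: adds the archive metadata that
    bespokefit does not need."""

    spec_name: str
    spec_description: str
    spec: Union[Psi4Spec, ANISpec]


# bespokefit would consume the program spec directly...
base = Psi4Spec(method="hf", basis="6-31G*")

# ...while qcsubmit wraps the same object with its metadata.
named = NamedSpec(
    spec_name="default",
    spec_description="HF/6-31G* single points",
    spec=base,
)
```

The composition keeps program validation in one place while letting each consumer attach (or omit) the naming fields it needs.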
mattwthompson commented 5 months ago

How many programs would this be? (Not shooting the idea down, just need to be reminded what "program" means in this context)

jthorton commented 5 months ago

Yeah, that's the main issue: it could quickly grow out of control. We currently have validation for 4 programs (Psi4, TorchANI, xTB, OpenMM). Here "program" refers to the engine that does the calculation, so there is a massive range of possibilities; for example, qcengine will soon support MACE and AIMNet2, which we also want to be able to use.

j-wags commented 5 months ago

Without knowing much about the specifics of this ask, when I was refactoring qcsubmit a lot of the pain was caused by the fact that many QCS objects were just wrappers around QCF/QCEl objects. This meant that, when the QCF/QCEl objects changed, the QCS wrappers needed to change in lockstep.

So I'm nervous about this potentially putting more layers of complexity/behavior between the QCSubmit API and the underlying QCEl interface. Like, if we helpfully validated user input and raised a validation error if they tried to ask ORCA for MBIS charges, and then one day ORCA added support for MBIS charges, we'd need to hustle to make a release to get the validation up to date. It'd be this way for every feature that we wrap/validate.

Is there a clear sweet spot for user experience versus maintenance cost for this sort of thing?

jthorton commented 4 months ago

> Like, if we helpfully validated user input and raised a validation error if they tried to ask ORCA for MBIS charges, and then one day ORCA added support for MBIS charges, we'd need to hustle to make a release to get the validation up to date.

Yeah, that was always the plan with qcsubmit: have strict validation for the programs we are interested in and let everything else through (see here), and even then the validation only covers the subset of features we often use. Ideally each program would tell us what features are available beforehand, but outside of that I am not sure I see many other options. I'm open to ideas, though, as I would like to get this or something similar working for bespokefit.
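That "validate what we know, let the rest through" approach could be sketched roughly like this; the feature table and helper function are hypothetical, not part of QCSubmit:

```python
# Programs whose feature support we track explicitly.
# Entries here are illustrative, not an authoritative feature matrix.
_KNOWN_FEATURES = {
    "psi4": {"mbis_charges", "pcm", "ddx"},
    "xtb": {"gbsa"},
}


def validate_keywords(program: str, requested: set) -> None:
    """Raise only when a *tracked* program is asked for a feature we
    know it lacks; untracked programs pass through unvalidated and are
    left for the engine itself to accept or reject."""
    supported = _KNOWN_FEATURES.get(program.lower())
    if supported is None:
        # Not a program we validate: let it through.
        return
    unsupported = requested - supported
    if unsupported:
        raise ValueError(
            f"{program} does not support: {sorted(unsupported)}"
        )


validate_keywords("orca", {"mbis_charges"})  # untracked program: passes
validate_keywords("psi4", {"mbis_charges"})  # tracked and supported: passes
```

The trade-off j-wags raised still applies within the tracked set: when a tracked program gains a feature, the table has to be updated in a release, but untracked programs never block users.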