openforcefield / openff-qcsubmit

Automated tools for submitting molecules to QCFractal
https://openff-qcsubmit.readthedocs.io/en/latest/index.html
MIT License

Molecule submission checklist #9

Open jthorton opened 4 years ago

jthorton commented 4 years ago

ISSUE What types of issues do we want to check molecules for before submitting them as part of a dataset? It might be best to separate these into major and minor issues: a major issue, such as undefined stereochemistry, would stop the molecule from being submitted, while a molecule with only minor issues would still be allowed through.

We should also identify how we can perform these checks, which may be helpful for this toolkit issue.

Checks

MAJOR

MINOR
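To make the major/minor split concrete, here is a minimal sketch of how a triage step could work. Everything in it is hypothetical and not part of qcsubmit: molecules are plain dicts, and the two checks (`undefined_stereo`, `net_charge`) and their severities are placeholders chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple


class Severity(Enum):
    MAJOR = "major"  # blocks submission
    MINOR = "minor"  # reported, but the molecule is still submitted


@dataclass
class Issue:
    check: str
    severity: Severity
    message: str


# Hypothetical interface: a check takes a molecule record and returns an Issue or None.
Check = Callable[[Dict], Optional[Issue]]


def undefined_stereo(mol: Dict) -> Optional[Issue]:
    # Placeholder for a real toolkit query about undefined stereocentres.
    if mol.get("undefined_stereocenters", 0) > 0:
        return Issue("undefined_stereo", Severity.MAJOR, "undefined stereochemistry")
    return None


def net_charge(mol: Dict) -> Optional[Issue]:
    # Placeholder minor check; the choice of check and severity is an assumption.
    if mol.get("total_charge", 0) != 0:
        return Issue("net_charge", Severity.MINOR, "molecule carries a net charge")
    return None


def triage(
    mols: List[Dict], checks: List[Check]
) -> Tuple[List[Tuple[Dict, List[Issue]]], List[Tuple[Dict, List[Issue]]]]:
    """Split molecules into (accepted, rejected), keeping the issues found for each."""
    accepted, rejected = [], []
    for mol in mols:
        issues = []
        for check in checks:
            issue = check(mol)
            if issue is not None:
                issues.append(issue)
        # Any major issue blocks the molecule; minor issues are only recorded.
        if any(issue.severity is Severity.MAJOR for issue in issues):
            rejected.append((mol, issues))
        else:
            accepted.append((mol, issues))
    return accepted, rejected
```

Keeping the issues attached to accepted molecules as well means minor problems can still be reported to the user after submission.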

jchodera commented 4 years ago

There are two philosophies we could adopt:

The strict mode is faster, but much more annoying. The permissive mode is slower, but will "just work" most of the time.

If we're building a tool we want our pharma partners to have a good experience with, especially if we're starting with ANI-1ccx or ANI-2, we may want to make the permissive mode the default and offer a strict mode for when we don't want the default handling applied.

On the other hand, if we want to explicitly specify pipeline stages in the YAML that describe how to expand protonation/tautomeric states, we may want to have the minimal (strict) mode be the default and allow users to install a YAML file that does a sensible protonation/tautomeric state expansion and stereochemistry enumeration that they can later tweak.
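As a rough illustration of the difference between the two modes, the sketch below assumes that strict mode rejects a molecule with undefined stereochemistry outright, while permissive mode expands it into explicit stereoisomers. The function name `expand_or_reject`, the dict-based molecule record, and the R/S labelling are all hypothetical stand-ins; a real implementation would call a cheminformatics toolkit.

```python
from itertools import islice, product
from typing import Dict, List


def expand_or_reject(mol: Dict, strict: bool, max_isomers: int = 10) -> List[Dict]:
    """Return the molecules to submit for one input record.

    strict=True  -> raise if the input needs fixing (the user must fix it).
    strict=False -> enumerate up to max_isomers stereoisomers automatically.
    """
    n_undefined = mol.get("undefined_stereocenters", 0)
    if n_undefined == 0:
        return [mol]  # nothing to do in either mode
    if strict:
        raise ValueError(
            f"{mol.get('name', '<unnamed>')}: {n_undefined} undefined stereocentre(s)"
        )
    # Permissive: assign every R/S combination, capped at max_isomers.
    assignments = islice(product("RS", repeat=n_undefined), max_isomers)
    return [
        {**mol, "undefined_stereocenters": 0, "stereo": "".join(labels)}
        for labels in assignments
    ]
```

With two undefined centres, the permissive call returns four records, whereas the strict call raises and leaves the fix to the user.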

jthorton commented 4 years ago

I agree. I don't want users to find that no molecules have been submitted for calculation because they all contain issues, but I didn't want to hard-code this handling into the workflow. Currently, each of these options is a component that can be put into a workflow, and the workflow can be specified in a YAML file. See here for an example with a conformer generation component and its options. Doing it this way gives us the option of having predefined openff workflows, which can be defined here; I imagined them being imported depending on the factory being used. Maybe it would be best for the default module for each factory to take the permissive route described above and be populated into the factory automatically by some special init method.

The current design

# build the workflow manually
from qcsubmit import workflow_components
from qcsubmit.factories import OptimizationDatasetFactory
opt_factory = OptimizationDatasetFactory()
# tautomer enumeration
tauts = workflow_components.EnumerateTautomers(max_tautomers=10)
opt_factory.add_workflow_component(tauts)
# stereochemistry
stereo = workflow_components.EnumerateStereoisomers(undefined_only=False, max_isomers=10)
opt_factory.add_workflow_component(stereo)

# or use a predefined workflow
from qcsubmit import workflow_modules
opt_factory = OptimizationDatasetFactory()
opt_factory.add_workflow_component(workflow_modules.OpenffOptimizationWorkflow)

Or we could add a new init method that does this, which could work like this:

full_opt_factory = OptimizationDatasetFactory.from_module(workflow_modules.OpenffOptimizationWorkflow)

Alternatively, this could just become the default __init__ behavior, and we could instead have a special function that creates a blank workflow, allowing for the strict route.
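A minimal sketch of that last variant, using stand-in classes rather than the real qcsubmit ones: `Component`, `DatasetFactory`, `PERMISSIVE_WORKFLOW`, and the `blank` constructor are all hypothetical names, and a predefined workflow is assumed to be just an ordered list of components.

```python
from typing import List, Optional


class Component:
    """Stand-in for a workflow component: a name plus its options."""

    def __init__(self, name: str, **options):
        self.name = name
        self.options = options


# Hypothetical predefined workflow taking the permissive route.
PERMISSIVE_WORKFLOW: List[Component] = [
    Component("EnumerateTautomers", max_tautomers=10),
    Component("EnumerateStereoisomers", undefined_only=False, max_isomers=10),
]


class DatasetFactory:
    def __init__(self, workflow: Optional[List[Component]] = None):
        # Default construction is permissive: start from the predefined workflow.
        source = PERMISSIVE_WORKFLOW if workflow is None else workflow
        self.workflow = list(source)  # copy so the shared default is never mutated

    @classmethod
    def blank(cls) -> "DatasetFactory":
        """Strict route: an empty workflow that the user fills in explicitly."""
        return cls(workflow=[])

    @classmethod
    def from_module(cls, module: List[Component]) -> "DatasetFactory":
        """Build a factory from any predefined workflow."""
        return cls(workflow=module)

    def add_workflow_component(self, component: Component) -> None:
        self.workflow.append(component)
```

Under these assumptions, `DatasetFactory()` gives the permissive default, while `DatasetFactory.blank()` gives an empty workflow for users who want to control every stage themselves.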