openforcefield / openff-qcsubmit

Automated tools for submitting molecules to QCFractal
https://openff-qcsubmit.readthedocs.io/en/latest/index.html
MIT License

Switch to using QCElemental `Molecule.identifiers` for CMILES and friends? #199

Open dotsdl opened 2 years ago

dotsdl commented 2 years ago

Historically we have used `qcelemental.models.Molecule.extras` to store e.g. `canonical_isomeric_explicit_hydrogen_mapped_smiles`, but we may wish to start using `qcelemental.models.Molecule.identifiers` for this information as well and, eventually, instead.
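To make the "in addition, and eventually, instead" migration concrete, here is a minimal sketch using plain dictionaries in place of real QCElemental objects. The helper `mirror_extras_to_identifiers`, the `IDENTIFIER_KEYS` set, and the `fixed_hydrogen_inchi` key are hypothetical names chosen for illustration; the set of keys that `identifiers` actually accepts should be checked against the QCElemental schema.

```python
# Keys assumed (for illustration only) to be accepted by the identifiers schema.
IDENTIFIER_KEYS = {
    "canonical_isomeric_explicit_hydrogen_mapped_smiles",
    "molecular_formula",
    "inchi",
    "inchikey",
}


def mirror_extras_to_identifiers(extras, keep_extras=True):
    """Copy supported keys from an extras dict into an identifiers dict.

    keep_extras=True  -> "in addition": extras is left untouched.
    keep_extras=False -> "instead": supported keys are dropped from extras.
    """
    identifiers = {k: v for k, v in extras.items() if k in IDENTIFIER_KEYS}
    if keep_extras:
        remaining = dict(extras)
    else:
        remaining = {k: v for k, v in extras.items() if k not in IDENTIFIER_KEYS}
    return identifiers, remaining


# Example: a mapped SMILES for methanol plus one key the schema is assumed not to cover.
extras = {
    "canonical_isomeric_explicit_hydrogen_mapped_smiles": "[H:3][C:1]([H:4])([H:5])[O:2][H:6]",
    "fixed_hydrogen_inchi": "<fixed-H InChI placeholder>",  # hypothetical key
}

identifiers, remaining = mirror_extras_to_identifiers(extras, keep_extras=False)
```

With `keep_extras=True` both locations carry the CMILES during a transition period; switching to `False` later leaves only unsupported attributes in `extras`.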

Any objections to this?

jthorton commented 2 years ago

@dotsdl good idea. The identifiers here seem to support most of our needs, though it would be good if we could extend them to support all of the attributes we currently use. We also use the fixed hydrogen layer InChI and InChIKey, which allow us to distinguish between tautomers when searching our results collections. See all of our used attributes here.

We also use the Hill formula as our molecular formula. It would be good to clarify what is supposed to go in the identifier `molecular_formula` field, as I see they support two types of formula. I believe you can search QCArchive using this formula, so it would be good to be consistent.
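For reference, the Hill convention can be sketched in a few lines of standard-library Python. This is an illustrative implementation of the general Hill ordering rule (carbon first, then hydrogen, then the remaining elements alphabetically; fully alphabetical when no carbon is present), not the code used by QCElemental or openff-qcsubmit. The function name `hill_formula` is hypothetical.

```python
from collections import Counter


def hill_formula(symbols):
    """Return the Hill-ordered molecular formula for a list of element symbols."""
    counts = Counter(symbols)
    if "C" in counts:
        # Carbon first, then hydrogen (if present), then the rest alphabetically.
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: every element, hydrogen included, is sorted alphabetically.
        order = sorted(counts)
    # A count of 1 is omitted, as is usual in molecular formulas.
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


methanol = hill_formula(["C", "H", "H", "H", "O", "H"])  # "CH4O"
hydrogen_chloride = hill_formula(["H", "Cl"])            # "ClH"
```

Whichever convention `molecular_formula` ends up using, generating it from one function like this on both the submission and search sides would keep the strings comparable.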

It might also be good to suggest that the `molecule_hash` is generated during the init of the molecule. I think it's supposed to be the hash from this function, but currently any string would be accepted.

Maybe we should also suggest a provenance field so we can record the toolkit, and its version, that generated the identifiers.
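One possible shape for such a provenance record, sketched with a standard-library dataclass. The class name `IdentifierProvenance`, its two fields, and the placeholder toolkit name and version are all assumptions for illustration; no such field exists in the schema discussed here.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class IdentifierProvenance:
    """Hypothetical record of which toolkit produced a set of identifiers."""

    toolkit: str  # name of the cheminformatics toolkit that generated the identifiers
    version: str  # version string of that toolkit


# Placeholder values; a real record would carry the actual toolkit name and version.
provenance = IdentifierProvenance(toolkit="example-toolkit", version="0.0.0")

# Stored alongside the identifiers so later searches can account for toolkit differences.
record = {
    "identifiers": {"molecular_formula": "CH4O"},
    "provenance": asdict(provenance),
}
```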