openforcefield / openff-qcsubmit

Automated tools for submitting molecules to QCFractal
https://openff-qcsubmit.readthedocs.io/en/latest/index.html
MIT License

Support skipping deduplication with dataset factories #179

Open chapincavender opened 2 years ago

chapincavender commented 2 years ago

It would be helpful to provide an option to skip deduplication in the create_dataset() method of dataset factories. Use cases include datasets containing multiple instances of the same molecule with different constraints or different initial conformers applied to each instance.

The constructor for the ComponentResult class already has a skip_unique_check flag that skips the initial deduplication. Adding that flag to the create_dataset() methods and passing it through to each call of the ComponentResult constructor would partially support this feature.
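To make the proposal concrete, here is a minimal sketch of what forwarding the flag could look like. It does not use the real openff-qcsubmit classes: ToyComponentResult and this standalone create_dataset() are hypothetical stand-ins, and molecules are reduced to plain identifier strings. Only the skip_unique_check name comes from the existing constructor.

```python
from dataclasses import dataclass, field


@dataclass
class ToyComponentResult:
    """Hypothetical stand-in for ComponentResult; not the real class."""

    molecules: list = field(default_factory=list)
    skip_unique_check: bool = False

    def add_molecule(self, identifier: str) -> bool:
        """Store the identifier; return False if it was dropped as a duplicate."""
        if not self.skip_unique_check and identifier in self.molecules:
            return False
        self.molecules.append(identifier)
        return True


def create_dataset(identifiers, skip_unique_check: bool = False) -> ToyComponentResult:
    """Hypothetical factory function that forwards the flag to the result container."""
    result = ToyComponentResult(skip_unique_check=skip_unique_check)
    for identifier in identifiers:
        result.add_molecule(identifier)
    return result


inputs = ["CCO", "CCO", "CC"]
deduplicated = create_dataset(inputs)  # default behaviour: repeats are dropped
kept_all = create_dataset(inputs, skip_unique_check=True)  # repeats are kept
```

Defaulting the flag to False would keep the current behaviour for existing callers, so only users who want repeated molecules would need to opt in.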

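Dataset factories currently hash molecules by their InChI key, so multiple instances of the same molecule would collide under the same key even if the initial deduplication is skipped. An alternative hash would need to be implemented to fully support this feature.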
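One possible shape for such a hash, sketched under assumptions: combine the InChI key with a digest of whatever distinguishes the instances (here, conformer coordinates and constraints). The instance_key() helper, its arguments, the constraint tuple layout, and the placeholder key string are all hypothetical and are not part of openff-qcsubmit.

```python
import hashlib
import json


def instance_key(inchi_key, conformer=None, constraints=None):
    """Hypothetical per-instance key: InChI key plus a digest of the instance data.

    conformer: list of [x, y, z] coordinates, one per atom (assumed layout).
    constraints: list of (kind, atom_indices, value) tuples (assumed layout).
    """
    payload = {
        "inchi_key": inchi_key,
        # Round coordinates so tiny floating-point noise does not change the key.
        "conformer": [[round(x, 6) for x in atom] for atom in (conformer or [])],
        # Sort constraints so their order does not change the key.
        "constraints": sorted(constraints or []),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(encoded).hexdigest()
    return f"{inchi_key}-{digest[:12]}"


key = "PLACEHOLDER-INCHI-KEY"  # placeholder, not a real InChI key
conformer_a = [[0.0, 0.0, 0.0], [1.5, 0.0, 0.0]]
conformer_b = [[0.0, 0.0, 0.0], [1.6, 0.0, 0.0]]

key_a = instance_key(key, conformer=conformer_a)
key_b = instance_key(key, conformer=conformer_b)
key_c = instance_key(key, conformer=conformer_a,
                     constraints=[("distance", (0, 1), 1.5)])
```

With a key like this, two entries for the same molecule stay distinct as long as their conformers or constraints differ, while exact repeats still map to the same key.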