It would be helpful to provide an option to skip deduplication in the `create_dataset()` method of dataset factories. Use cases include datasets containing multiple instances of the same molecule, each with different constraints or a different initial conformer.
The constructor for the `ComponentResult` class already has a flag, `skip_unique_check`, that skips the initial deduplication. Propagating that flag to the `create_dataset()` methods and passing it to calls of the `ComponentResult` constructor would partially support this feature.
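A minimal sketch of what propagating the flag could look like. `ToyComponentResult` and `ToyDatasetFactory` are hypothetical stand-ins, not the real `ComponentResult` or factory classes, and molecules are plain strings for brevity:

```python
from typing import List


class ToyComponentResult:
    """Hypothetical stand-in for ComponentResult (not the real class)."""

    def __init__(self, molecules: List[str], skip_unique_check: bool = False) -> None:
        if skip_unique_check:
            # Keep every input, including repeated molecules.
            self.molecules = list(molecules)
        else:
            # Drop repeats while preserving first-seen order.
            self.molecules = list(dict.fromkeys(molecules))


class ToyDatasetFactory:
    """Hypothetical factory showing the flag being forwarded."""

    def create_dataset(self, molecules: List[str], skip_unique_check: bool = False) -> List[str]:
        # Forward the caller's choice instead of always deduplicating.
        result = ToyComponentResult(molecules, skip_unique_check=skip_unique_check)
        return result.molecules


factory = ToyDatasetFactory()
print(factory.create_dataset(["CCO", "CCO", "C"]))                          # ['CCO', 'C']
print(factory.create_dataset(["CCO", "CCO", "C"], skip_unique_check=True))  # ['CCO', 'CCO', 'C']
```

Defaulting the flag to `False` keeps the existing deduplicating behaviour for current callers.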
Currently, dataset factories hash molecules using their InChI key, so repeated instances of the same molecule would still map to the same entry. An alternative hash would need to be implemented to fully support this feature.
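One possible alternative, sketched under the assumption that each instance's distinguishing setup is its conformer coordinates and a JSON-serialisable constraints dictionary: append a short digest of that setup to the InChI key. `entry_key` and the constraints layout are hypothetical and not part of any existing factory API:

```python
import hashlib
import json
from typing import Dict, List, Sequence, Tuple

Coordinates = Sequence[Tuple[float, float, float]]


def entry_key(inchi_key: str, conformer: Coordinates, constraints: Dict[str, List]) -> str:
    """Hypothetical hash: InChI key plus a digest of the per-instance setup."""
    payload = json.dumps(
        {
            # Round coordinates so tiny floating-point noise does not change the key.
            "conformer": [[round(x, 4) for x in atom] for atom in conformer],
            "constraints": constraints,
        },
        sort_keys=True,  # make the serialisation independent of dict ordering
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
    return f"{inchi_key}-{digest}"


# Toy example: ethanol's InChI key with a toy two-atom coordinate list.
ethanol = "LFQSCWFLJHTTHZ-UHFFFAOYSA-N"
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
key_a = entry_key(ethanol, coords, {"freeze": [["dihedral", [0, 1, 2, 3]]]})
key_b = entry_key(ethanol, coords, {"freeze": [["dihedral", [1, 2, 3, 4]]]})
print(key_a != key_b)  # True: same molecule, different constraints
```

Keeping the InChI key as a prefix means entries for the same molecule still group together, while the digest distinguishes instances; identical setups still collapse to one key, so deduplication of true duplicates remains possible.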