openmm / spice-dataset

A collection of QM data for training potential functions
MIT License

openff-default spec downloader #51

Closed by pavankum 1 year ago

pavankum commented 1 year ago

Placeholder PR for an openff-default spec downloader for @jchodera and @yuanqing-wang's work that takes care of the functional/dispersion split. Thanks to @kntkb for sharing some of his code.

I would not merge this, to avoid confusion with the better QM spec downloader.

peastman commented 1 year ago

I don't think it's a good idea to add this. As discussed in #39, the lower level of theory should never be referred to as "SPICE", and it should be hidden from anyone who doesn't specifically know to look for it. It's ok if you want to create a downloader in some other repository, but again only as long as "SPICE" doesn't appear in the name of either the repository or the dataset.

pavankum commented 1 year ago

@peastman Sorry about that. Sure, I will just pass the file to whoever needs it.

jchodera commented 1 year ago

@pavankum : Thanks so much! This is super helpful, since we are interested in using SPICE + the existing OpenFF datasets in various combinations to test various hypotheses.

@peastman : Your concern is that someone should never grab the openff-spec level of theory by accident.

How about this: I can refactor this so it's an option that you have to specify (e.g. you have to provide the CLI with --spec openff-default). There is no way someone could accidentally use that flag, and when they do, we can print out a WARNING message that the non-SPICE level of theory is being used.

Surely that's sufficient?
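
As a rough illustration of the opt-in idea above, here is a minimal sketch using Python's standard `argparse`. It is not the code from this PR: the function names, the warning text, and the `spice-default` label for the primary level of theory are all hypothetical, chosen only to show a flag that must be given explicitly and that triggers a WARNING when it selects the non-SPICE spec.

```python
import argparse
import sys

# Assumption: "spice-default" is a made-up label for the primary level of
# theory; only "openff-default" appears in the discussion above.
DEFAULT_SPEC = "spice-default"
ALLOWED_SPECS = (DEFAULT_SPEC, "openff-default")


def parse_args(argv=None):
    """Parse the (hypothetical) downloader command line."""
    parser = argparse.ArgumentParser(
        description="Sketch of a downloader CLI with an opt-in --spec flag"
    )
    parser.add_argument(
        "--spec",
        choices=ALLOWED_SPECS,
        default=DEFAULT_SPEC,
        help="QM specification to download; openff-default must be requested explicitly",
    )
    return parser.parse_args(argv)


def warn_if_non_spice(spec, stream=None):
    """Print a warning when the selected spec is not the default one.

    Returns True if a warning was printed, False otherwise.
    """
    if stream is None:
        stream = sys.stderr
    if spec != DEFAULT_SPEC:
        print(
            f"WARNING: spec '{spec}' is a non-SPICE level of theory.",
            file=stream,
        )
        return True
    return False


def main(argv=None):
    """Parse arguments, warn if needed, and return the chosen spec."""
    args = parse_args(argv)
    warn_if_non_spice(args.spec)
    return args.spec
```

Calling `main(["--spec", "openff-default"])` prints the warning to standard error and returns the chosen spec, while `main([])` falls back to the default silently. The sketch covers only the opt-in flag and the warning; it does not address how the resulting dataset is named or where it is stored.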

peastman commented 1 year ago

That wouldn't address my concerns. The alternate calculations are not the SPICE dataset. We need to be very clear about that. It's a totally independent dataset that just happens to use the same conformations. In the same way we reused the DES370K conformations, but we would never describe our dataset as "a version of DES370K". It's an independent dataset with a different name that's stored in a different place.