I guess it's up to me to start things off. Here are some terrible ideas for names.
- Piña Colada (an acronym for "Piña colada Is Not A Classical or Limited-Applicability DAtaset")
- Spice (because it contains lots of variety. Also totally not a Dune reference.)
- Kvasir https://en.wikipedia.org/wiki/Kvasir
- Growf https://comicvine.gamespot.com/growf-the-dragon/4005-86175/
Now someone needs to suggest some good names. Otherwise, we'll be stuck with one of my bad ones!
OpenMM-QMdataset-v1
Very catchy!
We also could capitalize SPICE and pretend it's an acronym for "Small-molecule/Protein Interaction Chemical Energies". I bet people would even believe it.
Does anyone else have thoughts on this?
@pavankum, is deciding on a name blocking you from setting up the calculations? If so, we really need to come up with something.
@peastman I'm still waiting on a QCFractal release that includes the fix for this. I also need a new QCSubmit release with the new HDF5 functionality. Once those are merged, I can work on pushing the submission PR.
Ok, good to know. Thanks.
But we still need to pick a name!
We'll need to take a few stabs at this, so I'd suggest keeping in mind that we'll have version numbers and different compositional variants.
Longer descriptive names are also most useful for these QM datasets. The ML models we build from them can have catchier names.
Each thematically-related subset should also likely have its own name, such as:
- OpenMM - DES370K dimers - v1
- OpenMM - dipeptides - v1
- OpenMM - solvated amino acids - v1
By the way, didn't we want to add the monomers from the DES370K set as well? I don't think we've done that yet, but this will be critical for building good models, won't it?
I'm happy with the above names as references to the overall dataset. However, it would be useful to think about branding and, hence, Google-ability.
Surprisingly, "OpenQM" has only ~6000 hits on Google, which is as close to non-existence as you can get nowadays. Searching for "OpenMM OpenQM" (how people would find us) gives nothing.
> Longer descriptive names are also most useful for these QM datasets.
You mean like ANI or QM9? :)
> Each thematically-related subset should also likely have its own name
Agreed. We need to pick a name for the full dataset, which will be part of the names of all the subsets. I'd prefer not to include "OpenMM" in the name, because this dataset isn't really specific to OpenMM in any way. That would create confusion. It's just a large collection of QM calculations for molecules that can be used for many purposes.
Another thing that creates confusion is using the same name both for a dataset and a model. For example, "ANI" can refer to a dataset, a model architecture, and a particular pretrained instance of that architecture. Very confusing. Likewise, "OrbNet Denali" refers both to a dataset and to a specific model trained on that dataset.
> Longer descriptive names are also most useful for these QM datasets.
> You mean like ANI or QM9? :)
No. Those are terrible names for datasets. In addition, ANI is the name for the dataset and the model, further adding to confusion.
> Each thematically-related subset should also likely have its own name
> Agreed. We need to pick a name for the full dataset, which will be part of the names of all the subsets. I'd prefer not to include "OpenMM" in the name, because this dataset isn't really specific to OpenMM in any way. That would create confusion. It's just a large collection of QM calculations for molecules that can be used for many purposes.
Is there a reason OpenQM is a poor choice? It seems like the obvious choice given the organizations involved (OpenMM, OpenFF, QCArchive).
> Another thing that creates confusion is using the same name both for a dataset and a model. For example, "ANI" can refer to a dataset, a model architecture, and a particular pretrained instance of that architecture. Very confusing. Likewise, "OrbNet Denali" refers both to a dataset and to a specific model trained on that dataset.
At least we agree on this!
> Is there a reason OpenQM is a poor choice?
It would be a good name for a piece of software that does QM calculations. For a database of chemical energies, I'm not as sure. A name like "ANI" doesn't suggest any meaning at all, so you can use it for anything. A name like "OpenQM" does suggest a meaning. When people see the name, will they assume it refers to something different from what it actually is?
It's still better than Piña Colada though. :)
@pavankum has declared that SPICE is the name! That means we can close this.
We should come up with a catchy name. Make your suggestions, the more the better, and don't be afraid to be silly!