Name for this dataset

peastman commented 3 years ago

We should come up with a catchy name. Make your suggestions, the more the better, and don't be afraid to be silly!

peastman commented 3 years ago

I guess it's up to me to start things off. Here are some terrible ideas for names.

Piña Colada (an acronym for "Piña colada Is Not A Classical or Limited-Applicability DAtaset")
Spice (because it contains lots of variety. Also totally not a Dune reference.)
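Kvasir (https://en.wikipedia.org/wiki/Kvasir)
Growf (https://comicvine.gamespot.com/growf-the-dragon/4005-86175/)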

Now someone needs to suggest some good names. Otherwise, we'll be stuck with one of my bad ones!

giadefa commented 3 years ago

OpenMM-QMdataset-v1

peastman commented 3 years ago

Very catchy!

peastman commented 3 years ago

We also could capitalize SPICE and pretend it's an acronym for "Small-molecule/Protein Interaction Chemical Energies". I bet people would even believe it.

peastman commented 3 years ago

Does anyone else have thoughts on this?

@pavankum, is deciding on a name blocking you from setting up the calculations? If so, we really need to come up with something.

pavankum commented 3 years ago

@peastman I am still waiting on a qcfractal release that resolves this; the fix is already in there. I'm also waiting on a new qcsubmit release with the new HDF5 functionality. Once they're merged I can work on pushing the submission PR.

peastman commented 3 years ago

Ok, good to know. Thanks.

But we still need to pick a name!

jchodera commented 3 years ago

We'll need to take a few stabs at this, so I'd suggest keeping in mind we'll have version numbers and different compositional variants.

Longer descriptive names are also most useful for these QM datasets. The ML models we build from them can have catchier names.

Each thematically-related subset should also likely have its own name, such as

OpenMM - DES370K dimers - v1
OpenMM - dipeptides - v1
OpenMM - solvated amino acids - v1

By the way, didn't we want to add the monomers from the DES370K set as well? I don't think we've done that yet, but this will be critical for building good models, won't it?

jchodera commented 3 years ago

That said, I'm happy with the names above as a way to refer to the overall dataset. It would still be useful to think about branding and, hence, googleability.

Surprisingly, "OpenQM" has only ~6000 hits on Google, which is as close to non-existence as you can get nowadays. Searching for "OpenMM OpenQM" (how people would find us) gives nothing.

peastman commented 3 years ago

> Longer descriptive names are also most useful for these QM datasets.

You mean like ANI or QM9? :)

> Each thematically-related subset should also likely have its own name

Agreed. We need to pick a name for the full dataset, which will be part of the names of all the subsets. I'd prefer not to include "OpenMM" in the name, because this dataset isn't really specific to OpenMM in any way. That would create confusion. It's just a large collection of QM calculations for molecules that can be used for many purposes.

Another thing that creates confusion is using the same name both for a dataset and a model. For example, "ANI" can refer to a dataset, a model architecture, and a particular pretrained instance of that architecture. Very confusing. Likewise "OrbNet Denali" refers both to a dataset and to a specific model trained on that dataset.

jchodera commented 3 years ago

>> Longer descriptive names are also most useful for these QM datasets.

> You mean like ANI or QM9? :)

No. Those are terrible names for datasets. In addition, ANI is the name for the dataset and the model, further adding to confusion.

>> Each thematically-related subset should also likely have its own name

> Agreed. We need to pick a name for the full dataset, which will be part of the names of all the subsets. I'd prefer not to include "OpenMM" in the name, because this dataset isn't really specific to OpenMM in any way. That would create confusion. It's just a large collection of QM calculations for molecules that can be used for many purposes.

Is there a reason OpenQM is a poor choice? Seems like the obvious given the orgs involved (OpenMM, OpenFF, QCArchive).

> Another thing that creates confusion is using the same name both for a dataset and a model. For example, "ANI" can refer to a dataset, a model architecture, and a particular pretrained instance of that architecture. Very confusing. Likewise "OrbNet Denali" refers both to a dataset and to a specific model trained on that dataset.

At least we agree on this!

peastman commented 3 years ago

> Is there a reason OpenQM is a poor choice?

It would be a good name for a piece of software that does QM calculations. For a database of chemical energies, I'm not as sure. A name like "ANI" doesn't suggest any meaning at all, so you can use it for anything. A name like "OpenQM" does suggest a meaning. When people see the name, will they assume it refers to something different from what it actually is?

It's still better than Piña Colada though. :)

peastman commented 3 years ago

@pavankum has declared that SPICE is the name! That means we can close this.

openmm / spice-dataset

Name for this dataset #9