openforcefield / qca-dataset-submission

Data generation and submission scripts for the QCArchive ecosystem.

What metadata should we be including inside our QCFractal submissions? #54

Open jchodera opened 5 years ago

jchodera commented 5 years ago

It would be helpful to include more metadata describing the construction of our datasets. Our input directories contain helpful blocks like this:

### General Information

 - Date: 2019-07-21
 - Class: Forcefield Parametrization
 - Purpose: Explore discrepancies between QM and OPLS3e
 - Collection: OptimizationDataset
 - Name: Pfizer discrepancy optimization dataset 1
 - Number of Entries: 100 unique molecules, XXX conformers
 - Submitter: John Chodera

Should we also include this information in metadata for submission? If so, where should we put it?
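
One way the block above could be turned into machine-readable metadata is sketched below. This is only an illustration: the function name `parse_general_information` and the snake_case key names are assumptions, and nothing here is part of QCFractal or of the scripts in this repo.

```python
import re

# The "General Information" block from the issue text, copied verbatim.
README_BLOCK = """\
### General Information

 - Date: 2019-07-21
 - Class: Forcefield Parametrization
 - Purpose: Explore discrepancies between QM and OPLS3e
 - Collection: OptimizationDataset
 - Name: Pfizer discrepancy optimization dataset 1
 - Number of Entries: 100 unique molecules, XXX conformers
 - Submitter: John Chodera
"""


def parse_general_information(text):
    """Turn ' - Key: value' bullet lines into a dict with snake_case keys.

    Hypothetical helper: lines that are not 'Key: value' bullets
    (headings, blank lines) are skipped.
    """
    metadata = {}
    for line in text.splitlines():
        match = re.match(r"\s*-\s*([^:]+):\s*(.*)$", line)
        if match:
            key = match.group(1).strip().lower().replace(" ", "_")
            metadata[key] = match.group(2).strip()
    return metadata


metadata = parse_general_information(README_BLOCK)
```

The resulting dictionary could then be attached wherever the submission ends up storing metadata; where exactly that should be is the open question in this issue.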

dgasmith commented 5 years ago

A few other items:

jchodera commented 5 years ago

> "tagline" would also be very useful. A single sentence of ~1-200 characters that describes the dataset. This could be the "Purpose" field above, but may be different.

Can you give some examples?

jchodera commented 5 years ago

What about a description field that allows a more detailed description? (Perhaps that is more sensible than tagline?)

dgasmith commented 5 years ago

I believe we would ultimately like both. tagline or similar is useful for when you are presented with a dozen datasets where the name is not sufficiently informative. A description field could be a few paragraphs of additional information.

A few examples:

It might be that quantum chemists have arcane names, but it seems to be useful.

jchodera commented 5 years ago

OK, so what is the complete set of metadata entries we want to include so far?

dgasmith commented 5 years ago

tag, tagline, description?

jchodera commented 5 years ago

Sorry, should have been clearer: What's the full set of metadata we should manually specify then? Presumably, this includes things like a dataset name, levels of theory, person generating the data, contact info, etc.

davidlmobley commented 5 years ago

You perhaps also want a "Source URL (if applicable)" tag or something like that, e.g. for stuff in this repo we'd link to where it came from. This also might help encourage people to... provide links to source materials.
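
To make the fields proposed in this thread concrete, here is a minimal sketch of a metadata record as a plain Python dataclass. The class name `DatasetMetadata`, every field name, and the 200-character tagline limit are assumptions drawn from the comments above; this is not a QCFractal schema and not an agreed-upon format.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Assumed upper bound, taken from the "~1-200 characters" suggestion in the thread.
MAX_TAGLINE_LENGTH = 200


@dataclass
class DatasetMetadata:
    """Hypothetical container for the manually specified metadata fields."""

    name: str
    tagline: str                      # single sentence shown in dataset listings
    submitter: str                    # person generating the data
    description: str = ""             # longer free text, possibly a few paragraphs
    tags: List[str] = field(default_factory=list)
    levels_of_theory: List[str] = field(default_factory=list)
    contact: Optional[str] = None     # contact info for the submitter
    source_url: Optional[str] = None  # "Source URL (if applicable)"

    def __post_init__(self):
        # Keep the tagline to one short line so it stays usable in listings.
        too_long = not 1 <= len(self.tagline) <= MAX_TAGLINE_LENGTH
        if too_long or "\n" in self.tagline:
            raise ValueError(
                f"tagline must be a single line of 1-{MAX_TAGLINE_LENGTH} characters"
            )


# Example built from the "General Information" block in the first comment,
# using the "Purpose" field as the tagline.
example = DatasetMetadata(
    name="Pfizer discrepancy optimization dataset 1",
    tagline="Explore discrepancies between QM and OPLS3e",
    submitter="John Chodera",
)
record = asdict(example)  # plain dict, ready to serialize alongside a submission
```

Optional fields default to empty values, so a submitter only has to fill in what applies; whether fields such as `source_url` should instead be required is a policy decision left open here.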