waldronlab / curatedMetagenomicDataCuration

Sample Metadata Curation for curatedMetagenomicData
https://waldronlab.io/curatedMetagenomicDataCuration/

uncurated_author_metadata #15

Closed by schifferl 2 years ago

schifferl commented 6 years ago

From @lwaldron on February 3, 2017 17:46

It is good to have the original, uncurated metadata on hand, to check for variables that weren't included in the curation or to check for curation errors. In the past I have put this in a final column "uncurated_author_metadata" with entries in the following format, and made a function for splitting this into its own dataframe:

colname1: value///colname2: value///colname3: value
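A minimal sketch of such a splitting function, written in Python purely for illustration. The original function is not shown in this thread, and the names `parse_uncurated` and `uncurated_table` are hypothetical. It assumes the `colname: value///...` format above, splits each pair at the first `: ` only, and fills in `None` wherever a sample lacks a column that other samples have:

```python
from __future__ import annotations


def parse_uncurated(entry: str, pair_sep: str = "///", kv_sep: str = ": ") -> dict[str, str]:
    """Split one uncurated_author_metadata cell into a {column: value} dict."""
    result: dict[str, str] = {}
    if not entry:
        return result
    for pair in entry.split(pair_sep):
        # partition splits at the FIRST kv_sep only, so a value may itself contain ": "
        name, sep, value = pair.partition(kv_sep)
        if not sep:
            raise ValueError(f"malformed pair: {pair!r}")
        result[name.strip()] = value.strip()
    return result


def uncurated_table(entries: list[str]) -> tuple[list[str], list[dict[str, str | None]]]:
    """Expand a column of cells into rows that share the union of all column names."""
    parsed = [parse_uncurated(e) for e in entries]
    columns: list[str] = []
    for row in parsed:
        for name in row:
            if name not in columns:  # keep first-seen column order
                columns.append(name)
    rows = [{c: row.get(c) for c in columns} for row in parsed]
    return columns, rows


columns, rows = uncurated_table(["age: 45///sex: female///bmi: 22.1", "age: 60///country: ITA"])
print(columns)  # ['age', 'sex', 'bmi', 'country']
print(rows[1])  # {'age': '60', 'sex': None, 'bmi': None, 'country': 'ITA'}
```

If pandas is available, `pandas.DataFrame(rows, columns=columns)` turns the result into a data frame. Splitting at the first `: ` tolerates values that contain `: `, but a column name that contains `: ` is still split in the wrong place, which is a weakness of this separator.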

This also makes curation more manageable, as you can focus on the most commonly recurring variables without worrying about losing the less common ones. @edoardopasolli and Paolo (don't have your ID yet), would you consider adding this column as the last column of the curated metadata?

Copied from original issue: waldronlab/curatedMetagenomicData#58

schifferl commented 6 years ago

From @lwaldron on February 3, 2017 17:52

That's the syntax I've used, but looking at it now, I think this would be better, just in case anyone used a single colon in a column header:

colname1:::value///colname2:::value///colname3:::value
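A minimal Python sketch of round-tripping the `:::` variant; the function names `encode_uncurated` and `decode_uncurated` are hypothetical and not part of any package. A header containing a single colon, such as `time: point`, survives encoding and decoding. Inputs that would still be ambiguous, such as a column name ending in `:` or a value ending in `/`, are rejected by checking that decoding reproduces the original record:

```python
from __future__ import annotations

PAIR_SEP = "///"  # separates one column/value pair from the next
KV_SEP = ":::"    # separates a column name from its value


def decode_uncurated(entry: str) -> dict[str, str]:
    """Split one uncurated_author_metadata cell into a {column: value} dict."""
    record: dict[str, str] = {}
    if not entry:
        return record
    for pair in entry.split(PAIR_SEP):
        # partition splits at the first ":::" only
        name, sep, value = pair.partition(KV_SEP)
        if not sep:
            raise ValueError(f"malformed pair: {pair!r}")
        record[name] = value
    return record


def encode_uncurated(record: dict[str, str]) -> str:
    """Join a {column: value} dict into a single cell string."""
    entry = PAIR_SEP.join(f"{name}{KV_SEP}{value}" for name, value in record.items())
    # Round-trip check: raises ValueError for records the separators cannot represent
    # unambiguously, e.g. a name ending in ":" or a value ending in "/".
    if decode_uncurated(entry) != record:
        raise ValueError(f"record cannot be encoded unambiguously: {record!r}")
    return entry


record = {"stool:consistency": "soft", "time: point": "3", "age": "45"}
cell = encode_uncurated(record)
print(cell)  # stool:consistency:::soft///time: point:::3///age:::45
```

The round-trip check costs one extra parse per record, but it guarantees that every stored string decodes back to exactly what was written.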

lwaldron commented 6 years ago

@edoardopasolli @paolinomanghi I understand from you that this may not be practical as proposed, but there should be some sort of provenance to the original, uncurated metadata. Otherwise the assumption will be that it all comes from SRA. Even just a link to the uncurated files collected in one place would be OK, and better than nothing.

paolinomanghi commented 3 years ago

Hi, I can add the raw metadata for most of the datasets, though not for all. Another important point is that many of the actual raw-metadata tables come from a paper other than the one describing the dataset. So, if I add these tables, they will also involve some sort of "manual handling". Shall we proceed?

lwaldron commented 2 years ago

This will be handled in the new metadata database. FYI @QuanWan89