schifferl closed this issue 2 years ago
From @lwaldron on February 3, 2017 17:52
That's the syntax I've used, but looking at it now I think this would be better just in case anyone used a single colon in their column header...
colname1:::value///colname2:::value///colname3:::value
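To illustrate why the triple-colon separator is safer, here is a minimal Python sketch. The `encode` and `decode` helpers and the sample record are hypothetical, made up for this illustration rather than taken from the package. It shows a column header containing a colon surviving a round trip with `:::` but not with `: `.

```python
def encode(record, key_sep=":::", field_sep="///"):
    """Join a dict of column name -> value into one delimited string."""
    return field_sep.join(f"{key}{key_sep}{value}" for key, value in record.items())


def decode(entry, key_sep=":::", field_sep="///"):
    """Split a delimited string back into a dict (inverse of encode)."""
    # partition() splits on the first occurrence of key_sep only.
    pairs = (field.partition(key_sep) for field in entry.split(field_sep))
    return {key: value for key, _, value in pairs}


# Hypothetical record whose first column header itself contains ": ".
record = {"ratio: F/B": "1.8", "country": "ITA"}

# With ":::" the header survives the round trip.
triple_ok = decode(encode(record)) == record

# With ": " the header is cut at its own colon.
single = decode(encode(record, key_sep=": "), key_sep=": ")
```

With the single-colon separator the first key comes back as `ratio` and the rest of the header ends up in the value, which is the failure the longer separator avoids.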
@edoardopasolli @paolinomanghi I understand from you that this may not be practical as proposed, but there should be some sort of provenance back to the original, uncurated metadata. Otherwise the assumption will be that it all comes from SRA. Even just a link to the uncurated files, collected in one place, would be OK and better than nothing.
Hi, I can add the raw metadata for most of the datasets, though not for all. One major caveat is that many of the actual raw-metadata tables come from a paper other than the one that published the dataset. So if I add these tables, there will be some sort of "manual handling" in those as well. Shall we proceed?
This will be handled in the new metadata database. FYI @QuanWan89
From @lwaldron on February 3, 2017 17:46
It is good to have the original, uncurated metadata on hand, to check for variables that weren't included in the curation, or to check for curation errors. In the past I have put this in a final column "uncurated_author_metadata" with entries in the following format, and written a function for splitting this into its own dataframe:
colname1: value///colname2: value///colname3: value
This also makes curation more manageable, as you can focus on the more commonly recurring variables, without worrying that you are losing the less common variables. @edoardopasolli and Paolo (don't have your ID yet), would you consider adding this column as the last column of the curated metadata?
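The splitting function mentioned above is not shown in this thread. As a rough sketch of what such a function might look like, here is a standard-library Python version; the names `split_uncurated` and `uncurated_table` and the sample values are assumptions for illustration, not the original code.

```python
def split_uncurated(entry, field_sep="///", key_sep=": "):
    """Split one 'colname: value///colname: value' string into a dict."""
    record = {}
    if not entry:
        return record
    for field in entry.split(field_sep):
        # Split on the first key separator only, so values may contain it.
        key, sep, value = field.partition(key_sep)
        if not sep:
            raise ValueError(f"missing {key_sep!r} in field {field!r}")
        record[key.strip()] = value.strip()
    return record


def uncurated_table(entries):
    """Turn a column of delimited strings into (column names, rows)."""
    rows = [split_uncurated(entry) for entry in entries]
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)  # keep first-seen column order
    # Samples lacking a variable get None in that column.
    return columns, [[row.get(col) for col in columns] for row in rows]


# Made-up entries standing in for an "uncurated_author_metadata" column.
columns, table = uncurated_table(["age: 34///country: ITA", "age: 51///bmi: 22.4"])
```

Less common variables simply become sparse columns, so nothing from the author-supplied metadata is dropped during curation.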
Copied from original issue: waldronlab/curatedMetagenomicData#58