Closed aclum closed 1 week ago
This code should be written in as generic a way as possible. Our metadata should be a superset of the NCBI fields. There should not even be any mapping involved since we use the MIxS slots. There is standard generic flattening of object fields (e.g. {value} {unit}).
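A minimal sketch of the generic flattening mentioned above, assuming a quantity-like object is a dict with hypothetical `value` and `unit` keys (the actual slot names in the NMDC schema may differ). The function name `flatten_quantity` is made up for illustration.

```python
from typing import Any, Mapping


def flatten_quantity(obj: Mapping[str, Any]) -> str:
    """Flatten a quantity-like object into a '{value} {unit}' string.

    The 'value' and 'unit' keys are placeholders for illustration only.
    The unit is passed through unchanged: no expansion, no conversion.
    """
    value = obj.get("value")
    unit = obj.get("unit")
    if value is None:
        # Nothing to export for this field.
        return ""
    if not unit:
        # Unitless quantity: export only the value.
        return str(value)
    return f"{value} {unit}"


print(flatten_quantity({"value": 5, "unit": "m"}))
```

Because the unit string is emitted as stored, the same function works whether the source record uses unit symbols or spelled-out unit names.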
One decision needs to be made: MIxS loosely recommends units like "centimeters", which is highly non-standard. I proposed using UCUM in 2021 (https://github.com/GenomicsStandardsConsortium/mixs/issues/154), but as far as I know this has never been discussed by the CIG or the board.
In NMDC we are moving towards unit symbols/UCUM.
Note that most data that is in NCBI BioSample uses unit symbols. I propose that we do not do some kind of awkward expansion, and that we simply submit "5 m" and hope that MIxS catches up.
@cmungall the data in mongo is heterogeneous currently wrt units. I don't want this work to be blocked on UCUM adoption since that is months away. Is your proposal that the export code handle converting to UCUM or that we submit units as they are currently in the schema or something else?
Correct, I propose that we do not do any kind of awkward expansion, and submit as-is. This means that strictly speaking we are going against MIxS guidelines but hopefully this is temporary.
Of the classes mentioned in the issue description (Biosample, Extraction, LibraryPreparation, OmicsProcessing, DataObject), we see attributes from Biosample being mapped to XML attributes in <BioSample>, attributes from DataObject being mapped to XML attributes in <AddFiles>, and in addition attributes from Study being mapped to XML attributes in <BioProject> (in submission.xml).
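To make the Biosample-to-`<BioSample>` mapping concrete, here is a rough sketch using Python's standard `xml.etree.ElementTree`. The nested `Attributes`/`Attribute` layout and the `attribute_name` XML attribute are simplifying assumptions for illustration and have not been checked against the NCBI submission schema; `build_biosample_element` is a hypothetical helper, not code from the linked PR.

```python
import xml.etree.ElementTree as ET


def build_biosample_element(biosample: dict) -> ET.Element:
    """Build a simplified <BioSample> element from already-flattened slots.

    Assumption: each slot becomes one <Attribute attribute_name="..."> child
    under <Attributes>. The real submission.xml layout may differ.
    """
    root = ET.Element("BioSample")
    attributes = ET.SubElement(root, "Attributes")
    for slot_name, slot_value in biosample.items():
        attr = ET.SubElement(attributes, "Attribute", {"attribute_name": slot_name})
        # Values are written as-is, e.g. "5 m", with no unit expansion.
        attr.text = str(slot_value)
    return root


element = build_biosample_element({"depth": "5 m"})
print(ET.tostring(element, encoding="unicode"))
```

Since the slot names are used directly as attribute names, no per-field mapping table is needed in this sketch, which matches the "no mapping involved" goal stated earlier in the thread.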
What would attributes from the lab processing classes/slots map to?
In progress, moving to the next sprint.
Checkpoint for squad meeting on 5/7: NMDC object/NCBI submission.xml mappings identified, and start of dagster harness set up. We have mappings ready to produce
Actively in progress, moving to the next sprint.
@chienchi @sujaypatil96 @aclum is this still actively being worked on? Any updates?
Removing from sprint, no updates in 2 weeks, no response
@ssarrafan I am actively working on developing the code for this issue, could we add this to the next sprint board please? There were a couple of blockers which needed some conversations, but work is being pushed up very actively on the linked PR.
Active, moving to the next sprint.
The goal is to develop an ETL script to convert NMDC submissions to NCBI submissions using version 6.0 MIxS packages. We will start by developing support for the following packages:
Tasks the code should accomplish:
cc @sujaypatil96 @chienchi