Open mslarae13 opened 9 months ago
Appears to be in progress and active. I'll move this to the next sprint @bmeluch @mslarae13
@JamesTessmer @sujaypatil96 here is the issue for tracking work on the 1000 soils metagenomics OmicsProcessing records and getting the data into workflows, etc. Thanks so much for working on this!
@bmeluch @lamccue Do we know anything about the instrument model for the data from Genewiz? It's not required by the schema, but it is a faceted search option in the data portal, so it would be nice to have this for the omics records.
I'm not entirely sure.
Illumina definitely.
More specifically: maybe Illumina NovaSeq
Discussed in a quick meeting
"omics_type": {
  "has_raw_value": "Metagenome"
},
"part_of": ["nmdc:sty-11-28tm5d36"],
"type": "nmdc:OmicsProcessing"
name == biosample name
"instrument_name": "Illumina NovaSeq S4"
"principal_investigator": {
  "email": "nancy.hess@pnnl.gov",
  "has_raw_value": "Nancy Hess",
  "name": "Nancy Hess"
}
name == biosample name
id: nmdc:omprc-11-123
has_input:
- nmdc:bsm-11-yj9yav68
has_output:
- nmdc:dobj-11-123
- nmdc:dobj-11-124
processing_institution
will be left empty until this issue is resolved: https://github.com/microbiomedata/issues/issues/634
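For reference, the fields discussed above can be pulled together into a single record. This is only a sketch assembled from the fragments in this thread, not the record that was actually submitted: the id and data object ids are the placeholder values quoted above, `build_omics_record` is a hypothetical helper name, and the biosample name is passed in because `name` is meant to match it. `processing_institution` is left out on purpose, per the note above.

```python
import json


def build_omics_record(biosample_name):
    """Assemble an OmicsProcessing-style record from the fields in this thread.

    Hypothetical helper: ids below are the placeholder values quoted in the
    discussion, not real minted ids.
    """
    return {
        "id": "nmdc:omprc-11-123",  # placeholder id from the thread
        "name": biosample_name,  # name == biosample name
        "type": "nmdc:OmicsProcessing",
        "omics_type": {"has_raw_value": "Metagenome"},
        "part_of": ["nmdc:sty-11-28tm5d36"],
        "has_input": ["nmdc:bsm-11-yj9yav68"],
        "has_output": ["nmdc:dobj-11-123", "nmdc:dobj-11-124"],
        "instrument_name": "Illumina NovaSeq S4",
        "principal_investigator": {
            "email": "nancy.hess@pnnl.gov",
            "has_raw_value": "Nancy Hess",
            "name": "Nancy Hess",
        },
        # processing_institution intentionally omitted until issue 634 is resolved
    }


if __name__ == "__main__":
    # "example biosample" is an illustrative value, not a real biosample name.
    print(json.dumps(build_omics_record("example biosample"), indent=2))
```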
@sujaypatil96 & @JamesTessmer have created the -omics processing records. Workflows should be ready to be run for both the JGI and Genewiz (Azenta) data.
@aclum , @mbthornton-lbl do you need anything else?
No we should be good to go. Michael has already started on getting the JGI data staged.
@mslarae13 @bmeluch can this issue be closed based on Alicia's last comment?
I'm not sure, would have to ask @aclum or @mbthornton-lbl how the workflows stuff is progressing
This needs to go in the next sprint for the data processing.
@JamesTessmer Would you please issue a changesheet for the omics records you made to change 'omics_type.has_raw_value':'metagenome' to 'omics_type.has_raw_value':'Metagenome'?
@aclum I'm working on the change sheet, but I've run into some formatting issues I think. When I try to validate the change sheet I get an error saying "'Action' column is missing" but it's there in the file. It won't let me attach a TSV file here so I pasted the first 3 lines, and it seems like it might just be a formatting issue I can't figure out. All the content in the file should be correct though. Sorry if this is some easy error to fix, I haven't used the change sheets before.
id action attribute value
nmdc:omprc-11-5r7kbf66 update omics_type.has_raw_value Metagenome
nmdc:omprc-11-46tjy763 update omics_type.has_raw_value Metagenome
Ah, of course shortly after posting this I got it figured out. There was a space pretending to be a tab. The change sheet passes validation now.
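In case anyone else hits the same "column is missing" error, a quick local check along these lines can catch a space standing in for a tab before running the real validator. This is a rough sketch and not the NMDC change sheet validator: `check_change_sheet` is a hypothetical helper, and the four column names are simply assumed from the header pasted above.

```python
# Column names assumed from the header pasted in this thread.
REQUIRED_COLUMNS = ["id", "action", "attribute", "value"]


def check_change_sheet(text):
    """Return a list of problems found in tab-separated change sheet text."""
    problems = []
    lines = text.splitlines()
    if not lines:
        return ["file is empty"]

    # A space used instead of a tab fuses two header names into one field,
    # so the expected column names no longer appear in the header.
    header = lines[0].split("\t")
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        problems.append(f"header is missing columns: {missing}")

    expected = len(REQUIRED_COLUMNS)
    for number, line in enumerate(lines, start=1):
        fields = line.split("\t")
        if len(fields) != expected:
            hint = " (possible space used instead of a tab)" if " " in line else ""
            problems.append(
                f"line {number}: expected {expected} tab-separated fields, "
                f"found {len(fields)}{hint}"
            )
    return problems


if __name__ == "__main__":
    # Header with a space between "id" and "action" instead of a tab.
    sample = (
        "id action\tattribute\tvalue\n"
        "nmdc:omprc-11-5r7kbf66\tupdate\tomics_type.has_raw_value\tMetagenome\n"
    )
    for problem in check_change_sheet(sample):
        print(problem)
```

An empty list means every line has four tab-separated fields and the header carries the expected names; it says nothing about whether the ids or attribute paths are valid.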
Genewiz generated data is being processed now.
@scanon just wanted to make a quick note here that the JGI data objects do not have paths on the "url" slot asserted on them yet. Not sure if that information is required by the nmdc_automation code?
If it is, I can get the paths/URLs from @mflynn-lanl and add that to the data objects. If not, we should still add them, looking at it from a "metadata enrichment" lens.
@sujaypatil96 we should talk about this more. JGI requires login for download, so I'm not sure we want to use the same slot @turbomam
The JGI samples are also stored on tape and they need to be restored before they can be downloaded
Actively in progress. All Genewiz data has been processed through annotation. Binning still needs to be run. Ingest of JGI data is still outstanding. Moving to the next sprint.
Backlogging this until re-IDing is done.
1000 soils has metaG data generated from JGI and from Genewiz
16 : # of samples run at JGI : See https://github.com/microbiomedata/issues/issues/85
?? : # of samples run at Genewiz
Running binning and MAGs on the JGI assemblies is blocked on https://github.com/microbiomedata/nmdc_automation/issues/35