1000 soil - process metaG data: JGI & Genewiz

mslarae13 commented 9 months ago

1000 soils has metaG data generated from JGI and from Genewiz

16 : # of samples run at JGI : See https://github.com/microbiomedata/issues/issues/85

[x] Map JGI samples to NMDC biosamples
[x] Update NMDC metadata or JGI depending on which is right

?? : # of samples run at Genewiz

[x] Get data on NERSC to process (in progress)
[x] Map data files to NMDC biosamples (Bea, in progress)
[ ] https://github.com/microbiomedata/issues/issues/632
[ ] Process All data

Running binning and MAG generation on the JGI assemblies is blocked on https://github.com/microbiomedata/nmdc_automation/issues/35

ssarrafan commented 8 months ago

Appears to be in progress and active. I'll move this to the next sprint @bmeluch @mslarae13

bmeluch commented 8 months ago

@JamesTessmer @sujaypatil96 here is the issue for tracking work on the 1000 soils metagenomics OmicsProcessing records and getting the data into workflows, etc. Thanks so much for working on this!

aclum commented 8 months ago

@bmeluch @lamccue Do we know anything about the instrument model for the data from Genewiz? It's not required by the schema, but it is a faceted search option in the data portal, so it would be nice to have this for the omics records.

lamccue commented 8 months ago

I'm not entirely sure.
Illumina definitely. More specifically: maybe Illumina NovaSeq

mslarae13 commented 8 months ago

Discussed in a quick meeting

"omics_type": {
        "has_raw_value": "Metagenome"
}

"part_of": ["nmdc:sty-11-28tm5d36"]

"type": "nmdc:OmicsProcessing"

"instrument_name": "Illumina NovaSeq S4"

"principal_investigator": {
        "email": "nancy.hess@pnnl.gov",
        "has_raw_value": "Nancy Hess",
        "name": "Nancy Hess"
}

name == biosample name

id: nmdc:omprc-11-123
has_input:
 - nmdc:bsm-11-yj9yav68
has_output:
 - nmdc:dobj-11-123
 - nmdc:dobj-11-124
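
The fields agreed on in the meeting could be assembled into a single record roughly as follows. This is a minimal sketch, not the code that was actually used to create the records: the function name `build_omics_processing_record` and the biosample name passed to it are hypothetical, and the IDs are the placeholder values from the notes above.

```python
import json


def build_omics_processing_record(record_id, biosample_id, biosample_name, data_object_ids):
    """Assemble an OmicsProcessing record from the values listed in the meeting notes.

    Hypothetical helper for illustration only; not part of any NMDC package.
    """
    return {
        "id": record_id,
        "type": "nmdc:OmicsProcessing",
        # The notes say the record name should equal the biosample name.
        "name": biosample_name,
        "part_of": ["nmdc:sty-11-28tm5d36"],
        "omics_type": {"has_raw_value": "Metagenome"},
        "instrument_name": "Illumina NovaSeq S4",
        "principal_investigator": {
            "email": "nancy.hess@pnnl.gov",
            "has_raw_value": "Nancy Hess",
            "name": "Nancy Hess",
        },
        "has_input": [biosample_id],
        "has_output": list(data_object_ids),
    }


record = build_omics_processing_record(
    record_id="nmdc:omprc-11-123",            # placeholder ID from the notes
    biosample_id="nmdc:bsm-11-yj9yav68",
    biosample_name="example biosample name",  # hypothetical value
    data_object_ids=["nmdc:dobj-11-123", "nmdc:dobj-11-124"],
)
print(json.dumps(record, indent=2))
```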

mslarae13 commented 8 months ago

processing_institution will be left empty until this issue is resolved: https://github.com/microbiomedata/issues/issues/634

mslarae13 commented 8 months ago

@sujaypatil96 & @JamesTessmer have created the -omics processing records. Workflows should be ready to be run for both the JGI and Genewiz (Azenta) data.

@aclum , @mbthornton-lbl do you need anything else?

aclum commented 8 months ago

No we should be good to go. Michael has already started on getting the JGI data staged.

ssarrafan commented 8 months ago

@mslarae13 @bmeluch can this issue be closed based on Alicia's last comment?

bmeluch commented 8 months ago

I'm not sure, would have to ask @aclum or @mbthornton-lbl how the workflows stuff is progressing

aclum commented 8 months ago

This needs to go in the next sprint for the data processing.

aclum commented 7 months ago

@JamesTessmer Would you please issue a changesheet for the omics records you made to change 'omics_type.has_raw_value':'metagenome' to 'omics_type.has_raw_value':'Metagenome'?
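
One way to produce such a changesheet without hand-editing is to write it with Python's `csv` module, which guarantees real tab separators. This is a sketch under assumptions: it uses the four-column layout (`id`, `action`, `attribute`, `value`) and the two record IDs that appear in the sheet posted in this thread, and the helper name `make_changesheet` is hypothetical.

```python
import csv
import io

COLUMNS = ["id", "action", "attribute", "value"]


def make_changesheet(record_ids, attribute, value):
    """Return changesheet text with one 'update' row per record ID."""
    buffer = io.StringIO()
    # An explicit tab delimiter means stray spaces can never stand in for tabs.
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for record_id in record_ids:
        writer.writerow([record_id, "update", attribute, value])
    return buffer.getvalue()


sheet = make_changesheet(
    ["nmdc:omprc-11-5r7kbf66", "nmdc:omprc-11-46tjy763"],
    attribute="omics_type.has_raw_value",
    value="Metagenome",
)
print(sheet)
```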

JamesTessmer commented 7 months ago

@aclum I'm working on the change sheet, but I've run into some formatting issues I think. When I try to validate the change sheet I get an error saying "'Action' column is missing" but it's there in the file. It won't let me attach a TSV file here so I pasted the first 3 lines, and it seems like it might just be a formatting issue I can't figure out. All the content in the file should be correct though. Sorry if this is some easy error to fix, I haven't used the change sheets before.

id  action  attribute   value
nmdc:omprc-11-5r7kbf66    update    omics_type.has_raw_value    Metagenome
nmdc:omprc-11-46tjy763    update    omics_type.has_raw_value    Metagenome

Ah, of course shortly after posting this I got it figured out. There was a space pretending to be a tab. The change sheet passes validation now.
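
A quick local pre-check can catch this kind of problem before validation. The sketch below is not the NMDC validator; it only flags lines that do not split into four tab-separated fields, which is what happens when spaces are used where a tab belongs. The function name `find_malformed_lines` is hypothetical.

```python
EXPECTED_COLUMNS = ["id", "action", "attribute", "value"]


def find_malformed_lines(text, expected_fields=len(EXPECTED_COLUMNS)):
    """Return (line_number, line) pairs whose tab-separated field count is wrong."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Skip blank lines; flag any line that does not split into the expected fields.
        if line.strip() and len(line.split("\t")) != expected_fields:
            problems.append((number, line))
    return problems


good = (
    "id\taction\tattribute\tvalue\n"
    "nmdc:omprc-11-5r7kbf66\tupdate\tomics_type.has_raw_value\tMetagenome\n"
)
# Simulate the bug: spaces where the first tab of the header should be.
bad = good.replace("id\taction", "id  action")

print(find_malformed_lines(good))
print(find_malformed_lines(bad))
```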

scanon commented 7 months ago

Genewiz generated data is being processed now.

sujaypatil96 commented 7 months ago

@scanon just wanted to make a quick note here that the JGI data objects do not have paths on the "url" slot asserted on them yet. Not sure if that information is required by the nmdc_automation code?

sujaypatil96 commented 7 months ago

If it is, I can get the paths/URLs from @mflynn-lanl and add that to the data objects. If not, we should still add them, looking at it from a "metadata enrichment" lens.

aclum commented 7 months ago

@sujaypatil96 We should talk about this more. JGI requires a login for download, so I'm not sure we want to use the same slot. @turbomam

mflynn-lanl commented 7 months ago

The JGI samples are also stored on tape and they need to be restored before they can be downloaded

aclum commented 7 months ago

Actively in progress. All Genewiz data has been processed through annotation. Binning still needs to be run. Ingest of JGI data is still outstanding. Moving to the next sprint.

aclum commented 6 months ago

Backlogging this until re-IDing is done.

microbiomedata / issues

1000 soil - process metaG data: JGI & Genewiz #613