microbiomedata / issues

public repo for issues related to NMDC work

re-id sequencing omics records with missing data objects #596

Closed aclum closed 3 months ago

aclum commented 6 months ago

Deliverable this task is associated with

See Deliverables tab here:

RACI

Tag people in their roles

Describe the task

~- [ ] We'll need to make new data objects for these three missing data objects. For nmdc:8a9d164e1310e5b838d6ceb492f64a61, the other two data objects exist (for omics processing ID nmdc:omprc-11-tdt0js09, formerly gold:Gp0452741).~

Criteria for completion

Estimate people time

Completion Date (Goal)

Target Sprint Start & End Dates

Tag Blocker/Contingent upon issues

aclum commented 6 months ago

20240129.missing_has_output.nmdc.omics_processing_set.json
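A minimal sketch of how a report like the attached file might be produced, assuming the omics processing records and data objects have already been loaded into Python lists of dicts. The function name `find_missing_outputs` is hypothetical, and the example record is assembled from IDs mentioned in this thread rather than copied from the database.

```python
def find_missing_outputs(omics_records, data_objects):
    """Map each omics processing id to the has_output ids that have no data object."""
    known_ids = {obj["id"] for obj in data_objects}
    missing = {}
    for record in omics_records:
        # Keep only the referenced ids that do not resolve to a known data object.
        absent = [oid for oid in record.get("has_output", []) if oid not in known_ids]
        if absent:
            missing[record["id"]] = absent
    return missing


# Illustrative input only: built from IDs quoted in this issue,
# not a copy of the actual database documents.
omics_records = [
    {
        "id": "nmdc:omprc-11-tdt0js09",
        "has_output": [
            "nmdc:8a9d164e1310e5b838d6ceb492f64a61",
            "nmdc:9bd3cf378610c02776b54cc797d8c07a",
            "nmdc:9d5f99fba241d6bdd933ccbf405bf872",
        ],
    }
]
data_objects = [
    {"id": "nmdc:9bd3cf378610c02776b54cc797d8c07a"},
    {"id": "nmdc:9d5f99fba241d6bdd933ccbf405bf872"},
]

missing = find_missing_outputs(omics_records, data_objects)
print(missing)
```

Records with a fully resolvable `has_output` list are left out of the result, so an empty dict would mean nothing needs a new ID.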

aclum commented 6 months ago

Two data objects for gold:Gp0452741 that do exist:

{ "_id": { "$oid": "649b00471ae706d7b5b1c2e2" }, "id": "nmdc:9bd3cf378610c02776b54cc797d8c07a", "name": "SAMEA7723902_ERR5003681_interleaved.fq.gz", "description": "Raw interleaved fastq for SAMEA7723902_ERR5003681 (gold:Gp0452741)", "md5_checksum": "9bd3cf378610c02776b54cc797d8c07a", "url": "https://data.microbiomedata.org/data/raw/SAMEA7723902_ERR5003681_interleaved.fq.gz", "file_size_bytes": 114912166 }

{ "_id": { "$oid": "649b00471ae706d7b5b1c2ed" }, "id": "nmdc:9d5f99fba241d6bdd933ccbf405bf872", "name": "SAMEA7723902_ERR5004468_interleaved.fq.gz", "description": "Raw interleaved fastq for SAMEA7723902_ERR5004468 (gold:Gp0452741)", "md5_checksum": "9d5f99fba241d6bdd933ccbf405bf872", "url": "https://data.microbiomedata.org/data/raw/SAMEA7723902_ERR5004468_interleaved.fq.gz", "file_size_bytes": 125218381 }

There are two missing SRA run IDs from this grouping: ERR5003109 and ERR5001830.
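The comparison above can be sketched as follows, assuming the run accession is always embedded in the data object `name` as it is in the two records shown. The helper `runs_without_data_object` and the `ERR\d+` pattern are illustrative assumptions, not part of any NMDC tooling.

```python
import re

# Assumption: run accessions in this grouping all look like "ERR" followed by digits.
RUN_ID_PATTERN = re.compile(r"ERR\d+")


def runs_without_data_object(expected_runs, data_object_names):
    """Return the expected run ids that appear in none of the data object names."""
    found = set()
    for name in data_object_names:
        found.update(RUN_ID_PATTERN.findall(name))
    return sorted(set(expected_runs) - found)


# The four run ids mentioned in this comment for gold:Gp0452741.
expected_runs = ["ERR5003681", "ERR5004468", "ERR5003109", "ERR5001830"]
names = [
    "SAMEA7723902_ERR5003681_interleaved.fq.gz",
    "SAMEA7723902_ERR5004468_interleaved.fq.gz",
]

missing_runs = runs_without_data_object(expected_runs, names)
print(missing_runs)  # ['ERR5001830', 'ERR5003109']
```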

It is unclear what actually happened during analysis. Workflow activities only show one data object as has_input (nmdc:9bd3cf378610c02776b54cc797d8c07a). cc @Michal-Babins @scanon @hubin-keio Could the workflow code only handle one fastq when these ran (EMP500 sample)?
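One way to check the has_input observation is sketched below, assuming the workflow activity records are available as dicts with a `has_input` list. The function name and the activity id `example:activity-1` are hypothetical placeholders; only the two data object IDs come from this thread.

```python
def raw_objects_never_used_as_input(raw_object_ids, workflow_activities):
    """Return the raw data object ids that no workflow activity lists in has_input."""
    used = set()
    for activity in workflow_activities:
        used.update(activity.get("has_input", []))
    # Preserve the caller's ordering of raw ids in the result.
    return [oid for oid in raw_object_ids if oid not in used]


raw_object_ids = [
    "nmdc:9bd3cf378610c02776b54cc797d8c07a",
    "nmdc:9d5f99fba241d6bdd933ccbf405bf872",
]
# Hypothetical activity record: the id is a placeholder, not a real NMDC identifier.
workflow_activities = [
    {
        "id": "example:activity-1",
        "has_input": ["nmdc:9bd3cf378610c02776b54cc797d8c07a"],
    }
]

unused = raw_objects_never_used_as_input(raw_object_ids, workflow_activities)
print(unused)
```

Any ID returned here is a raw file that exists as a data object but was never consumed by a workflow, which would be consistent with the workflow handling only one fastq per sample.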

mbthornton-lbl commented 3 months ago

Resolved by https://github.com/microbiomedata/nmdc-schema/issues/1894