microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Re-ID, Ingest to Napa DB, and Verify Napa compliance for "CrestedButte" study `nmdc:sty-11-dcqce727` #1794

Closed: mbthornton-lbl closed this issue 3 months ago

mbthornton-lbl commented 5 months ago

Note: The scope of this work is the Napa database instance. The same steps will need to be repeated in a production-ready environment.

For the "CrestedButte" study:
- id: nmdc:sty-11-dcqce727
- legacy id: gold:Gs0135149
- JGI proposal id: 503568

mbthornton-lbl commented 5 months ago

linkml-validate: No issues found

linkml-convert: successful

Imported to GraphDB

SPARQL query:

```sparql
PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Orphan DataObjects - not object of has_input or has_output
SELECT * WHERE {
    ?do a nmdc:DataObject .
    MINUS {
        ?o nmdc:has_input ?do .
    }
    MINUS {
        ?o nmdc:has_output ?do .
    }
} LIMIT 100
```

Returns no results (no orphan DataObjects).
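The SPARQL check above keeps every `nmdc:DataObject` and subtracts (via the two `MINUS` clauses) any that appear as the object of `has_input` or `has_output`. As a minimal sketch of the same set logic, assuming records are plain Python dicts with `has_input`/`has_output` lists (the function name and sample IDs are hypothetical, not part of nmdc-schema or nmdc_automation):

```python
def find_orphan_data_objects(data_object_ids, activities):
    """Return DataObject IDs that no activity references via has_input or
    has_output, mirroring the two MINUS clauses of the SPARQL query."""
    referenced = set()
    for activity in activities:
        # Collect every ID used as an input or output by any activity
        referenced.update(activity.get("has_input", []))
        referenced.update(activity.get("has_output", []))
    # Whatever is left after subtracting referenced IDs is an orphan
    return sorted(set(data_object_ids) - referenced)


# Hypothetical records for illustration only
data_objects = ["nmdc:dobj-1", "nmdc:dobj-2", "nmdc:dobj-3"]
activities = [
    {"id": "nmdc:omprc-1", "has_output": ["nmdc:dobj-1"]},
    {"id": "nmdc:wfmgan-1", "has_input": ["nmdc:dobj-1"], "has_output": ["nmdc:dobj-2"]},
]
print(find_orphan_data_objects(data_objects, activities))  # ['nmdc:dobj-3']
```

An empty list here corresponds to the "no results" outcome reported for the SPARQL query.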

mbthornton-lbl commented 4 months ago

Schema v10 compatibility issues will be addressed by: https://github.com/microbiomedata/nmdc_automation/issues/66

ssarrafan commented 4 months ago

@mbthornton-lbl, should this reopened issue be moved to the next sprint?

aclum commented 4 months ago

Re-ID logic is not deleting multiple versions; please rerun. For example, there are still legacy data objects for gold:Gp0321263 even though there is a properly re-IDed record. The filter `{'description': {$regex: /Gp0321263/}}` on data_object_set returns 14 records.
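One way to spot leftovers like this across a whole study is to group data objects by the GOLD project ID in their description and flag projects where legacy and re-IDed records coexist. This is a minimal sketch under a loudly stated assumption: that re-IDed data objects carry IDs starting with `nmdc:dobj-` and anything else is legacy. That prefix rule, the function name, and the sample records are hypothetical, not taken from the re-ID code.

```python
import re
from collections import defaultdict

# Assumption (hypothetical): re-IDed data objects use IDs starting with
# "nmdc:dobj-"; any other ID for the same GOLD project is a legacy leftover.
REIDED_PREFIX = re.compile(r"^nmdc:dobj-")
# Unanchored search, like the $regex /Gp0321263/ filter on description
GOLD_PROJECT = re.compile(r"Gp\d+")


def find_legacy_leftovers(data_objects):
    """Map each GOLD project ID mentioned in a description to the legacy data
    object IDs that still coexist with at least one re-IDed record."""
    ids_by_project = defaultdict(list)
    for record in data_objects:
        for project in set(GOLD_PROJECT.findall(record.get("description", ""))):
            ids_by_project[project].append(record["id"])

    leftovers = {}
    for project, ids in ids_by_project.items():
        reided = [i for i in ids if REIDED_PREFIX.match(i)]
        legacy = [i for i in ids if not REIDED_PREFIX.match(i)]
        if reided and legacy:  # only flag projects that were actually re-IDed
            leftovers[project] = sorted(legacy)
    return leftovers


# Hypothetical records for illustration only
records = [
    {"id": "nmdc:dobj-11-abc123", "description": "Reads for gold:Gp0321263"},
    {"id": "nmdc:0a1b2c3d", "description": "Reads for gold:Gp0321263"},
    {"id": "nmdc:dobj-11-def456", "description": "Assembly for gold:Gp0000001"},
]
print(find_legacy_leftovers(records))  # {'Gp0321263': ['nmdc:0a1b2c3d']}
```

Projects with only legacy records are deliberately not flagged, since they may simply not have been re-IDed yet.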

aclum commented 4 months ago

This is the study where the NOM omics records still need to be deleted by Yuri.

aclum commented 4 months ago

```javascript
// NOM omics_processing records whose has_output matches no data_object_set id, counted per study (part_of)
db.getCollection('omics_processing_set').aggregate([
  { $match: { 'omics_type.has_raw_value': 'Organic Matter Characterization' } },
  { $lookup: { from: 'data_object_set', localField: 'has_output', foreignField: 'id', as: 'output_do' } },
  { $match: { output_do: { $exists: true, $size: 0 } } },
  { $group: { _id: '$part_of', count: { $sum: 1 } } }
], { maxTimeMS: 60000, allowDiskUse: true });
```
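To make the pipeline's logic explicit, here is a minimal in-memory sketch of the same aggregation over plain dicts: filter on `omics_type.has_raw_value`, keep records where none of the `has_output` IDs exist in `data_object_set` (the empty `$lookup` result), then count per `part_of`. The function name and sample records are hypothetical; it only illustrates what the MongoDB query computes and does not connect to the Napa database.

```python
from collections import Counter

NOM_TYPE = "Organic Matter Characterization"


def count_nom_without_outputs(omics_processing, data_object_ids):
    """In-memory analogue of the aggregation: among NOM omics_processing
    records, count those whose has_output matches no existing data object
    ID, grouped by part_of."""
    existing = set(data_object_ids)
    counts = Counter()
    for record in omics_processing:
        if record.get("omics_type", {}).get("has_raw_value") != NOM_TYPE:
            continue  # first $match stage
        if any(out in existing for out in record.get("has_output", [])):
            continue  # $lookup found at least one output data object
        # part_of is a list, so use a tuple as the group key ($group on '$part_of')
        counts[tuple(record.get("part_of", []))] += 1
    return dict(counts)


# Hypothetical records for illustration only
records = [
    {"id": "nmdc:omprc-1", "omics_type": {"has_raw_value": NOM_TYPE},
     "has_output": ["nmdc:dobj-missing"], "part_of": ["nmdc:sty-11-dcqce727"]},
    {"id": "nmdc:omprc-2", "omics_type": {"has_raw_value": NOM_TYPE},
     "has_output": ["nmdc:dobj-1"], "part_of": ["nmdc:sty-11-dcqce727"]},
    {"id": "nmdc:omprc-3", "omics_type": {"has_raw_value": "Metagenome"},
     "has_output": [], "part_of": ["nmdc:sty-11-dcqce727"]},
]
print(count_nom_without_outputs(records, ["nmdc:dobj-1"]))
# {('nmdc:sty-11-dcqce727',): 1}
```

An empty result is the target state once the NOM omics records without data objects have been deleted.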

mbthornton-lbl commented 3 months ago

Re-ran omics_processing_has_output_data_objects: 0 results

mbthornton-lbl commented 3 months ago

Resolved by #1894