microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Re-ID, Ingest to Napa DB, and Verify Napa compliance for "DeepShale" study: `nmdc:sty-11-8fb6t785` #1835

Closed by mbthornton-lbl 3 months ago

mbthornton-lbl commented 4 months ago

Note: The scope of this work is the Napa database instance. The same steps will need to be repeated in a prod-ready environment.

For the "DeepShale" Study - id: nmdc:sty-11-8fb6t785, legacy id: gold:Gs0114675

mbthornton-lbl commented 3 months ago

linkml-validate vs. schema 8.0.0 passed:

INFO:root:Using SchemaView with im=None
No issues found

mbthornton-lbl commented 3 months ago

linkml-validate vs. schema 10.0.1 finds:

INFO:root:Using SchemaView with im=None
[ERROR] [./local/nmdc:sty-11-8fb6t785.yaml/0] Additional properties are not allowed ('award_dois' was unexpected) in /study_set/0
[ERROR] [./local/nmdc:sty-11-8fb6t785.yaml/0] 'study_category' is a required property in /study_set/0

Both of these will already have been addressed in Prod by the v10 data migration process, so we can consider this a passing result.
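The triage described here (treat errors about fields already covered by the v10 migration as known, flag anything else) could be sketched roughly as follows. This is only an illustration: `triage_errors` and `KNOWN_V10_FIELDS` are hypothetical names, and it assumes the validator output is available as plain text with each problem introduced by `[ERROR]`, as in the log above.

```python
# Fields the comment above says the v10 migration already handles in Prod.
# (Assumption: matching on the quoted field name is enough for triage.)
KNOWN_V10_FIELDS = ("award_dois", "study_category")


def triage_errors(log_text: str) -> tuple[list[str], list[str]]:
    """Split [ERROR] messages into (known migration issues, unexpected issues)."""
    # Everything before the first "[ERROR]" is the INFO preamble, so skip it.
    chunks = [c.strip() for c in log_text.split("[ERROR]")[1:]]
    known = [c for c in chunks if any(f"'{f}'" in c for f in KNOWN_V10_FIELDS)]
    unexpected = [c for c in chunks if c not in known]
    return known, unexpected


log = (
    "INFO:root:Using SchemaView with im=None "
    "[ERROR] [./local/nmdc:sty-11-8fb6t785.yaml/0] Additional properties are not "
    "allowed ('award_dois' was unexpected) in /study_set/0 "
    "[ERROR] [./local/nmdc:sty-11-8fb6t785.yaml/0] 'study_category' is a required "
    "property in /study_set/0"
)
known, unexpected = triage_errors(log)
print(len(known), "known,", len(unexpected), "unexpected")
```

An empty `unexpected` list corresponds to the "passing" reading above; any entry there would need a closer look.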

mbthornton-lbl commented 3 months ago

Ran linkml-convert to produce a .ttl file and uploaded it to GraphDB.

PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Orphan DataObjects - not object of has_input or has_output
select * where { 
    ?do a nmdc:DataObject .
    minus {
        ?o nmdc:has_input ?do .
    }
    minus {
        ?o nmdc:has_output ?do .
    }
} limit 100 

Returns no results
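The same orphan check can be approximated on the YAML/JSON dump itself, before conversion to Turtle. A minimal sketch, assuming the dump is loaded as a dict of collections, that data objects live under `data_object_set`, and that records in any other collection may carry `has_input` / `has_output` lists. The function name and the sample IDs are hypothetical, not taken from the DeepShale data.

```python
from typing import Any


def find_orphan_data_objects(db: dict[str, list[dict[str, Any]]]) -> list[str]:
    """Return ids of data objects that no record lists as has_input or has_output."""
    referenced: set[str] = set()
    for name, records in db.items():
        if name == "data_object_set":
            continue
        for record in records:
            referenced.update(record.get("has_input", []))
            referenced.update(record.get("has_output", []))
    return [
        obj["id"]
        for obj in db.get("data_object_set", [])
        if obj["id"] not in referenced
    ]


# Hypothetical sample data for illustration only.
sample_db = {
    "data_object_set": [{"id": "nmdc:dobj-a"}, {"id": "nmdc:dobj-b"}],
    "workflow_set": [
        {"id": "nmdc:wf-1", "has_input": ["nmdc:dobj-a"], "has_output": []}
    ],
}
orphans = find_orphan_data_objects(sample_db)
print(orphans)
```

As with the SPARQL query, an empty result means every data object is the input or output of some record.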

aclum commented 3 months ago

@mbthornton-lbl can you rerun this re-IDing? The data objects were removed from prod in December 2023 (i.e., I can find {'doc.id':'nmdc:76a9fb6a1d29da495d246728ab7ace33'} in nmdc_deleted, but that ID is still in mongo napa).
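One way to spot such leftovers in bulk is to intersect the ids recorded as deleted with the ids still present in Napa. A minimal sketch, assuming both id lists have already been exported from the respective databases; the function name is hypothetical, and every sample id other than the one quoted above is made up.

```python
from typing import Iterable


def ids_still_present(deleted_ids: Iterable[str], napa_ids: Iterable[str]) -> list[str]:
    """Return ids recorded as deleted in prod that still exist in Napa, sorted."""
    return sorted(set(deleted_ids) & set(napa_ids))


# The first id is the one quoted in the comment above; the others are placeholders.
deleted = ["nmdc:76a9fb6a1d29da495d246728ab7ace33", "nmdc:placeholder-deleted"]
napa = ["nmdc:76a9fb6a1d29da495d246728ab7ace33", "nmdc:placeholder-kept"]
stale = ids_still_present(deleted, napa)
print(stale)
```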

aclum commented 3 months ago

Straggler data objects that need to be deleted: this one is from a viral draft. It likely got added by accident and then didn't get cleaned up properly. data_object_set id: jgi:581b0d737ded5e427b7b6a4f