microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

Re-ID, Ingest to Napa DB, and Verify Napa compliance for "Populus" study `nmdc:sty-11-1t150432` #1836

Closed mbthornton-lbl closed 3 months ago

mbthornton-lbl commented 4 months ago

Note: The scope of this work is the Napa database instance. The same steps will need to be repeated in a production-ready environment.

For the "Populus" study: id `nmdc:sty-11-1t150432`, legacy id `gold:Gs0103573`.

mbthornton-lbl commented 3 months ago

linkml-validate vs. 8.0 schema:

INFO:root:Using SchemaView with im=None
No issues found

mbthornton-lbl commented 3 months ago

linkml-validate vs. schema 10.0.1 finds:

INFO:root:Using SchemaView with im=None
[ERROR] [./local/nmdc:sty-11-1t150432.yaml/0] Additional properties are not allowed ('award_dois', 'publication_dois' were unexpected) in /study_set/0
[ERROR] [./local/nmdc:sty-11-1t150432.yaml/0] 'study_category' is a required property in /study_set/0

Both of these are handled by the version 10 data migrations, so we can consider this a passing result.
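
A minimal sketch of how this triage could be scripted, assuming the report is available as text lines in the `[ERROR] [path] message` shape shown above. The helper name `split_errors` and the `MIGRATION_HANDLED_SLOTS` set are hypothetical; the set only lists the three slots named in this comment and is not an official list of what the migrations cover.

```python
import re

# Assumption: slot names taken from the errors quoted in this thread,
# not from any official list of version 10 migration targets.
MIGRATION_HANDLED_SLOTS = {"award_dois", "publication_dois", "study_category"}


def split_errors(report_lines):
    """Split [ERROR] lines into (handled, unhandled) lists.

    A line counts as handled only if every single-quoted slot name it
    mentions is in MIGRATION_HANDLED_SLOTS.
    """
    handled, unhandled = [], []
    for line in report_lines:
        if "[ERROR]" not in line:
            continue  # skip INFO lines and blank lines
        slots = set(re.findall(r"'([^']+)'", line))
        if slots and slots <= MIGRATION_HANDLED_SLOTS:
            handled.append(line)
        else:
            unhandled.append(line)
    return handled, unhandled
```

An empty `unhandled` list would correspond to the "passing result" reading above. Any line that mentions a slot outside the set stays in `unhandled` for manual review.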

mbthornton-lbl commented 3 months ago

Created a .ttl file, loaded it into GraphDB, and ran the following query:

PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Orphan DataObjects - not object of has_input or has_output
select * where { 
    ?do a nmdc:DataObject .
    minus {
        ?o nmdc:has_input ?do .
    }
    minus {
        ?o nmdc:has_output ?do .
    }
} limit 100 

The query returns no results.
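
For cases where GraphDB is not at hand, the same orphan check can be sketched in plain Python over a JSON-style dump. This is only an illustration under assumptions: the dump is taken to be a dict of collection name to list of records, `has_input` and `has_output` are taken to be lists of ids, and the function name and the ids in the usage note are hypothetical. Unlike the query above, it applies no `limit`.

```python
def find_orphan_data_objects(database):
    """Return sorted ids of DataObjects that no record references.

    Assumes `database` maps collection names to lists of dict records,
    with DataObjects stored under "data_object_set".
    """
    referenced = set()
    # Collect every id used as has_input or has_output in any collection.
    for records in database.values():
        for record in records:
            for slot in ("has_input", "has_output"):
                referenced.update(record.get(slot, []))
    # A DataObject is an orphan if its id was never collected above.
    return sorted(
        rec["id"]
        for rec in database.get("data_object_set", [])
        if rec["id"] not in referenced
    )
```

For example, with three data objects `nmdc:dobj-1`, `nmdc:dobj-2`, `nmdc:dobj-3` and a single workflow record that takes the first as input and produces the second, the function returns only `nmdc:dobj-3`.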

aclum commented 3 months ago

There are still annotation files based on the GOLD names. In prod these are not referenced by has_output, so they need to be deleted, e.g. nmdc:3d15cdae77378493e5deb480c46bea38

aclum commented 3 months ago

The same is true for reads-based analysis.

aclum commented 3 months ago

https://github.com/microbiomedata/nmdc_automation/issues/79 should address the issues.