mbthornton-lbl closed 2 months ago
SPARQL query for Orphaned DataObjects:
PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
# Orphan DataObjects - not object of has_input or has_output
select * where {
  ?do a nmdc:DataObject .
  minus {
    ?o nmdc:has_input ?do .
  }
  minus {
    ?o nmdc:has_output ?do .
  }
} limit 100
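For anyone checking this against a database export rather than the triple store, the same orphan logic can be sketched in plain Python. This is only an illustration: the function name, the argument shapes, and the idea that each activity record is a dict carrying has_input / has_output lists are assumptions, not the actual cleanup code.

```python
from typing import Iterable, List, Mapping


def find_orphan_data_objects(
    data_object_ids: Iterable[str],
    activities: Iterable[Mapping[str, list]],
) -> List[str]:
    """Return IDs that no activity lists under has_input or has_output.

    Hypothetical helper: mirrors the two MINUS clauses of the SPARQL query.
    """
    referenced = set()
    for activity in activities:
        # Either slot may be absent on a given record, so default to empty.
        referenced.update(activity.get("has_input", []))
        referenced.update(activity.get("has_output", []))
    # Keep the input order so results are easy to compare between runs.
    return [do_id for do_id in data_object_ids if do_id not in referenced]
```

Unlike the query, this has no limit of 100; slice the result if only a sample is needed.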
I reopened SPRUCE, but this likely pertains to all the studies. We are failing to delete some of the binning data object records, e.g.
{'description':{$regex:/Gp0208377/}}
In the SPRUCE example, the data object type of these records is Metagenome Bins, CheckM Statistics, or null. The null ones, based on this example, could be captured by a case-insensitive search for metabat2 on the description slot.
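A rough sketch of how that search could be combined into one filter is below. The field name data_object_type and both helper names are assumptions for illustration; the filter dict follows standard MongoDB query syntax ($regex with $options "i" for case-insensitive matching, and a None value matching null or missing fields), and the second function is a local stand-in for dry-running the rule on exported records.

```python
import re

# The two data object types named in the SPRUCE example.
BINNING_TYPES = ("Metagenome Bins", "CheckM Statistics")


def binning_filter(omics_id: str) -> dict:
    """Build a MongoDB filter for binning data objects of one omics record (hypothetical helper)."""
    return {
        "description": {"$regex": re.escape(omics_id)},
        "$or": [
            {"data_object_type": {"$in": list(BINNING_TYPES)}},
            {
                # None matches documents where the field is null or missing.
                "data_object_type": None,
                "description": {"$regex": "metabat2", "$options": "i"},
            },
        ],
    }


def is_binning_record(doc: dict, omics_id: str) -> bool:
    """Apply the same rule locally, e.g. to records loaded from a JSON dump."""
    description = doc.get("description") or ""
    if omics_id not in description:
        return False
    data_object_type = doc.get("data_object_type")
    if data_object_type in BINNING_TYPES:
        return True
    untyped = data_object_type is None
    return untyped and re.search("metabat2", description, re.IGNORECASE) is not None
```

Calling binning_filter("Gp0208377") gives a dict that could be passed to a find on the data object collection; reviewing the matches before deleting anything seems prudent.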
@aclum Are we deleting all Binning data objects, or only those with a non-compliant ID? Should records like this one:
record: nmdc:dobj-11-qm3fbt63 CheckM Statistics CheckM for nmdc:wfmag-11-m0t5hc17.1
be deleted?
No, only those with non-compliant identifiers.
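To make that rule concrete, here is a minimal sketch of an ID check. The pattern is an assumption inferred only from the sample ID nmdc:dobj-11-qm3fbt63 quoted in the question; the authoritative pattern is whatever the NMDC schema defines, and should_delete is a hypothetical helper name.

```python
import re

# ASSUMPTION: compliant data object IDs look like the sample in this thread,
# i.e. "nmdc:dobj-<digits>-<lowercase alphanumerics>". Check the schema before relying on this.
COMPLIANT_DOBJ_ID = re.compile(r"nmdc:dobj-\d+-[a-z0-9]+")


def should_delete(doc: dict) -> bool:
    """Flag a binning data object for deletion only if its ID is non-compliant."""
    return COMPLIANT_DOBJ_ID.fullmatch(doc.get("id", "")) is None
```

Under this assumption, the CheckM Statistics record quoted in the question would be kept.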
@mbthornton-lbl will be continuing to work on this in the next sprint per Slack message. Moving over.
Test runs on the Napa instance and pre-requisites for Metagenomic workflows. Note: Context for this test run assumes the following updates have been applied to the testDB instance:
https://github.com/microbiomedata/issues/issues/532
Re-ID Studies:
Pre-Requisites - complete ETL on Napa instance and verify:
Pre-requisites - All ETL recipes fully reproducible:
Study, BioSample and Omics:
Metagenomics: