monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

mouse strains incorrectly assigned to taxon 'sequence alteration' (SO:0001059) #906

Closed. kshefchek closed this issue 4 years ago.

kshefchek commented 4 years ago

In neo4j:

match path=(subject)-[r:RO:0002162]->(object:Node{iri:'SO:0001059'}) return r.isDefinedBy, count(path)

+------------------------------------------------------------+
| r.isDefinedBy                                | count(path) |
+------------------------------------------------------------+
| "https://archive.monarchinitiative.org/#mgi" | 390         |
+------------------------------------------------------------+
1 row

IRIs and labels: https://scigraph-data-dev.monarchinitiative.org/scigraph/cypher/execute?cypherQuery=match%20path%3D(subject)-%5Br%3ARO%3A0002162%5D-%3E(object%3ANode%7Biri%3A'SO%3A0001059'%7D)%20return%20subject.iri%2Csubject.label&limit=400
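
For anyone who wants to rerun this with a different taxon or limit, here is a minimal sketch of how the link above can be rebuilt from the plain Cypher text. It only constructs the URL and sends nothing. The function name `scigraph_cypher_url` is made up for illustration, and the `safe` characters were chosen just so the output matches the link as posted.

```python
from urllib.parse import quote

# Endpoint taken from the link above.
BASE = "https://scigraph-data-dev.monarchinitiative.org/scigraph/cypher/execute"

def scigraph_cypher_url(cypher, limit):
    """Build a SciGraph cypher/execute URL (hypothetical helper, builds the string only)."""
    # Keep parentheses and single quotes literal, as in the posted link;
    # everything else outside letters, digits and "_.-~" is percent-encoded.
    encoded = quote(cypher, safe="()'")
    return f"{BASE}?cypherQuery={encoded}&limit={limit}"

query = (
    "match path=(subject)-[r:RO:0002162]->"
    "(object:Node{iri:'SO:0001059'}) "
    "return subject.iri,subject.label"
)
url = scigraph_cypher_url(query, 400)
print(url)
```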

kshefchek commented 4 years ago

A weird one:

| "http://www.informatics.jax.org/accession/MGI:4867032" | "Not Specified" |

The MGI page for that ID returns: "No accession id found. Please verify that your request contains an id parameter."

kshefchek commented 4 years ago

Thinking this is the issue:

https://github.com/monarch-initiative/dipper/blob/master/translationtable/mgi.yaml#L115
https://github.com/monarch-initiative/dipper/blob/master/translationtable/mgi.yaml#L119
https://github.com/monarch-initiative/dipper/blob/master/translationtable/mgi.yaml#L123

I'm not sure we can globally assign the same term to "Other", "Unspecified", and "Not Applicable" for every table in MGI. I suspect the defaults will be table-dependent (in these cases I'd propose 'Mus' for the taxon).
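
To make the table-dependent idea concrete, here is a rough sketch, not dipper's actual code. The dictionaries, the table name `strain_taxon`, and the `resolve` helper are all hypothetical, and the global entries are an assumption about what a file-wide mapping could look like rather than a copy of mgi.yaml. NCBITaxon:10088 is used here for the genus Mus.

```python
# Assumed file-wide mapping: each placeholder label resolves to one term
# everywhere (illustrative only, not copied from mgi.yaml).
GLOBAL_TT = {
    "Other": "SO:0001059",           # 'sequence alteration'
    "Unspecified": "SO:0001059",
    "Not Applicable": "SO:0001059",
}

# Hypothetical per-table overrides: when the column holds a taxon,
# fall back to the genus Mus instead of a sequence feature type.
TABLE_DEFAULTS = {
    "strain_taxon": {
        "Other": "NCBITaxon:10088",
        "Unspecified": "NCBITaxon:10088",
        "Not Applicable": "NCBITaxon:10088",
    },
}

def resolve(label, table):
    """Return the term for a label, preferring the table's own default."""
    overrides = TABLE_DEFAULTS.get(table, {})
    if label in overrides:
        return overrides[label]
    # No table-specific default: use the file-wide mapping (None if unknown).
    return GLOBAL_TT.get(label)

print(resolve("Unspecified", "strain_taxon"))
print(resolve("Unspecified", "allele_type"))
```

With a lookup like this, a placeholder in a taxon column would no longer pick up SO:0001059, while tables without an override keep the existing behaviour.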