monarch-initiative / dipper

Data Ingestion Pipeline for Monarch
https://dipper.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

BNODE explicit `anonymous=true` predicate removal #480

Open TomConlin opened 7 years ago

TomConlin commented 7 years ago

In a tiny minority of files we redundantly annotate blank nodes as anonymous using the only non-ontologically derived predicate in all of Monarch.

With the overwhelming majority of blank nodes unannotated and working, it is likely we realized we should derive the annotation from the structure of the bnode where necessary, so it should be safe to remove these vestigial annotations.

  grep anonymous dot/*.gv
    dot/ctd.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (25702)"];
    dot/kegg.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (9463)"];
    dot/kegg_test.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (382)"];
    dot/omia.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (716)"];
    dot/omim.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (5630)"];
    dot/omim_test.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (5)"];
    dot/orphanet.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (6546)"];
    dot/wormbase.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (1470)"];
    dot/wormbase_test.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (1)"];
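
To turn the grep output above into per-source counts, something like the following could be used. This is a minimal sketch, assuming the lines keep exactly the shape shown above; the function name `tally_anonymous` and the `skip_test` flag are hypothetical and not part of dipper.

```python
import re

# Matches one grep hit in the shape shown above, e.g.
#   dot/ctd.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (25702)"];
EDGE = re.compile(
    r'^\s*(?P<path>[^:\s]+):BNODE -> LITERAL '
    r'\[label="MONARCH:MONARCH_anonymous \((?P<count>\d+)\)"\];'
)


def tally_anonymous(lines, skip_test=True):
    """Map each .gv file to its MONARCH_anonymous edge count.

    Hypothetical helper: `skip_test` drops the *_test.gv files so that
    test subsets are not counted alongside the full ingests.
    """
    counts = {}
    for line in lines:
        match = EDGE.match(line)
        if match is None:
            continue  # not a MONARCH_anonymous edge line
        path = match.group("path")
        if skip_test and path.endswith("_test.gv"):
            continue
        counts[path] = int(match.group("count"))
    return counts


# Two lines copied from the grep output above.
sample = [
    '    dot/ctd.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (25702)"];',
    '    dot/kegg_test.gv:BNODE -> LITERAL [label="MONARCH:MONARCH_anonymous (382)"];',
]
result = tally_anonymous(sample)
```

Piping the real `grep anonymous dot/*.gv` output through this would give one count per source, which makes it easy to see which ingests would change once the predicate is removed.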

kshefchek commented 7 years ago

Can we determine where we're not applying this annotation? This indicates we're not consistently processing blank nodes - unless I'm misremembering how it works.

TomConlin commented 7 years ago

> Can we determine where we're not applying this annotation?

Every file not listed above that contains a blank node (i.e. everything else).
I do not want to continue using the MONARCH:MONARCH_anonymous predicate in any case. Monarch's predicates should ALL be resolvable ontological terms found in an OWL file.

I would hope that if neo4j required blank nodes to be annotated as anonymous, we would do that uniformly at SciGraph ingest by recognizing blank nodes from the CURIE prefix `_:` or a skolem IRI.
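
For illustration, recognizing a blank node from its identifier alone might look like the sketch below. It assumes skolemized blank nodes follow the RDF 1.1 convention of an IRI containing the `/.well-known/genid/` path. The function name `is_blank_node` and the example IRIs are hypothetical, and none of this is taken from the SciGraph or dipper code.

```python
# Assumption: skolem IRIs use the RDF 1.1 well-known path for generated ids.
SKOLEM_PATH = "/.well-known/genid/"


def is_blank_node(identifier: str) -> bool:
    """Return True if the identifier looks like a blank node.

    Two shapes are recognized:
      * a CURIE-style label with the `_:` prefix, e.g. `_:b0`
      * an http(s) IRI containing the skolem path
    """
    if identifier.startswith("_:"):
        return True
    is_http_iri = identifier.startswith(("http://", "https://"))
    return is_http_iri and SKOLEM_PATH in identifier


# Hypothetical identifiers, for illustration only.
examples = [
    "_:b0",
    "https://example.org/.well-known/genid/abc123",
    "OMIM:100100",
]
flags = [is_blank_node(e) for e in examples]
```

With a check like this applied at ingest, an anonymous flag could be derived for every blank node uniformly, with no explicit predicate in the source triples.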

kshefchek commented 7 years ago

IMHO, we need to remove all uses of the anonymous designation upstream before clearing it out:

https://github.com/monarch-initiative/monarch-cypher-queries/search?utf8=%E2%9C%93&q=anonymous

Other options:

kshefchek commented 7 years ago

Going back to my PR on this - it looks like I intentionally only annotated variant blank nodes: https://github.com/monarch-initiative/dipper/pull/424