hubmapconsortium / ccf-validation-tools

HRA ASCT+B Validation Reports
https://hubmapconsortium.github.io/ccf-validation-tools/
Apache License 2.0

Generate reports for IU of current CL & Uberon content #14

Closed: dosumis closed this issue 1 year ago

dosumis commented 3 years ago

Query UberGraph to:

For all queries, return labels and IDs of all entities

Find all Uberon: CL part_of links

?a part_of ?c

?a must have an IRI starting with http://purl.obolibrary.org/obo/UBERON_; ?c must have an IRI starting with http://purl.obolibrary.org/obo/CL_

DIRECT LINKS:

PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>

SELECT ?uberon ?cl
WHERE {
  ?uberon part_of: ?cl .
  FILTER (strstarts(str(?uberon), "http://purl.obolibrary.org/obo/UBERON_"))
  FILTER (strstarts(str(?cl), "http://purl.obolibrary.org/obo/CL_"))
}

Add indirect links by finding all subclasses of ?cl; the part_of relationship will apply to these too. See https://api.triplydb.com/s/YWf3-Y6YK

Find all Uberon:CL part_of links where Uberon term has an FMA:xref

?a part_of ?c . ?a oio:hasDbXref ?xref .

?xref must start with "FMA:"; ?a must have an IRI starting with http://purl.obolibrary.org/obo/UBERON_; ?c must have an IRI starting with http://purl.obolibrary.org/obo/CL_

Find all Uberon:Uberon part_of links where both have an FMA:xref

?a1 part_of ?a2 . ?a1 oio:hasDbXref ?xref1 . ?a2 oio:hasDbXref ?xref2 .

?xref1 and ?xref2 must both start with "FMA:"; ?a1 and ?a2 must both have IRIs starting with http://purl.obolibrary.org/obo/UBERON_

ENDPOINT = https://stars-app.renci.org/uberongraph/sparql

Example query: https://api.triplydb.com/s/Cz48kXEDg

oio = http://www.geneontology.org/formats/oboInOwl#
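For running these queries from a script rather than the web UI, here is a minimal Python sketch using only the standard library. The endpoint and the `oio` prefix come from this comment; the `part_of:` prefix is assumed to expand to BFO_0000050, and the helper names `build_request` and `run_query` are hypothetical. The endpoint may have moved since this issue was written.

```python
import json
import urllib.parse
import urllib.request

# Endpoint as given in the issue; it may no longer be live.
ENDPOINT = "https://stars-app.renci.org/uberongraph/sparql"

# oio is given in the issue; part_of is assumed to be BFO_0000050.
PREFIXES = """\
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>
"""


def build_request(query_body, endpoint=ENDPOINT):
    """Return a Request that POSTs the query as a URL-encoded form field."""
    form = {"query": PREFIXES + query_body}
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Accept": "application/sparql-results+json"},
    )


def run_query(query_body, endpoint=ENDPOINT):
    """Send the query and return the result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query_body, endpoint)) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query("SELECT ?uberon ?cl WHERE { ?uberon part_of: ?cl } LIMIT 5")` would return a list of dictionaries keyed by variable name, provided the endpoint is reachable.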

dosumis commented 3 years ago

It looks like the graph that contains the part_of links lacks the FMA xrefs.

PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>

SELECT (count(distinct ?uberon1) as ?fmac)
FROM <http://reasoner.renci.org/redundant>
WHERE {
  ?uberon1 oio:hasDbXref ?xref1 .
  FILTER (strstarts(str(?uberon1), "http://purl.obolibrary.org/obo/UBERON_"))
  FILTER (strstarts(str(?xref1), "FMA:"))
}

=> Zero hits with the FROM clause; 5926 hits without it.

https://api.triplydb.com/s/RDNSMPWKY

@anitacaron - can you try a nested query that selects from different graphs (if that makes sense)? That's beyond my SPARQL abilities.

anitacaron commented 3 years ago

Yes, it's possible. You only need to include a list of graphs.

SELECT (count( distinct ?uberon1) as ?fmac)
FROM <http://reasoner.renci.org/redundant>
FROM <graph2>
FROM <graph3>
WHERE (...)

dosumis commented 3 years ago

I misunderstood the nature of cross graph queries.

This works pretty well

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oio: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>

SELECT DISTINCT ?uberon1 ?uberon2 ?u1l ?u2l
FROM <http://reasoner.renci.org/nonredundant>
FROM <http://reasoner.renci.org/ontology>
WHERE {
  ?uberon1 part_of: ?uberon2 .
  ?uberon1 rdfs:label ?u1l .
  ?uberon2 rdfs:label ?u2l .
  ?uberon1 oio:hasDbXref ?xref1 .
  ?uberon2 oio:hasDbXref ?xref2 .
  FILTER (strstarts(str(?uberon1), "http://purl.obolibrary.org/obo/UBERON_"))
  FILTER (strstarts(str(?uberon2), "http://purl.obolibrary.org/obo/UBERON_"))
  FILTER (strstarts(str(?xref1), "FMA:"))
  FILTER (strstarts(str(?xref2), "FMA:"))
}

https://api.triplydb.com/s/UD16H3gIC

Not sure why this didn't work previously. This still returns rather too much, though. The "nonredundant" graph seems to include rather a lot of redundancy. What I'd really like for this report is just inferences via the property graph and property chains.

We should be able to achieve this, minus the property chain reasoning, by querying http://reasoner.renci.org/ontology across existential restrictions on part_of instead of materialised triples and including a clause that finds subproperties of part_of. @anitacaron - can you give that a go?
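One possible shape for such a query is sketched below as a Python helper that returns the SPARQL text. It assumes the ontology graph stores existential restrictions in the standard OWL-to-RDF form (`owl:Restriction`, `owl:onProperty`, `owl:someValuesFrom`) and that `part_of:` expands to BFO_0000050. The function name `restriction_query` and the variable names are hypothetical, and this is not necessarily the query that was eventually used.

```python
UBERON = "http://purl.obolibrary.org/obo/UBERON_"


def restriction_query(subject_prefix, object_prefix):
    """Build a query over 'some' restrictions on part_of or its subproperties."""
    return f"""\
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>

SELECT DISTINCT ?part ?rel ?whole
FROM <http://reasoner.renci.org/ontology>
WHERE {{
  # ?rel is part_of itself or any direct or indirect subproperty of it
  ?rel rdfs:subPropertyOf* part_of: .
  # ?part SubClassOf (?rel some ?whole), as encoded in RDF
  ?part rdfs:subClassOf ?restriction .
  ?restriction a owl:Restriction ;
               owl:onProperty ?rel ;
               owl:someValuesFrom ?whole .
  FILTER (strstarts(str(?part), "{subject_prefix}"))
  FILTER (strstarts(str(?whole), "{object_prefix}"))
}}
"""


# Uberon-to-Uberon links, matching the report described above.
print(restriction_query(UBERON, UBERON))
```

Because this matches asserted axioms only, it would not pick up anything that follows from property chains, which matches the "minus the property chain reasoning" caveat.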

balhoff commented 3 years ago

@dosumis a much better nonredundant graph is coming soon.

anitacaron commented 3 years ago

If we don't use FROM to specify which graphs to select the data from, the query will select from the default graph.
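As a small illustration of that behaviour, the Python sketch below assembles a query with one FROM line per graph, and with no FROM lines when no graphs are given (so the endpoint's default graph applies). The helper name `with_graphs` is hypothetical; the graph IRIs are the ones used earlier in this thread.

```python
def with_graphs(select_clause, where_clause, graphs=()):
    """Assemble a query; each graph IRI becomes one FROM line."""
    from_lines = [f"FROM <{iri}>" for iri in graphs]
    return "\n".join([select_clause, *from_lines, where_clause])


SELECT = "SELECT (count(distinct ?uberon1) as ?fmac)"
WHERE = "WHERE { ?uberon1 ?p ?o }"  # placeholder pattern for illustration

# Query restricted to the merge of two named graphs.
merged = with_graphs(
    SELECT,
    WHERE,
    graphs=[
        "http://reasoner.renci.org/nonredundant",
        "http://reasoner.renci.org/ontology",
    ],
)

# No graphs listed: the endpoint's default graph is queried.
default = with_graphs(SELECT, WHERE)
```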

If I understood correctly, we could get the property chain just by adding a + after the property.

I'll change the query to have the sub-properties and existential part_of.

dosumis commented 3 years ago

> If I understood correctly, we could get the property chain just by adding a + after the property.

That would be for transitivity, not property chains. There is a primer on property chains here: https://www.w3.org/TR/owl2-primer/#Property_Chains
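The difference can be shown on a toy relation in Python. The sketch uses the hasParent/hasGrandparent chain from the OWL 2 primer; the individuals (`ann`, `bob`, `cat`, `dan`) are made up for illustration. A `+` path corresponds to the transitive closure of one property, while a property chain composes properties to imply a different one.

```python
def compose(r, s):
    """Relational composition: (a, c) whenever (a, b) is in r and (b, c) is in s."""
    return {(a, c) for (a, b) in r for (b2, c) in s if b == b2}


def transitive_closure(r):
    """Repeatedly add composed pairs until nothing new appears."""
    closure = set(r)
    while True:
        new_pairs = compose(closure, closure) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


# Hypothetical individuals: ann's parent is bob, bob's is cat, cat's is dan.
has_parent = {("ann", "bob"), ("bob", "cat"), ("cat", "dan")}

# What a path like hasParent+ matches: paths of any length >= 1.
ancestors = transitive_closure(has_parent)

# The chain hasParent o hasParent -> hasGrandparent: exactly two steps,
# and the result is a different property.
has_grandparent = compose(has_parent, has_parent)
```

Here `("ann", "dan")` is in `ancestors` but not in `has_grandparent`, which is why `+` alone cannot stand in for a property chain.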

anitacaron commented 3 years ago

Is this something like what you're expecting, @dosumis? https://api.triplydb.com/s/dZnVI_klU

balhoff commented 3 years ago

@dosumis @anitacaron I just deployed an update to Ubergraph. The nonredundant graph is much better now (I'm using a new logical filtering procedure on the full closure). But it is important to keep in mind which parts of your SPARQL query should make use of logical entailments and which are useful output. For example, if you only use part_of with the nonredundant graph, you miss cases where there is a more specific relation, like skeleton_of. The redundant graph has both relations between the terms. If you are outputting the relations for that example subject term, though, you probably only want to see skeleton_of rather than part_of. Most of the time, I think you will get much more logically complete results by combining the redundant and nonredundant graphs than by querying into OWL axiom structures. You can use GRAPH blocks to match certain triple patterns only in one or the other.

Please let me know if you have any feedback on how the nonredundant graph looks. Note it includes both existential relations and rdfs:subClassOf.
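A rough sketch of the GRAPH-block approach described above: match pairs via part_of in the redundant graph, then report whichever relation the nonredundant graph holds for the same pair. The query string assumes `part_of:` expands to BFO_0000050. The toy triples and the helper `specific_relations` are hypothetical stand-ins that mimic the same logic in plain Python; they are not real Ubergraph data.

```python
# SPARQL sketch: entailment in one graph, output relation from the other.
QUERY = """\
PREFIX part_of: <http://purl.obolibrary.org/obo/BFO_0000050>

SELECT DISTINCT ?part ?rel ?whole
WHERE {
  GRAPH <http://reasoner.renci.org/redundant> { ?part part_of: ?whole . }
  GRAPH <http://reasoner.renci.org/nonredundant> { ?part ?rel ?whole . }
}
"""

# Toy stand-ins for the two graphs (made-up terms).
redundant = {
    ("skeleton_x", "part_of", "organ_y"),
    ("skeleton_x", "skeleton_of", "organ_y"),
}
nonredundant = {("skeleton_x", "skeleton_of", "organ_y")}


def specific_relations(redundant, nonredundant, match_rel="part_of"):
    """Pairs matched via match_rel in the redundant graph, reported with
    the relations the nonredundant graph holds for the same pair."""
    matched = {(s, o) for (s, p, o) in redundant if p == match_rel}
    return sorted((s, p, o) for (s, p, o) in nonredundant if (s, o) in matched)


rows = specific_relations(redundant, nonredundant)
```

Since the nonredundant graph also includes rdfs:subClassOf, a real query would probably need a filter on `?rel` to keep only the relations of interest.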

anitacaron commented 3 years ago

@dosumis, here is an updated version for the first query: Find all Uberon: CL part_of links. https://api.triplydb.com/s/_ZAu4xgig

This query includes the sub-properties of part_of, but I'm unsure why it doesn't show results when I concatenate the two columns, so I joined the results of each column instead.