openphacts / OPS_LinkedDataApi

A repository to host API configuration files, and code extensions

Missing Interactions from GetInteractions call #20

Open RyAMiller opened 8 years ago

RyAMiller commented 8 years ago

There is an issue with the GetInteractions API call for pathways.
When testing, Peter noticed that, especially on Reactome pathways, only the metabolites were being returned and not the Reactome complex IDs. As a result, the GetInteractions call only returned interactions connected to the metabolites in the pathway tested. Pathway tested: WP2704.

The data loaded should be the same as what is on the WP SPARQL endpoint, so it is probably not a data problem. Looking at the documentation for the call, I think the issue is with the SPARQL query used. I am hoping it is as simple as fixing the query.

For reference I have included a query that works on the WP endpoint and I think solves the above issue. Any feedback is appreciated.

http://sparql.wikipathways.org/

PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?pathway ?interaction ?target ?source 
WHERE {

   ?pathway a wp:Pathway . 
   ?pathway dc:identifier <http://identifiers.org/wikipathways/WP2704> .

   ?interaction dcterms:isPartOf ?pathway .
   ?interaction a wp:Interaction .

   ?interaction wp:target ?target .
   ?interaction wp:source ?source .
}
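
To run the query above from a script instead of the web form, something like the following Python sketch might help. It assumes the endpoint accepts a standard SPARQL Protocol GET request with a `query` parameter, and that `dc:` is the Dublin Core elements namespace (declared explicitly here rather than relying on endpoint defaults). The helper name `build_sparql_get_url` is made up for illustration. The sketch only builds the request URL and does not send anything.

```python
from urllib.parse import urlencode

# Assumption: the endpoint follows the SPARQL 1.1 Protocol ("query" GET parameter).
ENDPOINT = "http://sparql.wikipathways.org/"

# Doubled braces are literal braces for str.format; {pathway_id} is the only field.
QUERY_TEMPLATE = """\
PREFIX wp:      <http://vocabularies.wikipathways.org/wp#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc:      <http://purl.org/dc/elements/1.1/>

SELECT DISTINCT ?pathway ?interaction ?target ?source
WHERE {{
   ?pathway a wp:Pathway .
   ?pathway dc:identifier <http://identifiers.org/wikipathways/{pathway_id}> .
   ?interaction dcterms:isPartOf ?pathway .
   ?interaction a wp:Interaction .
   ?interaction wp:target ?target .
   ?interaction wp:source ?source .
}}
"""


def build_sparql_get_url(pathway_id, endpoint=ENDPOINT):
    """Return a GET URL that submits the interactions query for one pathway."""
    query = QUERY_TEMPLATE.format(pathway_id=pathway_id)
    return endpoint + "?" + urlencode({"query": query})


url = build_sparql_get_url("WP2704")
```

The resulting `url` can be passed to any HTTP client; which result format the endpoint returns by default is not something this sketch assumes.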
stain commented 8 years ago

I can confirm that I get the same results from your query on our SPARQL endpoint, so the data is loaded.

Could you help me understand this a bit more?

https://ops2.few.vu.nl/2.1/pathway/getInteractions?uri=http%3A%2F%2Fidentifiers.org%2Fwikipathways%2FWP2704&app_id=161aeb7d&app_key=333c09ae195d777b68a117bb42f29b1c&_format=ttl

returns just three interactions:

<http://rdf.wikipathways.org/Pathway/WP2704_r81439/WP/Interaction/b0e38> rdf:type ns0:DirectedInteraction ;
  void:inDataset <http://www.wikipathways.org> ;
  ns0:source <http://identifiers.org/chebi/CHEBI:15422> ;
  ns0:target <http://identifiers.org/chebi/CHEBI:16761> .

<http://rdf.wikipathways.org/Pathway/WP2704_r81439/WP/Interaction/f460a> rdf:type ns0:DirectedInteraction ;
  void:inDataset <http://www.wikipathways.org> ;
  ns0:source <http://identifiers.org/chebi/CHEBI:15422> ;
  ns0:target <http://identifiers.org/chebi/CHEBI:16761> .

<http://rdf.wikipathways.org/Pathway/WP2704_r81439/WP/Interaction/f6b8d> rdf:type ns0:DirectedInteraction ;
  void:inDataset <http://www.wikipathways.org> ;
  ns0:source <http://identifiers.org/chebi/CHEBI:15422> ;
  ns0:target <http://identifiers.org/chebi/CHEBI:16761> .

But you would have hoped for additional interactions with other sources/targets, as found on sparql.wikipathways.org?

E.g. you would also want:

<http://rdf.wikipathways.org/Pathway/WP2704_r81439/WP/Interaction/d8c8a> rdf:type ns0:DirectedInteraction ;
  void:inDataset <http://www.wikipathways.org> ;
  ns0:source <http://identifiers.org/uniprot/P40189-2> ;
  ns0:target <http://identifiers.org/reactome/R-HSA-1067674> .

etc?

I think the API's query filters out all the http://identifiers.org/reactome/* identifiers -- it seems they are generally duplicates on the ?source and ?target - or do the interactions have multiple sources and multiple targets? This is a bit confusing to me.
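
One way to pin down exactly which interactions the API drops is to diff the interaction URIs from the two sources. The Python sketch below is only an illustration: the three API URIs and `d8c8a` are the ones quoted in this thread, but treating the SPARQL result as "those three plus `d8c8a`" is an assumption made to keep the example small, and the function name is hypothetical.

```python
WP = "http://rdf.wikipathways.org/Pathway/WP2704_r81439/WP/Interaction/"

# Interaction URIs returned by the API call, as quoted in this thread.
api_interactions = {WP + "b0e38", WP + "f460a", WP + "f6b8d"}

# Assumed excerpt of the SPARQL result: the three above plus d8c8a.
sparql_interactions = api_interactions | {WP + "d8c8a"}


def missing_from_api(sparql_uris, api_uris):
    """Return interactions present in the SPARQL result but absent from the API."""
    return sorted(set(sparql_uris) - set(api_uris))


missing = missing_from_api(sparql_interactions, api_interactions)
```

With real data, `sparql_uris` would be the full `?interaction` column from the endpoint and `api_uris` the subjects of the API's Turtle response.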

stain commented 8 years ago

BTW - all of the reactome identifiers fail in the browser - e.g. http://identifiers.org/reactome/R-HSA-1067691 says:

The data in the URL can't fit into a state

PeterWoollard commented 8 years ago

Yes, Ryan has flagged that the REACTOME identifiers fail. What is the action? Flag this to the REACTOME team at EBI and Toronto and ask them to fix it?

PeterWoollard commented 8 years ago

In many pathways, the "entity" is actually a complex, e.g. several proteins, metal ions, small molecule ligands (e.g. short peptides or simple ), ATP etc. In REACTOME they represent this using sets (= lists) and recursive subsets. In pathways we typically simplify this for our overloaded brains as binary interactions. In simple terms, if complex A directly causes the phosphorylation of protein B, then we simplify it by treating all the proteins in complex A (the source) as directly interacting with protein B (the target). Does this help or confuse you further? Ryan, Chris or I can explain more.
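
As a rough illustration of the simplification described above, here is a small Python sketch that expands a complex-level interaction into binary member-level pairs. The member names and the function name are hypothetical; nothing here reflects how REACTOME or the API actually stores complexes.

```python
from itertools import product


def expand_to_binary(source_members, target_members):
    """Pair every member of the source with every member of the target."""
    return list(product(source_members, target_members))


# Hypothetical example: complex A has three member proteins, the target is protein B.
complex_a = ["proteinA1", "proteinA2", "proteinA3"]
pairs = expand_to_binary(complex_a, ["proteinB"])
```

Each resulting pair reads as "this member of complex A directly interacts with protein B", which is the binary view, not a claim about the underlying biology.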

PeterWoollard commented 8 years ago

I believe there are missing linksets for the REACTOME data. When querying the WikiPathways SPARQL endpoint directly for interactions, the REACTOME pathways are missing proteins; I only get metabolites.

RyAMiller commented 8 years ago

Yes, Stian. We can have multiple sources and targets. This is correct behavior. For example...

A             C
 \           ^
  \         /
   ---------
  /         \
 /           v
B             D

Something like this is possible and shows up quite often in reactome pathways.
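
This also explains why a flat `?interaction ?source ?target` result looks like it contains duplicates: one interaction with two sources and two targets comes back as four rows. A minimal Python sketch of regrouping such rows per interaction, using made-up row data for the diagram above (the interaction ID `i1` and the function name are hypothetical):

```python
from collections import defaultdict


def group_by_interaction(rows):
    """Collect the sources and targets of each interaction from flat result rows."""
    grouped = defaultdict(lambda: {"sources": set(), "targets": set()})
    for interaction, source, target in rows:
        grouped[interaction]["sources"].add(source)
        grouped[interaction]["targets"].add(target)
    return dict(grouped)


# Hypothetical rows for the diagram above: A and B both feed into C and D.
rows = [
    ("i1", "A", "C"),
    ("i1", "A", "D"),
    ("i1", "B", "C"),
    ("i1", "B", "D"),
]
grouped = group_by_interaction(rows)
```

After grouping, `i1` has sources A and B and targets C and D, so the repeated values are one multi-source, multi-target interaction, not redundant data.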

RyAMiller commented 8 years ago

Peter, I am not sure which data you are missing. An interaction connects, say, complex A to complex B, and you are right: you have to examine what complex A and complex B consist of individually since, as you said, we simplify things by saying that one group as a whole affects another group as a whole.

RyAMiller commented 8 years ago

As for the REACTOME IDs, I do think we need to raise this as an issue: the ID is valid and it is the right entity, but for some reason it does not work in the pathway viewer.
For example, the one that Stian gave earlier, R-HSA-1067691, resolves via identifiers.org to http://www.reactome.org/PathwayBrowser/#R-HSA-1067691. This should be the correct pattern but, interestingly enough, it does not work. I am not sure why this is happening.

http://www.reactome.org/content/detail/R-HSA-1067691 does work though, but it is not in the context of the pathway browser. I think we want it in the context of the pathway browser, correct? Because if we use the 'content/detail/ID' link, then it is a real pain to get to the correct place in the pathway viewer (you have to expand the pathway viewer section, go to the last link and then dig around the pathway to find the right ID). I think this is an issue for the Reactome team to address.

stain commented 8 years ago

So as far as I understand, we don't know anything more about the Reactome identifiers in the WikiPathways RDF beyond which pathways and interactions they are part of. Is there another RDF source we need to load?

You mentioned we need additional linksets? What is the source and destination of the linkset?

RyAMiller commented 8 years ago

Peter mentioned this and I am not sure I am following. I will try and ask him.

Chris-Evelo commented 8 years ago

Two separate things in this thread I think.

1) Reactome content doesn't show in the browser although the URL is correct. Yes, this should be discussed with Reactome, as suggested by Peter. Ryan, can you do that?

2) Sometimes (for Reactome, often) interactions involve complexes. That is fine, but there should be a separate call to get the content of a complex (and the reverse: get all complexes a gene product participates in). That should probably be part of the API calls that we have for the same issue with ChEBI complexes. Do we have all the content needed to make that call work?
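
To make the two proposed calls concrete, here is a minimal Python sketch of the forward and reverse lookups over a membership table. Everything in it is hypothetical (the complex IDs, the gene product names, the function names), and it says nothing about whether the loaded data actually contains the membership information.

```python
# Hypothetical membership data: complex ID -> participating gene products.
COMPLEX_MEMBERS = {
    "complexA": {"geneProduct1", "geneProduct2"},
    "complexB": {"geneProduct2", "geneProduct3"},
}


def members_of(complex_id, membership=COMPLEX_MEMBERS):
    """Forward call: the content of one complex (empty set if unknown)."""
    return membership.get(complex_id, set())


def complexes_containing(gene_product, membership=COMPLEX_MEMBERS):
    """Reverse call: all complexes a gene product participates in."""
    return {cid for cid, members in membership.items() if gene_product in members}
```

For example, `members_of("complexA")` gives the two gene products of complex A, while `complexes_containing("geneProduct2")` gives both complexes.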