openphacts / GLOBAL

Global project issues [private for now. owner lee harland]

Different interaction counts for different URIs of the same compound #376

Open danidi opened 7 years ago

danidi commented 7 years ago

Does the /pathways/interactions/byEntity/count API call use the IMS? The two examples mentioned here are connected in the IMS, but retrieve different counts.

egonw commented 7 years ago

"You spying Basterds"? :)

egonw commented 7 years ago

@danidi yeah, I'm thinking in that direction too... I will explore this before the next MSCPiLS meeting this Thursday...

Oh, BTW, I checked the map/ function in the API, and there both are given as "equivalent"... but yes, I too think it must have to do with lenses not being used correctly or so...

danidi commented 7 years ago

I guess one of your students ;) Thank you for looking into it!

randykerber commented 7 years ago

7 months ago the counts were 163 and 279. Now they are 326 and 489.

For the first query the set of URIs inserted into the SPARQL query is:

 <http://www.hmdb.ca/metabolites/HMDB01206>
 <https://www.surechembl.org/chemical/SCHEMBL6086>
 <http://info.identifiers.org/hmdb/HMDB01206>
 <http://www.chemspider.com/Chemical-Structure.392413>
 <http://bio2rdf.org/chebi:15351>
 <http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL6086>
 <http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15351>
 <http://www.conceptwiki.org/web-ws/concept/get?uuid=25a6ca47-0769-408d-ad02-75b8c06afd61>
 <http://ops.rsc-us.org/OPS1769651>
 <http://ops.rsc.org/Compounds/Get/1769651>
 <http://www.chemspider.com/392413>
 <http://www.conceptwiki.org/concept/25a6ca47-0769-408d-ad02-75b8c06afd61>
 <http://www.chemspider.com/Chemical-Structure.392413.html>
 <http://ops.rsc.org/OPS1769651>
 <http://www.ebi.ac.uk/ontology-lookup/?termId=CHEBI:15351>
 <http://ops.rsc.org/OPS1769651/rdf>
 <http://info.identifiers.org/chebi/CHEBI:15351>
 <http://purl.obolibrary.org/obo/CHEBI_15351>
 <http://www.chemspider.com/Chemical-Structure.392413.rdf>
 <http://info.identifiers.org/chemspider/392413>
 <http://identifiers.org/obo.chebi/CHEBI:15351>
 <http://purl.org/obo/owl/CHEBI#CHEBI_15351>
 <http://identifiers.org/hmdb/HMDB01206>
 <http://identifiers.org/chemspider/392413>
 <http://purl.bioontology.org/ontology/CHEBI/CHEBI:15351>
 <http://rdf.chemspider.com/392413>
 <http://www.conceptwiki.org/concept/index/25a6ca47-0769-408d-ad02-75b8c06afd61>
 <http://identifiers.org/chebi/CHEBI:15351>

For the second query the list of URIs is:

<http://identifiers.org/wikipedia.en/Acetyl-CoA>
 <http://purl.bioontology.org/ontology/CHEBI/CHEBI:15351>
 <http://www.chemspider.com/Chemical-Structure.392413.rdf>
 <http://purl.obolibrary.org/obo/CHEBI_15351>
 <http://purl.org/obo/owl/CHEBI#CHEBI_15351>
 <http://dbpedia.org/page/Acetyl-CoA>
 <http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=444493>
 <http://dbpedia.org/resource/Acetyl-CoA>
 <http://www.chemspider.com/Chemical-Structure.392413.html>
 <http://rdf.ncbi.nlm.nih.gov/pubchem/compound/444493>
 <http://info.identifiers.org/kegg.compound/C00024>
 <http://identifiers.org/chemspider/392413>
 <http://info.identifiers.org/pubchem.compound/444493>
 <http://www.chemspider.com/Chemical-Structure.392413>
 <https://www.surechembl.org/chemical/SCHEMBL6086>
 <http://www.genome.jp/dbget-bin/www_bget?cpd:C00024>
 <http://identifiers.org/cas/72-89-9>
 <http://info.identifiers.org/hmdb/HMDB01206>
 <http://identifiers.org/obo.chebi/CHEBI:15351>
 <http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID444493>
 <http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL6086>
 <http://identifiers.org/pubchem.compound/444493>
 <http://ops.rsc.org/Compounds/Get/1769651>
 <http://info.identifiers.org/cas/72-89-9>
 <http://identifiers.org/hmdb/HMDB01206>
 <http://identifiers.org/kegg.compound/C00024>
 <http://info.identifiers.org/wikipedia.en/Acetyl-CoA>
 <http://en.wikipedia.org/wiki/Acetyl-CoA>
 <http://info.identifiers.org/chemspider/392413>
 <http://ops.rsc-us.org/OPS1769651>
 <http://www.chemspider.com/392413>
 <http://ops.rsc.org/OPS1769651/rdf>
 <http://info.identifiers.org/chebi/CHEBI:15351>
 <http://rdf.chemspider.com/392413>
 <http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15351>
 <http://www.hmdb.ca/metabolites/HMDB01206>
 <http://www.kegg.jp/entry/C00024>
 <http://bio2rdf.org/cpd:C00024>
 <http://ops.rsc.org/OPS1769651>
 <http://bio2rdf.org/chebi:15351>
 <http://identifiers.org/chebi/CHEBI:15351>
 <http://commonchemistry.org/ChemicalDetail.aspx?ref=72-89-9>
 <http://www.ebi.ac.uk/ontology-lookup/?termId=CHEBI:15351>
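One way to see where the two expansions diverge is to diff the two URI sets. The following is a minimal sketch that uses only a handful of URIs copied from the lists above (abridged, not the full sets); the function name `diff_uri_sets` is made up for this illustration and is not part of the Open PHACTS API.

```python
def diff_uri_sets(first, second):
    """Return the URIs unique to each expansion and the URIs they share."""
    first_set, second_set = set(first), set(second)
    return {
        "only_first": sorted(first_set - second_set),
        "only_second": sorted(second_set - first_set),
        "shared": sorted(first_set & second_set),
    }


# Abridged samples taken from the two lists above (assumption: a subset is
# enough to show the idea; paste the full lists for a real comparison).
first_query = [
    "http://www.hmdb.ca/metabolites/HMDB01206",
    "http://identifiers.org/chebi/CHEBI:15351",
    "http://www.conceptwiki.org/concept/25a6ca47-0769-408d-ad02-75b8c06afd61",
]
second_query = [
    "http://www.hmdb.ca/metabolites/HMDB01206",
    "http://identifiers.org/chebi/CHEBI:15351",
    "http://identifiers.org/kegg.compound/C00024",
    "http://identifiers.org/pubchem.compound/444493",
]

result = diff_uri_sets(first_query, second_query)
for key, uris in result.items():
    print(key)
    for uri in uris:
        print("  ", uri)
```

Run on the full lists, the same diff shows which identifier sources appear in only one of the two expansions.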

egonw commented 7 years ago

Yes, the problem seems to be that the IMS instances do not properly handle directionality... since both input IRIs are equivalent (the IMS says so), it should not matter which one you start with, and you should get the same number of mappings.

egonw commented 7 years ago

@Christian-B, is there anything you can think of why the two IRIs do not give the same number of matches?

Christian-B commented 7 years ago

Without looking in any detail or at the particular example, I think this may well be related to transitive mappings and the choice of where to stop, especially as most mappings are near mappings.

The IMS will not keep going back to the same type of URL. For example, there are often cases like:

A1 -> B2
B2 -> A3
A3 -> B4
B4 -> A5

The IMS has to stop somewhere, otherwise you get A1 -> A5, which is usually incorrect.

So if the IMS is hit with one of the middle URLs (A3) in the above example, it may return more results than when given A1, as A3 may be close enough to both A1 and A5 while they are not close enough to each other.

This gets (at least when I was in OPS) even messier when the URLs in the chain point to slightly different types of things. Again, the IMS has to choose when to stop transitivity.
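The effect of a stopping rule can be reproduced with a toy model. The sketch below assumes a plain hop limit (`max_hops=2`) as the stopping rule. That limit is an assumption chosen for illustration; the actual IMS rule is not spelled out in this thread. The chain A1 - B2 - A3 - B4 - A5 is the one from the comment above, treated as bidirectional.

```python
from collections import deque

# Toy link chain from the comment above, treated as bidirectional.
LINKS = [("A1", "B2"), ("B2", "A3"), ("A3", "B4"), ("B4", "A5")]


def build_neighbours(links):
    """Index every link in both directions."""
    neighbours = {}
    for left, right in links:
        neighbours.setdefault(left, set()).add(right)
        neighbours.setdefault(right, set()).add(left)
    return neighbours


def mapped(start, links, max_hops=2):
    """Breadth-first expansion that stops after max_hops (assumed rule)."""
    neighbours = build_neighbours(links)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # stopping rule: do not expand any further
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return seen - {start}


from_a1 = mapped("A1", LINKS)
from_a3 = mapped("A3", LINKS)
print(sorted(from_a1))
print(sorted(from_a3))
```

Under this assumed rule, starting at the end of the chain (A1) reaches two URLs, while starting in the middle (A3) reaches four, even though A1 and A3 map to each other.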

egonw commented 7 years ago

@Christian-B, OK, that makes a lot of sense... do you have a script that calculates all transitive link sets, so that we can reproduce that?

PS. thanks for your quick response and your response in the first place!

Christian-B commented 7 years ago

For speed, all links in the IMS were loaded unidirectionally. This allows only one side of the mapping to be searched and indexed.

Most predicates were considered bidirectional, so each mapping was loaded twice. But there was the ability to handle unidirectional mappings. This was not yet used when I left three years ago.
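A rough sketch of that loading scheme, under the assumption that the index is a simple source-to-targets lookup. The function `load_links` and the identifiers `X1`, `Y1`, `X2`, `Y2` are hypothetical and do not come from the IMS code base.

```python
def load_links(mappings):
    """Build a one-directional index: source -> set of targets.

    Each mapping is (source, target, bidirectional). A bidirectional
    mapping is stored twice, once per direction, so a lookup only ever
    has to search the source side of the index.
    """
    index = {}
    for source, target, bidirectional in mappings:
        index.setdefault(source, set()).add(target)
        if bidirectional:
            index.setdefault(target, set()).add(source)
    return index


# Hypothetical data: one bidirectional and one unidirectional mapping.
index = load_links([
    ("X1", "Y1", True),
    ("X2", "Y2", False),
])

print(index.get("Y1", set()))  # reverse lookup works: stored twice
print(index.get("Y2", set()))  # empty: only X2 -> Y2 was stored
```

With a unidirectional mapping, the result depends on which side you start from, which is one way that equivalent inputs could yield different expansions.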

Christian-B commented 7 years ago

Sorry Egon, too long ago for me to remember.

egonw commented 7 years ago

Yeah, no worries... but I had to ask :)

danidi commented 7 years ago

There seem to be some mappings from HMDB to other sources, e.g. KEGG (http://alpha.openphacts.org:3004/QueryExpander/mappingSet/189), which are not created via the CRS. If HMDB is not an allowed middle source for the transitive calculation (not sure where to check that), this could explain why you find these additional mappings only when you start with HMDB directly. I'm assuming that the KEGG URIs are used in several pathways, so this could make a difference in the pathway counts.
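This hypothesis can be illustrated with a toy expansion. Everything in the sketch is an assumption made for illustration: the short `source:id` identifiers, the link table, and the `ALLOWED_MIDDLE` set do not reflect the real IMS configuration, which is the part that still needs checking.

```python
# Hypothetical direct link sets, keyed by a short "source:id" form.
LINKS = {
    "chebi:15351": {"hmdb:HMDB01206"},
    "hmdb:HMDB01206": {"chebi:15351", "kegg:C00024"},
    "kegg:C00024": {"hmdb:HMDB01206"},
}

# Assumption: only ChEBI may act as a middle source for transitives.
ALLOWED_MIDDLE = {"chebi"}


def source_of(uri):
    """Return the source prefix, e.g. 'hmdb' for 'hmdb:HMDB01206'."""
    return uri.split(":", 1)[0]


def expand(start):
    """Direct mappings plus one transitive step via allowed middle sources."""
    found = set(LINKS.get(start, ()))
    for middle in list(found):
        if source_of(middle) in ALLOWED_MIDDLE:
            found |= LINKS.get(middle, set())
    found.discard(start)
    return found


from_chebi = expand("chebi:15351")
from_hmdb = expand("hmdb:HMDB01206")
print(sorted(from_chebi))
print(sorted(from_hmdb))
```

In this toy setup the KEGG identifier is only reached when the query starts at HMDB, because the HMDB to KEGG link is direct from there but would need HMDB as a middle source when starting from ChEBI.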

egonw commented 7 years ago

We're working on making proper link sets for compounds in pathways... @valt is working on (or finished) parsing the WikiPathways SDF so that we can drop the HMDB link sets.

egonw commented 5 years ago

There are multiple issues now... this bug depends on a redevelopment of a streamlined data loading pipeline (well, redeveloped is likely not the right word: Paul tried to put this on the agenda, but it never was prioritized...). For now, I'll unassign myself, as I cannot do much to fix this at this moment.