Open danidi opened 7 years ago
"You spying Basterds"? :)
@danidi yeah, I'm thinking in that direction too... I will explore this before the next MSCPiLS meeting this Thursday...
Oh, BTW, I checked the map/ function in the API, and there both are given as "equivalent"... but, yes, I too think it must have to do with lenses not being used correctly or so...
I guess one of your students ;) Thank you for looking into it!
7 months ago the counts were 163 and 279. Now they are 326 and 489.
For the first query the set of URIs inserted into the SPARQL query is:
<http://www.hmdb.ca/metabolites/HMDB01206>
<https://www.surechembl.org/chemical/SCHEMBL6086>
<http://info.identifiers.org/hmdb/HMDB01206>
<http://www.chemspider.com/Chemical-Structure.392413>
<http://bio2rdf.org/chebi:15351>
<http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL6086>
<http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15351>
<http://www.conceptwiki.org/web-ws/concept/get?uuid=25a6ca47-0769-408d-ad02-75b8c06afd61>
<http://ops.rsc-us.org/OPS1769651>
<http://ops.rsc.org/Compounds/Get/1769651>
<http://www.chemspider.com/392413>
<http://www.conceptwiki.org/concept/25a6ca47-0769-408d-ad02-75b8c06afd61>
<http://www.chemspider.com/Chemical-Structure.392413.html>
<http://ops.rsc.org/OPS1769651>
<http://www.ebi.ac.uk/ontology-lookup/?termId=CHEBI:15351>
<http://ops.rsc.org/OPS1769651/rdf>
<http://info.identifiers.org/chebi/CHEBI:15351>
<http://purl.obolibrary.org/obo/CHEBI_15351>
<http://www.chemspider.com/Chemical-Structure.392413.rdf>
<http://info.identifiers.org/chemspider/392413>
<http://identifiers.org/obo.chebi/CHEBI:15351>
<http://purl.org/obo/owl/CHEBI#CHEBI_15351>
<http://identifiers.org/hmdb/HMDB01206>
<http://identifiers.org/chemspider/392413>
<http://purl.bioontology.org/ontology/CHEBI/CHEBI:15351>
<http://rdf.chemspider.com/392413>
<http://www.conceptwiki.org/concept/index/25a6ca47-0769-408d-ad02-75b8c06afd61>
<http://identifiers.org/chebi/CHEBI:15351>
For the second query the list of URIs is:
<http://identifiers.org/wikipedia.en/Acetyl-CoA>
<http://purl.bioontology.org/ontology/CHEBI/CHEBI:15351>
<http://www.chemspider.com/Chemical-Structure.392413.rdf>
<http://purl.obolibrary.org/obo/CHEBI_15351>
<http://purl.org/obo/owl/CHEBI#CHEBI_15351>
<http://dbpedia.org/page/Acetyl-CoA>
<http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=444493>
<http://dbpedia.org/resource/Acetyl-CoA>
<http://www.chemspider.com/Chemical-Structure.392413.html>
<http://rdf.ncbi.nlm.nih.gov/pubchem/compound/444493>
<http://info.identifiers.org/kegg.compound/C00024>
<http://identifiers.org/chemspider/392413>
<http://info.identifiers.org/pubchem.compound/444493>
<http://www.chemspider.com/Chemical-Structure.392413>
<https://www.surechembl.org/chemical/SCHEMBL6086>
<http://www.genome.jp/dbget-bin/www_bget?cpd:C00024>
<http://identifiers.org/cas/72-89-9>
<http://info.identifiers.org/hmdb/HMDB01206>
<http://identifiers.org/obo.chebi/CHEBI:15351>
<http://pubchem.ncbi.nlm.nih.gov/rest/rdf/compound/CID444493>
<http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL6086>
<http://identifiers.org/pubchem.compound/444493>
<http://ops.rsc.org/Compounds/Get/1769651>
<http://info.identifiers.org/cas/72-89-9>
<http://identifiers.org/hmdb/HMDB01206>
<http://identifiers.org/kegg.compound/C00024>
<http://info.identifiers.org/wikipedia.en/Acetyl-CoA>
<http://en.wikipedia.org/wiki/Acetyl-CoA>
<http://info.identifiers.org/chemspider/392413>
<http://ops.rsc-us.org/OPS1769651>
<http://www.chemspider.com/392413>
<http://ops.rsc.org/OPS1769651/rdf>
<http://info.identifiers.org/chebi/CHEBI:15351>
<http://rdf.chemspider.com/392413>
<http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:15351>
<http://www.hmdb.ca/metabolites/HMDB01206>
<http://www.kegg.jp/entry/C00024>
<http://bio2rdf.org/cpd:C00024>
<http://ops.rsc.org/OPS1769651>
<http://bio2rdf.org/chebi:15351>
<http://identifiers.org/chebi/CHEBI:15351>
<http://commonchemistry.org/ChemicalDetail.aspx?ref=72-89-9>
<http://www.ebi.ac.uk/ontology-lookup/?termId=CHEBI:15351>
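A quick way to see where the two expansions diverge is to diff the lists as sets. This is a minimal sketch that uses only a handful of the URIs listed above (the samples are abridged, and `compare_expansions` is a name made up for illustration):

```python
# Abridged samples of the two URI lists above, NOT the full sets.
first_query = {
    "http://identifiers.org/hmdb/HMDB01206",
    "http://identifiers.org/chebi/CHEBI:15351",
    "http://www.conceptwiki.org/concept/25a6ca47-0769-408d-ad02-75b8c06afd61",
}
second_query = {
    "http://identifiers.org/hmdb/HMDB01206",
    "http://identifiers.org/chebi/CHEBI:15351",
    "http://identifiers.org/kegg.compound/C00024",
    "http://identifiers.org/pubchem.compound/444493",
}


def compare_expansions(first, second):
    """Return the URIs unique to each expansion and the ones they share."""
    return {
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        "shared": sorted(first & second),
    }


diff = compare_expansions(first_query, second_query)
for label, uris in diff.items():
    print(label)
    for uri in uris:
        print("  ", uri)
```

Pasting the complete lists into the two sets should show which sources (for example KEGG or PubChem) only turn up for one of the two starting IRIs.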
Yes, the problem seems to be that the IMS instances do not properly handle directionality... since both input IRIs are equivalent (the IMS says so), it should not matter which one you start with and you should get the same number of mappings.
@Christian-B, is there anything you can think of why the two IRIs do not give the same number of matches?
Without looking in any detail or at the particular example, I think this may well be related to transitive mappings and the choice of where to stop, especially as most mappings are near mappings.
The IMS will not keep going back to the same type of URL.
For example, there are often cases like:
A1 -> B2
B2 -> A3
A3 -> B4
B4 -> A5
The IMS has to stop somewhere, otherwise you get A1 -> A5, which usually is incorrect.
So if the IMS is hit with one of the middle URLs (A3) in the above example, it may return more results than when given A1,
as A3 may be close enough to both A1 and A5 while they are not close enough to each other.
This gets (at least when I was in OPS) even messier when the URLs in the chain point to slightly different types of things. Again, the IMS has to choose when to stop transitivity.
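The A1 ... A5 chain above can be reproduced with a toy model. This is only a sketch: it assumes the stopping rule is a simple hop limit (`max_hops` is a hypothetical parameter), which is not necessarily how the IMS actually decides where to stop.

```python
from collections import deque

# The chain from the example above; each link is treated as bidirectional.
LINKS = [("A1", "B2"), ("B2", "A3"), ("A3", "B4"), ("B4", "A5")]


def build_graph(links):
    """Build an undirected adjacency map from (source, target) pairs."""
    graph = {}
    for source, target in links:
        graph.setdefault(source, set()).add(target)
        graph.setdefault(target, set()).add(source)
    return graph


def expand(graph, start, max_hops):
    """Breadth-first expansion that stops after max_hops links (assumed rule)."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not follow links beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return seen - {start}


graph = build_graph(LINKS)
print(sorted(expand(graph, "A1", 2)))  # ['A3', 'B2']
print(sorted(expand(graph, "A3", 2)))  # ['A1', 'A5', 'B2', 'B4']
```

With the same limit, starting in the middle (A3) returns four mappings while starting at the end (A1) returns two, even though A1 and A3 are mapped to each other.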
@Christian-B, OK, that makes a lot of sense... do you have a script that calculates all transitive link sets, so that we can reproduce that?
PS. thanks for your quick response and your response in the first place!
For speed, all links in the IMS were loaded unidirectionally. This allows only one side of the mapping to be searched and indexed.
Most predicates were considered bidirectional, so each mapping was loaded twice. But there was the ability to handle unidirectional mappings. This was not yet used when I left three years ago.
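Roughly, that loading scheme could look like the sketch below. The function name, the tuple layout and the short identifiers are all made up for illustration, and the directionality flags are invented; this is not the actual IMS loader.

```python
def load_mappings(mappings):
    """Index mappings by source only; bidirectional ones are stored twice."""
    index = {}
    for source, target, bidirectional in mappings:
        index.setdefault(source, set()).add(target)
        if bidirectional:
            # Second, reversed row so a source-side lookup still finds it.
            index.setdefault(target, set()).add(source)
    return index


# Hypothetical rows: (source, target, is_bidirectional).
rows = [
    ("hmdb:HMDB01206", "chebi:15351", True),
    ("hmdb:HMDB01206", "kegg:C00024", False),  # invented unidirectional link
]
index = load_mappings(rows)

# Lookups only ever search the source side of the index.
print(sorted(index.get("hmdb:HMDB01206", set())))
print(sorted(index.get("chebi:15351", set())))
print(sorted(index.get("kegg:C00024", set())))
```

In this model a link that was loaded in one direction only is invisible when you start from its target, which is one way two "equivalent" IRIs could end up with different result sets.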
Sorry Egon, too long ago for me to remember.
Yeah, no worries... but I had to ask :)
There seem to be some mappings from HMDB to other sources, e.g. KEGG (http://alpha.openphacts.org:3004/QueryExpander/mappingSet/189), which are not created via the CRS. If HMDB is not an allowed middle source for the transitive calculation (not sure where to check that), this could explain why you find these additional mappings only when you start with HMDB directly. I'm assuming that the KEGG URIs are used in several pathways, so this could make a difference in the pathway counts.
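To make that hypothesis concrete, here is a small sketch of a one-step transitive expansion that only passes through "allowed" middle sources. The graph, the prefixes and the `allowed_middle` set are all assumptions for illustration; I have not checked how the IMS configures this.

```python
def source_of(uri):
    """Return the source prefix of a toy identifier such as 'hmdb:HMDB01206'."""
    return uri.split(":", 1)[0]


def expand_via_allowed(graph, start, allowed_middle):
    """Direct mappings plus one transitive step through allowed middle sources."""
    direct = set(graph.get(start, ()))
    result = set(direct)
    for middle in direct:
        if source_of(middle) in allowed_middle:
            result |= graph.get(middle, set())
    result.discard(start)
    return result


# Hypothetical, symmetric toy link sets.
graph = {
    "chebi:15351": {"hmdb:HMDB01206", "chemspider:392413"},
    "chemspider:392413": {"chebi:15351"},
    "hmdb:HMDB01206": {"chebi:15351", "kegg:C00024"},
    "kegg:C00024": {"hmdb:HMDB01206"},
}

# HMDB is NOT an allowed middle source here (assumption).
from_chebi = expand_via_allowed(graph, "chebi:15351", {"chemspider"})
from_hmdb = expand_via_allowed(graph, "hmdb:HMDB01206", {"chemspider"})
print(sorted(from_chebi))  # ['chemspider:392413', 'hmdb:HMDB01206']
print(sorted(from_hmdb))   # ['chebi:15351', 'kegg:C00024']
```

Under these assumptions the KEGG identifier is only reached when starting from HMDB. Adding "hmdb" to the allowed set makes it reachable from ChEBI as well.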
We're working on making proper link sets for compounds in pathways... @valt is working on (or finished) parsing the WikiPathways SDF so that we can drop the HMDB link sets.
There are multiple issues now... this bug depends on a redevelopment of a streamlined data loading pipeline (well, redeveloped is likely not the right word: Paul tried to put this on the agenda, but it never was prioritized...). For now, I'll unassign myself, as I cannot do much to fix this at this moment.
Does the /pathways/interactions/byEntity/count API call use the IMS? The two examples mentioned here are connected in the IMS, but retrieve different counts.