danidi opened this issue 9 years ago
Could you clarify what you mean by "different information is missing"? Depending on which URI you start with, you are querying a slightly different concept. The information returned is therefore structured according to the identifier you asked about: e.g. uniprot information like <organism> will be on the top-level element if you ask about the uniprot ID, but inside the exactMatch of the mapped uniprot identifier if you ask about the ConceptWiki ID.
Or are there other differences due to different identity mappings?
You will see that the first link does not include identifiers like mesh:C496348 and ncim:C1527757, both of which are mapped with protein justifications. The mapped drugbank identifiers also differ. Starting with uniprot, we find a mapping in the new Drugbank target v4 linkset:
Drugbank Target v4
BE0002131 http://bio2rdf.org/drugbank:BE0002131
And thus the uniprot lookup contains:
<item href="http://bio2rdf.org/drugbank:BE0002131">
  <targetForDrug>
    <item href="http://bio2rdf.org/drugbank:DB02058">
      <inDataset href="http://www.openphacts.org/bio2rdf/drugbank"/>
      <genericName xml:lang="en">SU4984</genericName>
      <drug_type xml:lang="en">experimental [drugbank_resource:Experimental]</drug_type>
    </item>
    <!-- .. -->
  </targetForDrug>
</item>
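To check whether a response contains this drugbank block regardless of where it is nested (top level for a uniprot lookup, or deeper, e.g. under exactMatch, for a ConceptWiki lookup, as explained above), one could search the whole tree for targetForDrug. This is a minimal sketch assuming the response is XML shaped like the fragment above; the function name drugs_for_target is hypothetical and not part of the Open PHACTS API.

```python
import xml.etree.ElementTree as ET

# The fragment above, with the elided part dropped. The envelope of a real
# /target response is an assumption; only this fragment comes from the thread.
SAMPLE = """<item href="http://bio2rdf.org/drugbank:BE0002131">
  <targetForDrug>
    <item href="http://bio2rdf.org/drugbank:DB02058">
      <inDataset href="http://www.openphacts.org/bio2rdf/drugbank"/>
      <genericName xml:lang="en">SU4984</genericName>
      <drug_type xml:lang="en">experimental [drugbank_resource:Experimental]</drug_type>
    </item>
  </targetForDrug>
</item>"""


def drugs_for_target(xml_text):
    """Return (drug URI, generic name) pairs from every targetForDrug block.

    root.iter() walks the whole tree, so the block is found whether it sits
    at the top level or is nested deeper (e.g. under exactMatch).
    """
    root = ET.fromstring(xml_text)
    pairs = []
    for block in root.iter("targetForDrug"):
        for drug in block.findall("item"):
            pairs.append((drug.get("href"), drug.findtext("genericName", default="")))
    return pairs


print(drugs_for_target(SAMPLE))
# [('http://bio2rdf.org/drugbank:DB02058', 'SU4984')]
```

An empty list for the ConceptWiki lookup would correspond to the missing Drugbank block discussed in this issue.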
But in the mapping from conceptwiki we only find the v3 drugbank target:
http://identifiers.org/drugbank.target/3854
Other transitive mappings via uniprot:P11362 are followed in the identity mappings of conceptwiki:6b60572a-1ea7-4c31-8408-b59537dd4b84, so it seems that the new uniprot/drugbank linkset is not considered for transitives, even with the All lens.
Yes, I'm aware of the different structuring with the exactMatch block. But in some cases, a whole block is missing. My first example seems to work properly now; maybe I overlooked the ConceptWiki block each time I looked at it before. So the only remaining issue is the missing Drugbank block when you start with a URI other than the uniprot one. Can the one-to-many mappings in db-uniprot-ls.ttl cause problems with the transitives? Or is it possible to add this linkset to the transitives as well?
I think this could be related to #251, which shows that the http://bio2rdf.org/drugbank:DB* pattern is missing in the 1.5 IMS.
The fixes for #251 are now live on ops2, but this problem remains, so it seems unrelated and probably has something to do with transitives. I'll leave this open and investigate further tomorrow.
I believe this is because http://ops2.few.vu.nl:8081/QueryExpander/dataSource/drugbankTarget and http://ops2.few.vu.nl:8081/QueryExpander/dataSource/drugbankv4.target are not listed as Allowed Middle Sources in the Default lens.
Is your suggestion to add both to the allowed middle sources? I think that might not be what you want.
I don't know (I hadn't heard of Allowed Middle Sources before). What would be the consequences? Which data sources are currently allowed middle sources? http://ops2.few.vu.nl:8081/QueryExpander/dataSource/drugbankTarget looks strange, as it includes both molecule and target URIs. Also, both use the old drugbank version; I'm not sure they are still valid.
Allowed Middle Sources are the sources that can be used as intermediate steps of transitive links. For instance, following the equality links (made-up example):
conceptwiki --> drugbank --> uniprot --> ensembl
would require both uniprot and drugbank as Middle Sources.
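The role of Allowed Middle Sources in the made-up chain above can be sketched as a breadth-first walk that reaches any directly linked source but only continues through sources the lens allows in the middle. This is a simplified model under that assumption, not the IMS/QueryExpander implementation; LINKS and reachable are hypothetical names.

```python
from collections import deque

# Made-up direct links from the example above (source -> linked sources).
LINKS = {
    "conceptwiki": ["drugbank"],
    "drugbank": ["uniprot"],
    "uniprot": ["ensembl"],
}


def reachable(start, links, allowed_middle):
    """Return the sources reachable from `start`.

    A source is reached if it is linked directly from `start`, or from a
    source that was itself reached and is listed in `allowed_middle`.
    Simplified model of a lens's Allowed Middle Sources, not the IMS code.
    """
    seen = {start}
    found = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, []):
            if nxt in seen:
                continue  # already reached (or the start itself): never re-followed
            seen.add(nxt)
            found.add(nxt)
            # Chains only continue through sources the lens allows as middles.
            if nxt in allowed_middle:
                queue.append(nxt)
    return found


print(sorted(reachable("conceptwiki", LINKS, {"drugbank", "uniprot"})))
# ['drugbank', 'ensembl', 'uniprot']
print(sorted(reachable("conceptwiki", LINKS, {"drugbank"})))
# ['drugbank', 'uniprot']
```

With only drugbank allowed as a middle source, the walk stops at uniprot, which is why reaching ensembl needs both.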
The far right column of Default on http://ops2.few.vu.nl:8081/QueryExpander/Lens shows the sources that are currently allowed.
I'm not sure why drugbankTarget includes URI patterns for both molecules and targets; I'll raise that as a new bug. To be safe, perhaps we should not include it as a middle source. The only linkset included here contains links to targets only:
This molecule pattern is not included for the v4 targets:
http://ops2.few.vu.nl:8081/QueryExpander/dataSource/drugbankv4.target
Whether we need both v3 and v4 drugbank targets, I don't know. That needs to be checked against the cache and queries.
Checking further, I can't see any outgoing links from drugbankv4.target except back to Uniprot (which would not be followed), so adding it as a middle source would presumably not change the output either; something else must be wrong. I will try some alternative middle sources in my local install to check.
To summarize:
The link chain we want followed is:
So for some reason the transitive link from uniprot to drugbankv4.target is not followed, although it IS shown when looking up the uniprot URI directly. The lenses and justifications should permit this.
@danidi pointed out that there could be an issue within Transitive-on-the-Fly, which may need to be told separately about the new drugbankv4.target linkset, so I'll investigate this using a debugger.
Related mapping sets:
through
Source          Target              Predicate   Justification       Mapping Source
ConceptWiki     Uniprot-TrEMBL      exactMatch  ConceptWikiProtein  www_uniprot_org_uniprot-protein.ttl
Uniprot-TrEMBL  Drugbank Target v4  exactMatch  SIO_001171          db-uniprot-ls.ttl
It is caused by incompatible justifications.
shows the justifications that are currently allowed from ConceptWikiProtein. These include only SIO_010043 (protein) and SIO_000985 (protein coding gene), but not SIO_001171 (database cross-reference), which is what the linkset from Uniprot to Drugbank uses.
Hence Uniprot to Drugbank is not currently combinable with ConceptWiki to Uniprot. Do you think it should be? If I add SIO_001171, it would also allow lots of other transitive linksets through ConceptWiki, e.g. Ensembl (which a comment in the code says is explicitly disallowed).
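The incompatibility can be illustrated with a toy check: each later hop's justification must be in the set allowed for the first hop's justification. Only the justification IDs come from this thread; the COMPATIBLE table and chain_allowed are hypothetical simplifications, not the logic of OpsJustificationMaker.

```python
# Justifications allowed on later hops, keyed by the first hop's justification.
# Only the IDs come from this thread; the table itself is a hypothetical
# simplification, not OpsJustificationMaker's real logic.
COMPATIBLE = {
    "ConceptWikiProtein": {"SIO_010043", "SIO_000985"},
}


def chain_allowed(justifications, compatible=COMPATIBLE):
    """Return True if every hop after the first uses a justification that is
    compatible with the first hop's justification."""
    if not justifications:
        raise ValueError("a chain needs at least one hop")
    first, *rest = justifications
    allowed = compatible.get(first, set())
    return all(j in allowed for j in rest)


# ConceptWiki -> Uniprot (ConceptWikiProtein), then Uniprot -> Drugbank v4.
print(chain_allowed(["ConceptWikiProtein", "SIO_001171"]))  # False: cross-reference
print(chain_allowed(["ConceptWikiProtein", "SIO_010043"]))  # True: protein
```

In this model, relabelling the Uniprot/Drugbank linkset with SIO_010043 makes the chain pass without widening the allowed set, whereas adding SIO_001171 to the set would let every cross-reference linkset through.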
One workaround would be a hand-edited VoID file with a justification other than SIO_001171 (which we could then add to OpsJustificationMaker if needed). What is the link from uniprot to drugbank really? Can it be something more specific than "cross reference"?
Wow, congratulations on figuring that one out! Could SIO_010043 (protein) be used for the Uniprot/Drugbank linkset, so that only this one is added for now? I think the protein justification is used for mappings between different protein identifiers (although the definition of a protein is basically something else). I'm a bit hesitant to include all database cross-reference datasets when we don't know which other datasets this would pull in. Maybe something to keep in mind for the future reload of the IMS?
I think the easiest workaround is to just use SIO_010043 here, which is a simple change to the data loading and requires no code changes. I shall have a go.
Loading with the SIO_010043 workaround works well.
which now includes http://bio2rdf.org/drugbank:BE0002131, and thus drugbank info is included in ops2:
So I hand it over to Yrjänä @ghard to deploy at OpenLink:
QueryExpander.war from QueryExpander 2.0.3 (remember to rename! :)
ims-1.5-2015-05-12.sql.gz from http://data.openphacts.org/1.5/ims/

Sysadmin info for @antonisloizou: the updated Docker containers on ops2.few.vu.nl are ims-20150513, linked to mysql-for-ims-20150513; ports remain the same (AJP 8009, HTTP 8081), e.g. http://ops2.few.vu.nl:8081/QueryExpander/
The "semi-empty" chembl20 instance at http://ops2.few.vu.nl:8082/QueryExpander has not been updated, as it does not have the drugbank linkset yet.
Documentation on how to reproduce: https://github.com/openphacts/queryExpander/tree/master/docker#custom-data-loading
@stain, is this still relevant to IMS 2.0, or should we close it?
Migrating this here from an email conversation (drugbank target mappings). Depending on the URI used in the query, different information is missing from the target information call. Using uniprot, the ConceptWiki information is missing: https://beta.openphacts.org/1.5/target?uri=http%3A%2F%2Fpurl.uniprot.org%2Funiprot%2FP11362&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1
Using the corresponding ConceptWiki URI, the drugbank information is missing: https://beta.openphacts.org/1.5/target?uri=http%3A%2F%2Fwww.conceptwiki.org%2Fconcept%2Findex%2F6b60572a-1ea7-4c31-8408-b59537dd4b84&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1
Here is another example: https://beta.openphacts.org/1.5/target?uri=http%3A%2F%2Fwww.conceptwiki.org%2Fconcept%2Fb79f8003-ce3c-4056-9169-7bc93ff7ed60&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1
(The corresponding uniprot URI retrieves the Conceptwiki URI here: https://beta.openphacts.org/1.5/target?uri=http%3A%2F%2Fpurl.uniprot.org%2Funiprot%2FQ13233&app_id=f91c5b2b&app_key=18a5d823d0e4933ac5fe22a3d52974c1)
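To repeat these comparisons for other identifiers, the request URLs can be built rather than hand-encoded. A minimal sketch using only the Python standard library: the base URL is copied from the links above, target_url is a hypothetical helper, and YOUR_APP_ID/YOUR_APP_KEY are placeholders for your own credentials.

```python
from urllib.parse import urlencode

# Base URL copied from the links above; credentials are placeholders.
API_BASE = "https://beta.openphacts.org/1.5/target"


def target_url(uri, app_id, app_key):
    """Build a /target request URL for `uri`.

    urlencode() percent-encodes ':' and '/' inside the URI (%3A, %2F),
    which matches the uri= parameters in the links above.
    """
    query = urlencode({"uri": uri, "app_id": app_id, "app_key": app_key})
    return f"{API_BASE}?{query}"


for uri in (
    "http://purl.uniprot.org/uniprot/P11362",
    "http://www.conceptwiki.org/concept/index/6b60572a-1ea7-4c31-8408-b59537dd4b84",
):
    print(target_url(uri, "YOUR_APP_ID", "YOUR_APP_KEY"))
```

Fetching both URLs for the same protein and comparing the returned blocks reproduces the uniprot-versus-ConceptWiki difference described above.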