monarch-initiative / biolink-api

API for linked biological knowledge
https://api.monarchinitiative.org/api/
BSD 3-Clause "New" or "Revised" License

Discrepancy in results due to a mixed variety of calls #318

Open deepakunni3 opened 4 years ago


EPM2A (HGNC:3413):

Questions

1. Why is Monarch beta showing 2 function associations but the table shows 0? Answer: Monarch beta uses the BioLink API to query for function associations. Before GOlr is queried, there is a step that converts the gene identifier HGNC:3413 to a UniProtKB identifier using the MyGene.info service. The corresponding call to MyGene.info is http://mygene.info/v3/query?q=HGNC:3413&fields=all and it yields UniProtKB:O95278 as the entry instead of UniProtKB:B3EWF7. Because of this difference, we end up with 0 results from GOlr, since there are no GO annotations for this identifier.

Recommended solution: At least for Monarch, we need to use SciGraph for the gene to UniProtKB identifier conversion. MyGene.info can sometimes yield a different protein identifier mapping, as observed here.
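To make the conversion step in question 1 easier to reproduce, here is a minimal sketch in Python. It only builds the MyGene.info query URL quoted above and pulls UniProtKB identifiers out of a response-shaped dictionary; it makes no network call. The function names are hypothetical, and the assumed response layout (a `hits` list whose entries carry a `uniprot` object with `Swiss-Prot` and `TrEMBL` values) should be checked against the MyGene.info documentation. The sample response is hand-written to mirror the identifier reported in this issue, not captured API output.

```python
from urllib.parse import urlencode

# Endpoint as quoted in this issue.
MYGENE_QUERY_URL = "http://mygene.info/v3/query"


def build_query_url(gene_curie, fields="all"):
    """Build the MyGene.info query URL for a gene CURIE such as HGNC:3413."""
    # safe=":" keeps the colon in the CURIE unescaped, matching the URL in the issue.
    return MYGENE_QUERY_URL + "?" + urlencode({"q": gene_curie, "fields": fields}, safe=":")


def extract_uniprot_curies(response):
    """Collect every UniProtKB CURIE found in a MyGene.info-style response.

    Assumption: each hit may carry a "uniprot" object whose "Swiss-Prot" and
    "TrEMBL" values are either a single accession string or a list of them.
    """
    curies = []
    for hit in response.get("hits", []):
        uniprot = hit.get("uniprot", {})
        for key in ("Swiss-Prot", "TrEMBL"):
            value = uniprot.get(key, [])
            accessions = [value] if isinstance(value, str) else value
            for accession in accessions:
                curie = "UniProtKB:" + accession
                if curie not in curies:
                    curies.append(curie)
    return curies


def mapping_disagrees(mygene_curies, expected_curie):
    """Return True when the expected identifier is missing from the MyGene.info mapping."""
    return expected_curie not in mygene_curies


# Hand-written sample mirroring the identifier reported in this issue.
sample_response = {"hits": [{"uniprot": {"Swiss-Prot": "O95278"}}]}
found = extract_uniprot_curies(sample_response)
print(build_query_url("HGNC:3413"))
print(found, mapping_disagrees(found, "UniProtKB:B3EWF7"))
```

A check like `mapping_disagrees` could flag genes where the two mapping sources diverge before GOlr is queried, which is the situation described for EPM2A.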

2. Why does Monarch legacy yield only 2 function associations when AmiGO states 4 associations? Answer: This is because Monarch legacy calls Monarch Solr for function associations instead of GOlr.

3. Why do AmiGO and GeneCards differ, with 4 GO annotations in AmiGO versus 13 in GeneCards? Answer: ...Up for debate...

Thanks to @colleenXu for reporting this discrepancy.

@kshefchek @cmungall @monicacecilia @kltm It would be great to have your input on this.

P.S. This issue is related to https://github.com/monarch-initiative/monarch-ui/issues/167