Trouble getting API to work for retrieving propagated gene annotations

ChristopherMancuso commented 11 months ago

Hi,

I am trying to access the monarch-py API and am running into some problems. For context, my end goal is to query Monarch with a phenotype term and get back a list of all genes annotated to that term, with the annotations propagated up the ontology, that is, with annotations from more specific terms added to their more general ancestor terms. Currently, I'm using annotations from https://archive.monarchinitiative.org/latest/tsv/gene_associations/ but these annotations don't appear to be propagated.
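To make "propagated" concrete, here is a minimal sketch of the idea in Python. It is not Monarch's implementation; the term and gene identifiers are made up for illustration, and the `parents` mapping stands in for whatever ontology structure is actually available.

```python
from collections import defaultdict


def propagate_annotations(direct, parents):
    """Return a dict of term -> genes, where each term also receives
    the genes annotated to any of its descendant terms."""
    propagated = defaultdict(set)
    for term, genes in direct.items():
        # Walk from the annotated term up through all of its ancestors.
        stack = [term]
        seen = set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            propagated[current].update(genes)
            stack.extend(parents.get(current, ()))
    return dict(propagated)


# Toy ontology, child -> parents. All identifiers here are hypothetical.
parents = {
    "TERM:specific_a": ["TERM:general"],
    "TERM:specific_b": ["TERM:general"],
    "TERM:general": ["TERM:root"],
}
# Direct (unpropagated) annotations, term -> genes.
direct = {
    "TERM:specific_a": {"GENE:1"},
    "TERM:specific_b": {"GENE:2"},
    "TERM:general": {"GENE:3"},
}

propagated = propagate_annotations(direct, parents)
print(sorted(propagated["TERM:general"]))
```

In this toy case the general term ends up with its own gene plus the genes from both of its more specific children, which is the behavior I am after.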

To get propagated gene annotations we were told to use the API. However, I can't get the monarch-py package to work, which is where I was taken when I clicked on the API link in the Monarch about page. I installed it on the CU HPC ALPINE via pip install monarch-py.

The first issue is that the api documentation for the Basic Example - As a Module contained a spelling error and I have posted about that in a separate issue on the monarch-app repo.

After fixing the above to be able to import correctly, the basic function still won't work. The code I used is

from monarch_py.implementations.solr.solr_implementation import SolrImplementation

si = SolrImplementation()

entity = si.get_entity("MONDO:0012933", extra=False)
print(entity.name)

with the following error

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8983): Max retries exceeded with url: /solr/association/select?q=%2A%3A%2A&rows=20&start=0&facet=true&fq=predicate%3Abiolink%5C%3Ahas_phenotype&defType=edismax&q.op=AND&mm=100%25&facet_min_count=1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f71e9174b80>: Failed to establish a new connection: [Errno 111] Connection refused'))

I noticed on the Monarch help page that the API was down but API-V3 is working. Is monarch-py using the original API and not V3?

I also tried to use the python requests library directly using

import requests

headers = {'content-type': 'application/x-www-form-urlencoded'} # also tried with 'application/json'
params = 'subject_taxon=NCBITaxon:40674&object=HP:0000492'
url = 'http://api.monarchinitiative.org/api/association/find/gene/phenotype'
res = requests.post(url, data=params, headers=headers)
content = res.json()
print(content)

I have recently made POST requests to other APIs with Python's requests package, so I don't think the problem is with requests in general on my end.

Any help in getting the propagated annotations for phenotype terms would be much appreciated!

glass-ships commented 11 months ago

Hi @ChristopherMancuso thanks for dropping this off!

I'll start off by apologizing for the state of our documentation. It should really mention that monarch-py is currently intended to be used with a local instance of our Solr Docker container running.

For immediate usage, if you have monarch-py and docker installed, you can use the following CLI commands to get that started:

1. monarch solr download
2. Follow the printed instructions to adjust the relevant file permissions
3. monarch solr start
4. Confirm it's running with: monarch solr status

Once you have a local Solr instance running, the code you posted above using SolrImplementation() should work as expected.
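As a quick sanity check before calling SolrImplementation(), something like the following sketch can confirm that a server is listening where monarch-py looks for it. The host and port (localhost:8983) are taken from the error message above; the helper name `solr_is_reachable` is made up for this example and is not part of monarch-py.

```python
import socket


def solr_is_reachable(host="localhost", port=8983, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds.

    This only checks that something is listening on the port, not that
    it is a healthy Solr instance with the Monarch data loaded.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution errors.
        return False


print("Solr reachable:", solr_is_reachable())
```

If this prints False, the ConnectionError from the original post is expected, and the Solr container needs to be started first.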

Long term, we'd like to have a way for monarch-py to call the official Monarch API so that users don't need to go through these steps.

To approach this, we should:

[ ] Make an issue to address documentation to a) correct the spelling error you mentioned, and b) make the currently intended usage more clear
[ ] Make another issue to add in the implementation allowing it to talk to the official API

@kevinschaper may also have some input here

kevinschaper commented 11 months ago

Hi @ChristopherMancuso,

Confusingly, api.monarchinitiative.org is still our older API, and as @glass-ships mentioned, monarch-py is used within our api, and we haven't yet updated it so that it will pull data from the public API.

From the new API, you should be able to use a get request to pick up phenotype associations for a given phenotype with a query like

http://api-v3.monarchinitiative.org/v3/api/association?predicate=biolink%3Ahas_phenotype&object=HP%3A0000492&limit=20&offset=0

We don't have taxon filtering on the v3 API endpoint yet. For this particular request (an HPO term), though, the only responses would currently be for human data.

Additionally, to wrap it up in some python:

import requests

response = requests.get("http://api-v3.monarchinitiative.org/v3/api/association?predicate=biolink%3Ahas_phenotype&object=HP%3A0000492&limit=20&offset=0").json()
print(response)
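Since the query above is paged with limit and offset, collecting every association means looping over pages. The sketch below shows one way to do that. It assumes the JSON response has an "items" list and a "total" count, and that each item carries a "subject" identifier; those field names are assumptions, so check them against the API docs. It uses the standard library's urllib instead of requests so it has no dependencies, and the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://api-v3.monarchinitiative.org/v3/api/association"


def fetch_page(params):
    """GET one page of associations and return the decoded JSON."""
    with urlopen(f"{API_URL}?{urlencode(params)}", timeout=30) as resp:
        return json.load(resp)


def collect_subjects(object_id, fetch=fetch_page, limit=20):
    """Page through has_phenotype associations for one phenotype term
    and return the set of subject identifiers seen."""
    subjects = set()
    offset = 0
    while True:
        page = fetch({
            "predicate": "biolink:has_phenotype",
            "object": object_id,
            "limit": limit,
            "offset": offset,
        })
        items = page.get("items", [])  # assumed field name
        for item in items:
            subject = item.get("subject")  # assumed field name
            if subject:
                subjects.add(subject)
        offset += limit
        # Stop on an empty page or once the reported total is covered.
        if not items or offset >= page.get("total", 0):
            break
    return subjects
```

Calling collect_subjects("HP:0000492") would then return the subject IDs across all pages. The subjects are not guaranteed to be genes only, so any filtering by entity type is left to the caller.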

kevinschaper commented 11 months ago

Oh, also, here's the new API docs:

https://api-v3.monarchinitiative.org/v3/docs

It would be super helpful for us if you can file specific issues for anything that the API doesn't yet support that you need, and we'll try to get them in as soon as we can. The old API was pretty comprehensive, and we took some guesses about what would be most important to support initially, but feedback on what we missed will be super helpful.

ChristopherMancuso commented 11 months ago

Hi @kevinschaper and @glass-ships, thanks for the quick reply! I think what you are both saying for next steps to try/do makes sense to me. I'll give the above suggestions a try in the next day or so and leave some feedback on this thread or create a new issue as needed. I really appreciate the help with this!

ChristopherMancuso commented 11 months ago

I tried to break up the issues as best I could and post them as separate tickets. I see @kevinschaper has answered one already. One question that probably still belongs on this ticket: what is the relationship between biolink and monarch? They seem heavily intertwined, but I can't figure out how they fit together. Is that important to know when testing out the API V3 features?

Moving forward, I will stick with API V3 as opposed to monarch-py. I work mostly on HPC systems, where installing docker containers is completely prohibited on some clusters and pretty complicated on others.

Thanks again for the quick follow up on these questions, I really appreciate it!

glass-ships commented 11 months ago

So, Monarch uses things like the Biolink Model and KGX, which are developed by biolink, during the creation and curation of our knowledge graph (our data model is relatively biolink compliant).

To that end, a familiarity with the biolink model may be helpful in interpreting results, but in terms of just using the Monarch API, you should be fine with minimal understanding.

The public API is powered by monarch-py in the backend, so it may also be handy to reference the documentation for that and its responses: https://monarch-app.monarchinitiative.org/Data-Model/

kevinschaper commented 11 months ago

An especially confusing aspect is that we're retiring the "Biolink API" as we're moving our graph into a data model called the Biolink Model. Monarch is involved in building and refining the Biolink Model (and associated tooling), but many of the use cases have been driven by the NCATS Translator project, so the model extends well beyond what we bring together in our graph.

ChristopherMancuso commented 10 months ago

I looked through Biolink and KGX. It is pretty cool how so much information is being shared across different projects!

monarch-initiative / helpdesk

Trouble getting API to work for retrieving propagated gene annotations #109