wikipathways / BioThings_Explorer_PFOCR_prioritization

PFOCR for prioritization/clustering of BioThings Explorer (BTE) TRAPI results
MIT License
0 stars 0 forks source link

More recent TRAPI results cause problems in PET notebook #6

Closed khanspers closed 1 year ago

khanspers commented 1 year ago

Some TRAPI results cause an error in the PET notebook, seemingly because "name" is not found for some nodes in the returned knowledge graph.

The error is:

Error in Python KeyError Traceback (most recent call last)

in () 5 curie_to_unified_curies = dict() 6 for k, v in trapi_message["knowledge_graph"]["nodes"].items(): ----> 7 curie_to_name[k] = v["name"] 8 #first check is k is already what we want 9 [prefix, identifier] = k.split(":") KeyError: 'name' Query results examples (both are "what treats Huntington's disease?"): Results URL that works: https://arax.ncats.io/api/arax/v1.3/response/a46d7144-718e-40dc-bde7-8733f38d9d04 (run around mid-Febrary) Results URL that doesnt work: https://arax.ncats.io/api/arax/v1.3/response/33dd53d9-fd8e-4f17-9a3b-055a782d8467 (run on April 5) For now, the PET notebook has been updated to not expect a "name", to make it possible to run the notebook. However, this affects how results from the notebook are displayed, it now presents the CURIE instead of the name, which is less human readable.
colleenXu commented 1 year ago

I think this is related to this issue https://github.com/biothings/BioThings_Explorer_TRAPI/issues/539#issuecomment-1487638950. We think it's related to a change in the ARS, but I'm not sure if we've reached out to their team (NCATS/Link Brokers) yet.

If possible, could we use direct responses from BTE (rather than ones that go through ARS/ARAX)? Or a local instance of BTE?

andrewsu commented 1 year ago

Yes, I agree with @colleenXu that this issue is related to ARS node normalization. I expect that this issue will be fixed sometime in the (hopefully near) future. But, to future-proof the PET notebook, I suggest adding a tiny bit of logic to say "use node name unless it's null, in which case fall back to using the CURIE"...

ayushi-agrawal-gladstone commented 1 year ago

@khanspers I have updated the notebook as Andrew suggested. In the latest version, if the node name is missing in the knowledge graph, then the unified CURIE is used as the node name. If there is no unified CURIE for the node, then "NA" is used for the node name. Please test it out and let me know if it works for you.

khanspers commented 1 year ago

If possible, could we use direct responses from BTE (rather than ones that go through ARS/ARAX)? Or a local instance of BTE?

@colleenXu : For querying BTE directly, is this the correct URL: https://api.bte.ncats.io/v1/query?

andrewsu commented 1 year ago

@khanspers that is the correct URL for the "synchronous query" endpoint on our dev server. https://api.bte.ncats.io/v1/asyncquery would be the "async endpoint" on the dev server. The prod server would be the same with the base URL of https://bte.transltr.io/.

BUT I don't think we want the PET notebook to be executing TRAPI queries. I think Colleen's question is whether the PET notebook will work on BTE output that hasn't been processed by the ARS and then served by the ARAX UI (like the URLs above). More specifically, when a user executes a query using one of the async endpoints, you end up with a response URL like this: https://bte.transltr.io/v1/check_query_status/U6fjFr5WE3. It's a TRAPI response with a little wrapper around it.

If easy, it would be great to allow for users to use a URL like that. If not, we should assume that the ARS bug will be fixed soon, and that the changes that @ayushi-agrawal-gladstone described in this comment is a reasonable fallback.

khanspers commented 1 year ago

Right, I didn't mean for the notebook to execute queries, sorry that wasn't clear. Indeed I was looking for some alternative response URL (or BTE responses in some other format) to be used in the notebook as an alternative to how it works now.

So if I post queries to https://api.bte.ncats.io/v1/asyncquery or https://bte.transltr.io/v1/asyncquery in Postman, I should be able to use the response URL (https://bte.transltr.io/v1/check_query_status/U6fjFr5WE3), swapping out the response ID?

khanspers commented 1 year ago

Im getting this error again, but I can get it to run by commenting out line "curie_to_name[k] = v["name"]" fixes it. Based on the history I think this is the line that was removed earlier to fix this? @ayushi-agrawal-gladstone

ayushi-agrawal-gladstone commented 1 year ago

@khanspers Yes, you are right that this was the line that I had fixed. Looks like I overwrote the changes by mistake while adding new features to the notebook. This happened as my local copy of the repo was not up to date. I will be more careful about my local versions of the code. I have fixed the PET notebook to work as specified in my previous comment. Please let me know if you still get errors.

ayushi-agrawal-gladstone commented 1 year ago

No longer seeing this issue with newer validator 3.5.3 and up (BTE issue).