neurobagel / planning


Update the MNI-PD graph-ready data and make it federation-accessible #39

Closed: alyssadai closed 1 year ago

alyssadai commented 1 year ago

(Aka, second iteration of https://github.com/neurobagel/project/issues/145)

To populate the new MNI node(s), we need to generate new versions of the JSON-LD files for PPMI and QPN that reflect the latest changes to 1) the phenotypic TSVs and 2) the Neurobagel data model.

Steps to implement

alyssadai commented 1 year ago

For reviewer: you can check that the above steps are complete by (in no particular order, the last one being probably the most exciting)

surchs commented 1 year ago

@alyssadai thanks a lot for all this work!

I have some questions that should be quick to clarify before we can close:

JSONLD files under /data/origami/neurobagel_graph_data check for healthy controls: ncit

I can find ncit prefixes for PPMI, but not for QPN. That seems unintuitive to me. Could you confirm that grep ncit /data/origami/neurobagel_graph_data/qpn/qpn.jsonld should only hit the @context in QPN because no subject in QPN has been labeled with healthy control?

in the .env file under ~/projects/federation-api on the st-viateur machine

OK, this made me scratch my head for a bit. Let me try to describe what I think is happening here and you tell me if that's right:

I think this shows that keeping server names, Docker container names, and port numbers clean and organized can quickly get tricky. Let's have a chat about how we can clean up the servers a bit and maybe make deployments more automatic.

Running a sample query using the interactive API docs for each PD graph at http://206.12.89.194:8888/docs (PPMI) and http://206.12.89.194:8080/docs (QPN)

This works well :tada:
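For anyone who wants to repeat this check without clicking through the docs page, here is a minimal sketch that only builds the request URLs for the two node APIs named above. The /query/ path, the diagnosis parameter name, and the SNOMED identifier used for Parkinson's disease are assumptions for illustration, not something confirmed in this thread. The build_query_url helper is hypothetical.

```python
from urllib.parse import urlencode

# Node API base URLs as given in this thread.
NODE_APIS = {
    "PPMI": "http://206.12.89.194:8888",
    "QPN": "http://206.12.89.194:8080",
}


def build_query_url(base_url, **params):
    """Return '<base_url>/query/?<params>', dropping parameters set to None.

    The '/query/' path is an assumption about the node API.
    """
    filtered = {key: value for key, value in params.items() if value is not None}
    query = urlencode(filtered)
    return f"{base_url.rstrip('/')}/query/" + (f"?{query}" if query else "")


# Assumed parameter name and illustrative term for Parkinson's disease.
urls = {
    name: build_query_url(base, diagnosis="snomed:49049000")
    for name, base in NODE_APIS.items()
}

for name, url in urls.items():
    print(name, url)
```

The resulting URLs could then be fetched with any HTTP client from a machine that can reach the nodes.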

query for "Parkinson's disease" and see both PPMI and QPN 🎉

:tada:

So take another look at these healthy_control labels and see if the rest makes sense. Then I think we can close!
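As a side note on the grep check above: a plain grep cannot tell a prefix definition in the @context apart from an actual use of that prefix in the data. A minimal sketch of a stricter check could walk the JSON-LD and skip @context blocks. The find_prefix_uses helper is hypothetical, and the key names and identifiers in the toy document are illustrative rather than copied from the real files.

```python
import json


def find_prefix_uses(node, prefix, path=""):
    """Collect (path, value) pairs for string values starting with '<prefix>:'.

    '@context' blocks are skipped because they define prefixes
    rather than use them.
    """
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@context":
                continue
            hits.extend(find_prefix_uses(value, prefix, f"{path}/{key}"))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            hits.extend(find_prefix_uses(item, prefix, f"{path}/{index}"))
    elif isinstance(node, str) and node.startswith(prefix + ":"):
        hits.append((path, node))
    return hits


# Toy stand-in for a graph-ready file (illustrative keys and identifiers).
toy = {
    "@context": {"ncit": "http://example.org/ncit#", "snomed": "http://example.org/snomed#"},
    "hasSamples": [
        {"hasDiagnosis": [{"identifier": "snomed:49049000"}]},
        {"isSubjectGroup": {"identifier": "ncit:C94342"}},
    ],
}

hits = find_prefix_uses(toy, "ncit")
print(hits)
```

To run it on a real file, load the document first, e.g. with json.load on /data/origami/neurobagel_graph_data/qpn/qpn.jsonld, and pass the result to find_prefix_uses. An empty list would mean the prefix only appears in the @context.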

alyssadai commented 1 year ago

I can find ncit prefixes for PPMI, but not for QPN. That seems unintuitive to me. Could you confirm that grep ncit /data/origami/neurobagel_graph_data/qpn/qpn.jsonld should only hit the @context in QPN because no subject in QPN has been labeled with healthy control

This is expected. It is due to a known limitation of the CLI, which currently stores information only from the first column annotated for a given Neurobagel variable. The QPN TSV contains two columns about diagnosis: one has the values PD and PSP (progressive supranuclear palsy), and the other classifies individuals as PD or healthy control. Both were annotated in the data dictionary, but only the PD/PSP diagnosis column is in the graph right now because it appears first in the raw data.
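To make the effect concrete, here is a toy sketch of "first column wins" versus keeping every annotated column. This is not the CLI's actual code, and the column names are made up; only the behaviour described above is being illustrated.

```python
def first_column_per_variable(annotations):
    """Keep only the first column seen for each variable (order of appearance)."""
    kept = {}
    for column, variable in annotations:
        kept.setdefault(variable, column)  # later columns for the same variable are ignored
    return kept


def all_columns_per_variable(annotations):
    """Keep every column annotated for each variable."""
    grouped = {}
    for column, variable in annotations:
        grouped.setdefault(variable, []).append(column)
    return grouped


# Hypothetical (column name, annotated variable) pairs in TSV column order.
qpn_like = [
    ("participant_id", "participant"),
    ("diagnosis_pd_psp", "diagnosis"),  # values: PD, PSP
    ("group", "diagnosis"),             # values: PD, healthy control
]

print(first_column_per_variable(qpn_like))
print(all_columns_per_variable(qpn_like))
```

With the first function the second diagnosis column never reaches the output, which matches why no healthy control label shows up in the QPN graph.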

  • the federation API is running on fairmount (206.12.99.17) on port 8888, not on viateur (206.12.89.194)
  • the .env file you describe thus lives on fairmount too and points to nodes on viateur
  • the QPN and PPMI APIs are running on viateur on ports 8888 and 8080 respectively

Ah yes, my point about the federation API running on st-viateur was a typo; I meant to say that the new PD data graphs live on that machine. 🙂 Below is the accurate info (most of what you said was correct):

surchs commented 1 year ago

Ah, cool. Thanks for clarifying. I copied your explanation into https://github.com/neurobagel/bagel-cli/issues/224 for additional reference.

I think with this we can close this :cook: