related-sciences / nxontology-data

NXOntology data: making ontologies accessible as simple JSON files

Accessing MeSH NXO #2

Closed eric-czech closed 2 years ago

eric-czech commented 2 years ago

Hey @dhimmel, I was trying to pull the 2021 MeSH NXO like this:

import pandas as pd
from nxontology import NXOntology
url = "https://github.com/related-sciences/nxontology-data/raw/71cf538dc5c258ada880d58663b0205b7b7f8561/001_medical_subject_headings_mesh_desctree.json.gz"
nxo = NXOntology.read_node_link_json(url)

I was a little surprised to find that the node ids are ints and that there isn't a lot of data attached to them:

pd.Series(type(n) for n in nxo.graph.nodes).value_counts()
<class 'int'>    920388

nxo.node_info(1).data
{'name': 'Organisms Category',
 'description': None,
 'pubchem_hnid': 1269010,
 'url': 'http://www.ncbi.nlm.nih.gov/mesh/1000066'}

Is there another way to get the unique ids, class, and tree numbers (for descriptors)?

dhimmel commented 2 years ago

You'll want to use the MeSH in the output/mesh branch rather than the output/pubchem branch. We could consider no longer exporting MeSH from pubchem now that we have a dedicated export path for MeSH.

See the data here. Regarding the weird directory path undefined/actions_github_pages_1651177463639, I'm looking into that in https://github.com/peaceiris/actions-gh-pages/issues/740. The data is still usable; just make sure to pin the URL to a commit hash rather than the branch name.
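A minimal sketch of what pinning to a commit hash could look like. The helper name pinned_raw_url is hypothetical, and the commit and filename below are simply reused from the URL in the question to show the URL shape; the actual output/mesh commit hash and file path are not given in this thread and would need to be substituted.

```python
import re


def pinned_raw_url(owner, repo, commit, path):
    """Build a github.com raw URL pinned to a full 40-character commit hash.

    Hypothetical helper for illustration; it only formats a string and
    rejects anything that is not a full lowercase hex commit hash
    (for example a branch name such as "output/mesh").
    """
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError("expected a full 40-character commit hash, not a branch name")
    return f"https://github.com/{owner}/{repo}/raw/{commit}/{path}"


# Commit and filename copied from the question above, purely as an example.
url = pinned_raw_url(
    "related-sciences",
    "nxontology-data",
    "71cf538dc5c258ada880d58663b0205b7b7f8561",
    "001_medical_subject_headings_mesh_desctree.json.gz",
)
```

The resulting string can be passed to NXOntology.read_node_link_json in the same way as in the question. Pinning to a hash means a later overwrite of the branch does not silently change the data being read.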

dhimmel commented 2 years ago

You'll probably want the mesh_topical_descriptor_descendants.json.gz network. It only includes nodes that descend from a Topical Descriptor and so would exclude parts of MeSH such as Geographical Descriptors and disconnected Supplemental Concept Records.
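To illustrate what "only nodes that descend from a Topical Descriptor" means, here is a toy sketch in plain Python. The node names and the descendants helper are made up for illustration and are not real MeSH identifiers or the code the export uses; it only shows how a descendants filter drops other subtrees and disconnected records.

```python
from collections import deque


def descendants(children, root):
    """Return every node reachable from root by following parent -> child edges."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical toy hierarchy: one topical subtree, one geographic subtree,
# and a record ("supplemental_record") with no edges at all.
toy_children = {
    "topical_root": ["disease", "anatomy"],
    "disease": ["infection"],
    "geographic_root": ["europe"],
}
all_nodes = {"topical_root", "disease", "anatomy", "infection",
             "geographic_root", "europe", "supplemental_record"}

kept = descendants(toy_children, "topical_root")
dropped = all_nodes - kept - {"topical_root"}
```

Here kept contains only the nodes under topical_root, while the geographic subtree and the unconnected record end up in dropped.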

eric-czech commented 2 years ago

You'll want to use the MeSH in the output/mesh branch rather than the output/pubchem branch.

🤦

See the data here.

Awesome, thanks! That works.

dhimmel commented 2 years ago

BTW, if you'd like MeSH 2022, it will now be created by the next CI mesh export following d0b3c82738c600448b026a3cd20152a5de8c4780 (which might not work due to the deploy bug).

eric-czech commented 2 years ago

That would be really great actually. Can I kick off these workflows myself sooner in https://github.com/related-sciences/nxontology-data/actions/workflows/create.yaml? Should I check "Overwrite output on an existing branch"?

dhimmel commented 2 years ago

Yes, kick it off yourself and select overwrite, but I'm worried there will be a deployment bug. If so, you could try deleting the output/mesh branch and then rerunning. It should work that way.

eric-czech commented 2 years ago

https://github.com/related-sciences/nxontology-data/actions/runs/2296536149 🎉

It failed the first time but succeeded on the second run, after I deleted the branch as you suggested.

dhimmel commented 1 year ago

Regarding the weird directory path undefined/actions_github_pages_1651177463639

https://github.com/related-sciences/nxontology-data/pull/6 fixed the weird path issue, so now the output/mesh branch contents are proper.