monarch-initiative / phenopacket-store

Collection of phenopackets
https://monarch-initiative.github.io/phenopacket-store/
BSD 3-Clause "New" or "Revised" License
12 stars 4 forks source link

Use remote HPO when running the notebooks #82

Closed ielis closed 2 months ago

ielis commented 4 months ago

Use remote HPO when running the FBN1 notebooks and suggest downloading HPO from a remote URL in the documentation.

We need this functionality to run CI in pyphetools.

The PR, however, needs pyphetools to be released first, because the current pyphetools version does not allow using a URL, and the FBN1 notebooks would explode when running HpoParser, since HpoParser currently needs hp.json as an existing local file, and URL is not a local file.

Therefore, if using remote HPO sounds OK, we first must do the following, before accepting this PR:

@pnrobinson pls give me thumbs up if sounds good, or let me know if anything looks fishy. Thanks!

pnrobinson commented 4 months ago

The current version of pyphetools does allow not indicating the URL and this will pull back the latest version of hp.json. The problem is that apparently on some systems there is an SSL error. Probably we need to document that the user will need to do this including https://cheapsslweb.com/blog/ssl-certificate-verify-failed-error-in-python. This appears to be working on your machine, but I am guessing this is something that can go wrong on machines "out there" so we need to anticipate this problem?

ielis commented 4 months ago

@pnrobinson After adding OntologyStore to hpo-toolkit==0.5.0, we can easily load the latest HPO (or a specific version) without unnecessary network traffic.

I updated few FBN1 notebooks to showcase the usage (e.g. cells 2..4 here).

I'd like to merge this PR to finalize setting up CI in pyphetools. The merge will update all the FBN1 notebooks AND the phenopackets. The changes in phenopackets include:

The changes are not a result of my work but arise from the other pyphetools updates.

Pls give me a :+1: if it looks good!

pnrobinson commented 3 months ago

@ielis Sorry, when I first saw this it was work in progress. pyphetools can download remote hp.json, but AFAICS the problem is that with certain settings, hpotk fails. Here are possible solutions: https://support.chainstack.com/hc/en-us/articles/9117198436249-Common-SSL-Issues-on-Python-and-How-to-Fix-it Some could be added to the notebooks, e.g.,

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

possibly others could be added to hpotk, e.g.

import ssl 
import certifi from urllib.request 
import urlopen 

request = "https://.../hp.json" 
urlopen(request, context=ssl.create_default_context(cafile=certifi.where()))

(not tested) but I would not think we need to change the code in phenopacket-store?

ielis commented 3 months ago

The SSL issue should be resolved in hpo-toolkit>0.5.0, however, I haven't had a chance to test it because never saw the SSL issue.

pnrobinson commented 3 months ago

@ielis not sure where we are with this PR. There are some conflicts because I cleaned up the repo. Can we tidy up and merge? Also, it would be nice to have a default place to download the hp.json to

ielis commented 2 months ago

I re-ran the notebooks and updated the phenopackets, and the PR should be ready for merge now.

Also, it would be nice to have a default place to download the hp.json to.

Yes, we have OntologyStore API since hpotk==0.5.0.

This is how to use it:

import hpotk

store = hpotk.configure_ontology_store()
hpo = store.load_minimal_hpo(release='v2024-01-16')

Note, the release is optional. The store will get you the latest HPO from GitHub if you leave it out. Please use it everywhere from now on.

Pls give me a :+1: if this looks OK and we can merge.

ielis commented 2 months ago

So this is running all of the notebooks with each release?

No, not in phenopacket-store but in pyphetools - the screenshot in the 1st PR post shows how pyphetools can use phenopacket-store in CI. This is ensure that the documentation is accurate - the notebooks are up to date. In other words, if we make a breaking change in pyphetools we should propagate the change across the notebooks to prevent users' frustration.

How will we know if there was an error?

The CI will fail if the notebook does not finish with exit status of 0.


The CI only runs FBN1 notebooks because running everything would take too long. However, we should consider adding a GitHub action that will run all the notebooks and package the phenopackets upon each phenopacket-store release. There are ways for doing that..