monarch-initiative / pheval

A framework for empirical evaluation of phenotype matching and prioritisation
https://monarch-initiative.github.io/pheval/
Apache License 2.0

SOP for pheval releases #341

Open matentzn opened 3 months ago

matentzn commented 3 months ago

I would suggest the following SOP for PhEval releases:

  1. We manually trigger a PhEval release by creating a tagged GitHub release
  2. The release triggers a build of pheval, which is then published on PyPI
  3. At the same time, a Zenodo dump is created with all the corpora nicely zipped (alternatively, we can rely on https://github.com/monarch-initiative/pheval/archive/refs/tags/v2024-06-04.tar.gz etc., which is created automatically, but then we don't have a DOI)

For the pheval pipelines, we have a little config element called

pheval-version

This will

  1. install a specific version of pheval instead of the latest
  2. during corpus preparation, download the archive corresponding to that version (either Zenodo or GitHub, see above)
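
To make the two steps above concrete, here is a minimal sketch of how a pipeline could turn the pinned value into an install requirement and a corpus archive URL. The function names are hypothetical and not part of pheval. The URL pattern is the GitHub one quoted earlier in this comment. How a release tag maps to a PyPI version is not settled here, so the sketch keeps the two as separate inputs, and `1.2.3` is only a placeholder.

```python
# Hypothetical helpers: none of these names exist in pheval itself.
GITHUB_REPO = "monarch-initiative/pheval"


def github_archive_url(tag: str) -> str:
    """Return the auto-generated GitHub source archive URL for a release tag."""
    if not tag:
        raise ValueError("tag must be a non-empty string")
    return f"https://github.com/{GITHUB_REPO}/archive/refs/tags/{tag}.tar.gz"


def pip_requirement(version: str) -> str:
    """Return a pip requirement that pins pheval to one exact version."""
    if not version:
        raise ValueError("version must be a non-empty string")
    return f"pheval=={version}"


if __name__ == "__main__":
    # The tag is the one mentioned above; the version is a placeholder.
    print(github_archive_url("v2024-06-04"))
    print(pip_requirement("1.2.3"))
```

A Zenodo-backed variant would only need a different URL builder; the rest of the pipeline could stay the same.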

In effect, this will ensure:

  1. That our corpus version is always compatible with our tool version, for example when the schema changes, when we use different gene IDs, or when any other change to the phenopackets has to be taken into account by preprocessing
  2. That all our experiments are fully reproducible
  3. That our PyPI releases always have an associated tag in version control

@souzadevinicius @yaseminbridges @julesjacobsen

What do you think?

There is one thing I don't like about this:

All corpora go into one mega archive. It would be nice if, when I only want to test one corpus, the pipeline downloaded only that one.

In theory this is easy:

The GitHub action that performs the release can zip the corpora individually and attach them to the release. In practice, I don't know whether this will exceed the disk space limits in GitHub Actions. I would vote to try this.
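
As a rough sketch of the zipping step, the release job could run something like the following before attaching the files to the release. It assumes a layout with one subdirectory per corpus under a common root, which may not match the actual repository. `zip_corpora` is a hypothetical name.

```python
import shutil
from pathlib import Path


def zip_corpora(corpora_root: Path, out_dir: Path) -> list[Path]:
    """Create one zip archive per corpus subdirectory and return their paths.

    Assumes each direct subdirectory of corpora_root is one corpus.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    # Sort for a deterministic order; plain files in the root are skipped.
    for corpus_dir in sorted(p for p in corpora_root.iterdir() if p.is_dir()):
        archive = shutil.make_archive(
            base_name=str(out_dir / corpus_dir.name),  # ".zip" is appended
            format="zip",
            root_dir=str(corpus_dir),
        )
        archives.append(Path(archive))
    return archives
```

Because each archive is written separately, the job could also upload and delete one archive at a time if disk space turns out to be tight.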