SOP for pheval releases

I suggest the following standard operating procedure (SOP) for PhEval releases:

We manually trigger a PhEval release by creating a tagged GitHub release.
The release triggers a build of pheval, which is then published on PyPI.
At the same time, a Zenodo deposit is created containing all the corpora, neatly zipped. (Alternatively, we could rely on the archive that GitHub creates automatically for each tag, e.g. https://github.com/monarch-initiative/pheval/archive/refs/tags/v2024-06-04.tar.gz, but then we would not have a DOI.)
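For the GitHub fallback, the tag archive URL follows a predictable pattern, so a pipeline can build it from the tag name alone. A minimal sketch; the helper name `github_archive_url` is hypothetical, and the tag format simply follows the example URL above.

```python
def github_archive_url(tag: str, owner: str = "monarch-initiative", repo: str = "pheval") -> str:
    """Return the source tarball URL that GitHub generates automatically for a tag.

    Hypothetical helper; only the URL pattern itself is GitHub's.
    """
    return f"https://github.com/{owner}/{repo}/archive/refs/tags/{tag}.tar.gz"


if __name__ == "__main__":
    print(github_archive_url("v2024-06-04"))
```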

For the pheval pipelines, we would have a small config element called:

pheval-version

Setting it would:

install that specific version of pheval instead of the latest one;
during corpus preparation, download the archive corresponding to that version (from either Zenodo or GitHub, see above).
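A minimal sketch of how a pipeline could act on `pheval-version`, assuming the config has already been parsed into a dict, that a release tag is the version prefixed with `v`, and that the GitHub tag archive is the corpus source. The function name `plan_pheval_setup` and the version `1.2.3` are hypothetical; the sketch only builds the install command and the download URL and does not run either.

```python
import sys
from typing import List, Optional, Tuple

# Source tarball that GitHub creates automatically for each tag.
ARCHIVE_URL = "https://github.com/monarch-initiative/pheval/archive/refs/tags/{tag}.tar.gz"


def plan_pheval_setup(config: dict) -> Tuple[List[str], Optional[str]]:
    """Return (pip install command, corpus archive URL) for a pipeline config.

    Hypothetical helper: assumes the tag for version X is "vX".
    """
    version = config.get("pheval-version")
    if version is None:
        # No pin: install the latest release; no specific corpus archive applies.
        return [sys.executable, "-m", "pip", "install", "--upgrade", "pheval"], None
    # Accept both "1.2.3" and "v1.2.3"; pip needs the bare version, the tag needs the "v".
    pin = version[1:] if version.startswith("v") else version
    install = [sys.executable, "-m", "pip", "install", f"pheval=={pin}"]
    return install, ARCHIVE_URL.format(tag=f"v{pin}")


if __name__ == "__main__":
    cmd, url = plan_pheval_setup({"pheval-version": "1.2.3"})
    print(" ".join(cmd[1:]))
    print(url)
```

Actually running the command (e.g. with `subprocess.run(cmd, check=True)`) and fetching the URL would be left to the pipeline; keeping the plan separate makes it easy to log exactly which versions an experiment used.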

In effect, this will ensure:

that our corpus version is always compatible with our tool version, for example when the schema changes, when we use different gene IDs, or after any other change to the phenopackets that preprocessing needs to account for;
that all our experiments are fully reproducible;
that our PyPI releases always have an associated tag in version control.
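One way to make the compatibility guarantee explicit is a guard that refuses to run when the installed pheval differs from the version the corpus was released with. A minimal sketch, assuming the corpus version is known (for example from the `pheval-version` setting or the archive's tag); the function name is hypothetical. It reads the installed version with the standard library's `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def check_corpus_matches_tool(corpus_version: str, installed: Optional[str] = None) -> str:
    """Raise RuntimeError if the installed pheval differs from the corpus release version.

    Hypothetical guard. `installed` can be passed explicitly (e.g. in tests);
    otherwise it is read from the installed package metadata.
    """
    if installed is None:
        try:
            installed = version("pheval")
        except PackageNotFoundError:
            raise RuntimeError("pheval is not installed") from None
    # Tags may carry a leading "v" while package versions do not; compare without it.
    expected = corpus_version[1:] if corpus_version.startswith("v") else corpus_version
    if installed != expected:
        raise RuntimeError(
            f"corpus was released with pheval {expected}, but pheval {installed} is installed"
        )
    return installed
```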

@souzadevinicius @yaseminbridges @julesjacobsen

What do you think?

There is one thing I don't like about this:

All corpora go into one mega archive. If I only want to test one corpus, it would be nice if the pipeline downloaded only that one.

In theory this is easy:

The GitHub Action that performs the release could zip each corpus individually and attach the archives to the release. In practice, I don't know whether this would exceed the disk space limits of GitHub Actions runners. I would vote to try it.
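A minimal sketch of the per-corpus packaging step, assuming each corpus lives in its own subdirectory of a single corpora directory (this layout and all function names are hypothetical). It writes one zip per corpus and reports their total size, which could help judge the disk space question locally before trying it in CI. The download helper uses GitHub's standard URL pattern for release assets.

```python
import shutil
from pathlib import Path
from typing import List


def zip_corpora_individually(corpora_dir: Path, out_dir: Path) -> List[Path]:
    """Write one <corpus>.zip per subdirectory of corpora_dir into out_dir.

    out_dir should not be inside corpora_dir, or it would be zipped as a corpus.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    archives = []
    for corpus in sorted(p for p in corpora_dir.iterdir() if p.is_dir()):
        # root_dir/base_dir keep the corpus folder name as the top-level entry in the zip.
        archive = shutil.make_archive(
            str(out_dir / corpus.name), "zip", root_dir=str(corpora_dir), base_dir=corpus.name
        )
        archives.append(Path(archive))
    return archives


def total_size_bytes(paths: List[Path]) -> int:
    """Sum the file sizes, e.g. to compare against the runner's free disk space."""
    return sum(p.stat().st_size for p in paths)


def release_asset_url(tag: str, asset: str, owner: str = "monarch-initiative", repo: str = "pheval") -> str:
    """URL from which a pipeline could fetch a single attached corpus archive."""
    return f"https://github.com/{owner}/{repo}/releases/download/{tag}/{asset}"
```

A pipeline that only needs corpus `a` from a hypothetical release `v1.2.3` would then fetch `release_asset_url("v1.2.3", "a.zip")` instead of the full archive.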

monarch-initiative/pheval, issue #341: SOP for pheval releases