dhimmel closed this issue 3 years ago
@dhimmel thanks for making an issue. I would have done so myself but I was on the run when I tweeted at you. Here's a little more context:
The code you'd need after running pip install bioversions is:
import bioversions
ensembl_version = bioversions.get_version("ensembl")
This code makes a live request to the Ensembl website and does some HTML parsing/traversal to pick out the version number. The same getter also runs in a nightly build (along with all of the other version getter functions in Bioversions) that writes to a YAML file in the Bioversions GitHub repository, so you can use this alternative code, which doesn't rely on Bioversions as a Python dependency:
import requests
import yaml
url = "https://raw.githubusercontent.com/biopragmatics/bioversions/main/docs/_data/versions.yml"
res = requests.get(url)
res_yaml = yaml.safe_load(res.text)
versions = {
    entry["prefix"]: entry["releases"][-1]["version"]
    for entry in res_yaml["database"]
    if "prefix" in entry
}
ensembl_version = versions["ensembl"]
Note: I forgot that the single source of truth for the daily updated data is natively stored in JSON at https://raw.githubusercontent.com/biopragmatics/bioversions/main/src/bioversions/resources/versions.json. A better way, one that doesn't rely on a YAML parser, would be:
import requests
url = "https://raw.githubusercontent.com/biopragmatics/bioversions/main/src/bioversions/resources/versions.json"
res_json = requests.get(url).json()
versions = {
    entry["prefix"]: entry["releases"][-1]["version"]
    for entry in res_json["database"]
    if "prefix" in entry
}
ensembl_version = versions["ensembl"]
https://github.com/related-sciences/ensembl-genes/pull/3 added the JSON request approach to get the latest version. I still haven't created the scheduled CI builds. This depends slightly on #2.
Okay I added scheduled export builds in https://github.com/related-sciences/ensembl-genes/commit/b75c8939252c353c0ada5eeec087a955aafb2991 along with an overwrite option for whether to re-export if an output branch exists.
Both scheduled and dispatch jobs now default to overwrite=false. You must set overwrite=true on a dispatch to overwrite.
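The overwrite behaviour described here comes down to a small decision rule. Below is a minimal sketch of that rule, assuming a hypothetical helper named should_export (not taken from the repository) that is told whether the output branch already exists and what the overwrite setting is:

```python
def should_export(branch_exists: bool, overwrite: bool = False) -> bool:
    """Return True when the export should run.

    Hypothetical helper for illustration only: run the export when no
    output branch exists yet, or when the caller explicitly asked to
    overwrite an existing one.
    """
    return overwrite or not branch_exists
```

With the default overwrite=False, an existing output branch causes the export to be skipped, which matches the default described for both scheduled and dispatch jobs.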
Here are two export CI logs
@cthoyt tweeted:
This is a great idea and would reduce future maintenance. Happy to use bioversions for this.
We will need to detect if an output already exists. Should be able to do this by looking at the git branches.
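One way to check the git branches is to list the remote heads and look for the expected output branch. This is only a rough sketch: the function names are hypothetical, the branch naming scheme is up to the repository, and it assumes the git command-line client is installed. It relies on git ls-remote --heads printing one line per branch, a commit hash and a ref name separated by a tab.

```python
import subprocess


def parse_remote_branches(ls_remote_output: str) -> set[str]:
    """Extract branch names from `git ls-remote --heads` output.

    Each line has the form "<sha><TAB>refs/heads/<branch>".
    """
    prefix = "refs/heads/"
    branches = set()
    for line in ls_remote_output.splitlines():
        parts = line.split("\t", 1)
        if len(parts) == 2 and parts[1].startswith(prefix):
            # Strip the "refs/heads/" prefix to get the bare branch name.
            branches.add(parts[1][len(prefix):])
    return branches


def output_branch_exists(remote: str, branch: str) -> bool:
    """Check whether `branch` exists on `remote` (a URL or remote name).

    Hypothetical helper: raises CalledProcessError if git fails.
    """
    result = subprocess.run(
        ["git", "ls-remote", "--heads", remote],
        capture_output=True,
        text=True,
        check=True,
    )
    return branch in parse_remote_branches(result.stdout)
```

Keeping the parsing separate from the subprocess call makes the branch check easy to test without network access.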
Sometimes exports will fail, for example if a release changes the schema. These changes take a non-trivial amount of effort to fix. For this reason I lean towards weekly scheduled jobs, so that a failing export is a weekly rather than a daily annoyance.