malariagen / malariagen-data-python

Analyse MalariaGEN data from Python
https://malariagen.github.io/malariagen-data-python/latest/
MIT License

Tighten up release discovery #459

Closed leehart closed 1 month ago

leehart commented 11 months ago

The API currently detects "valid" releases via its _discover_releases function, which looks for a GCS subpath that starts with "v" followed by the major version number (e.g. v3 for Ag3, or v1 for Af1) and checks that a manifest file exists under that subpath, i.e.

if d.startswith(f"v{self._major_version_number}")
    and self._fs.exists(f"{self._base_path}/{d}/manifest.tsv")

I expect we want to tighten up that condition so that a release is only valid if the major version number is followed by either nothing or a period, e.g. "v30B1" should not be valid as a release string for Ag3, whereas "v3" and "v3.1" are valid.

Relatedly, for Ag3, there is currently a v3_cohorts subpath in the release bucket, which doesn't get picked up as a valid release (correctly) but only because it doesn't contain a manifest.tsv file.
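
For illustration, a minimal sketch of what the tightened name check could look like. The helper name is_release_dir_name is hypothetical and this is not necessarily how the library implements it; the manifest.tsv existence check would presumably still be applied alongside it.

```python
def is_release_dir_name(name: str, major_version_number: int) -> bool:
    """Hypothetical helper: accept "v<major>" or "v<major>.<anything>" only."""
    prefix = f"v{major_version_number}"
    # Valid only if nothing, or a period, follows the major version number.
    return name == prefix or name.startswith(prefix + ".")


# Candidate subpath names mentioned in this issue, checked for Ag3 (major version 3).
candidates = ["v3", "v3.1", "v30B1", "v3_cohorts"]
valid = [d for d in candidates if is_release_dir_name(d, 3)]
print(valid)
```

With this check, v3_cohorts would be rejected by name alone rather than only because it lacks a manifest.tsv file.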

leehart commented 3 months ago

This should be remedied by PR https://github.com/malariagen/malariagen-data-python/pull/560