virusseq / pedigree


fetch the data from ViralAI cloud #2

Closed ghost closed 1 year ago

ghost commented 2 years ago

Cloud access details:

The metadata will be ingested into a publicly accessible GCS bucket at this path: gs://dnastack-covid-19-data/CanCOGeN/metadata
There is currently a single metadata file there. The naming format will be ${release_date}.${release_hash}.metadata.tsv; for example, the currently available metadata file is named 2022-09-09T22:08:55Z.b9491d15-6d47-4137-b53b-d26f326ffb34.metadata.tsv. The release date and hash align with the releases on the virusseq page.
We will keep the 5 most recent metadata TSVs; older ones will be deleted.
Note that we will not necessarily pull in every single release. For example, if more than one release is made between ingestion time points, we will only get the latest one that exists when we process.
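
Since the bucket may hold up to 5 files and releases can be skipped, a consumer probably wants to pick the newest file by the timestamp in its name rather than hard-code one. Below is a minimal sketch of that, assuming the release date is always formatted like the example above (UTC, second precision, trailing Z) and that neither the date nor the hash contains a dot. The names parse_release, latest_release and BUCKET_PREFIX are hypothetical helpers for illustration, not part of any existing tooling, and listing the bucket itself (for example with gsutil ls or a GCS client library) is left out.

```python
import re
from datetime import datetime, timezone
from typing import Iterable, NamedTuple, Optional

# Bucket path quoted in the comment above.
BUCKET_PREFIX = "gs://dnastack-covid-19-data/CanCOGeN/metadata"

# ${release_date}.${release_hash}.metadata.tsv
# Assumption: neither the date nor the hash contains a "." character.
NAME_RE = re.compile(r"^(?P<date>[^.]+)\.(?P<hash>[^.]+)\.metadata\.tsv$")


class Release(NamedTuple):
    released: datetime
    release_hash: str
    filename: str


def parse_release(filename: str) -> Optional[Release]:
    """Split a metadata file name into date and hash; None if it does not match."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    try:
        # Assumption: timestamps always look like 2022-09-09T22:08:55Z (UTC).
        released = datetime.strptime(
            match.group("date"), "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
    except ValueError:
        return None
    return Release(released, match.group("hash"), filename)


def latest_release(filenames: Iterable[str]) -> Optional[Release]:
    """Return the release with the newest timestamp, ignoring unrelated names."""
    releases = [r for r in map(parse_release, filenames) if r is not None]
    return max(releases, key=lambda r: r.released, default=None)


if __name__ == "__main__":
    listing = [
        # The one file named in this thread.
        "2022-09-09T22:08:55Z.b9491d15-6d47-4137-b53b-d26f326ffb34.metadata.tsv",
        # Made-up older entry, only here to exercise the sorting.
        "2022-08-01T00:00:00Z.hypothetical-hash.metadata.tsv",
    ]
    newest = latest_release(listing)
    if newest is not None:
        print(f"{BUCKET_PREFIX}/{newest.filename}")
```

Because only the 5 most recent files are kept, storing the parsed hash alongside whatever is downloaded would make it possible to tell later which release a local copy came from, even after the original file has been deleted from the bucket.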