nextstrain / auspice

Web app for visualizing pathogen evolution
https://docs.nextstrain.org/projects/auspice/
GNU Affero General Public License v3.0

Auspice testing data (get-data) #1558

Open jameshadfield opened 2 years ago

jameshadfield commented 2 years ago

Background

This repo doesn't contain any datasets beyond some very minimal examples that explain the dataset format. Instead we rely on the get-data script, which downloads a slew of (nextstrain core) datasets. In times gone by, this was an accurate listing of all of our core datasets, and other sources (groups, community etc.) didn't exist. The script is often run manually (e.g. npm run get-data) so you have some data to play with, and Heroku runs it during setup (npm run heroku-postbuild), which results in usable datasets in review apps.

Shortcomings

A lot of auspice's functionality cannot be tested with the data here, and two PRs in the last couple of weeks have highlighted this: https://github.com/nextstrain/auspice/pull/1557 and https://github.com/nextstrain/auspice/pull/1552. This means the Heroku review apps are not useful, and people have to manually check out the auspice branch and obtain an appropriate dataset for testing.

Proposal

We should make the get-data script obtain a useful set of testing datasets, preferably timestamped ones so we can ensure reproducibility. PRs which need additional datasets for testing should add them to the get-data script as part of the PR.
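To make the proposal concrete, here is a minimal sketch of what a manifest-driven fetch could look like. It is not the existing get-data script: the host (data.example.org), the dataset names, the name_date.json filename convention, and the helper names are all assumptions made up for illustration.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Hypothetical host; the real script would point at wherever the datasets live.
BASE_URL = "https://data.example.org"

# Hypothetical manifest of (dataset name, snapshot date or None for "latest").
# A PR that needs extra testing data would append an entry here.
DATASETS = [
    ("zika", "2022-01-01"),
    ("ebola", None),
]


def dataset_filename(name, date=None):
    """Return the JSON filename, with the snapshot date appended when pinned.

    The name_date.json convention is an assumption for this sketch.
    """
    return f"{name}_{date}.json" if date else f"{name}.json"


def dataset_urls(datasets, base_url=BASE_URL):
    """Build one download URL per manifest entry, preserving manifest order."""
    return [f"{base_url}/{dataset_filename(name, date)}" for name, date in datasets]


def fetch_all(datasets, dest="data", base_url=BASE_URL):
    """Download every dataset in the manifest into the dest directory."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    for url in dataset_urls(datasets, base_url):
        target = dest_dir / url.rsplit("/", 1)[-1]
        urlretrieve(url, str(target))
```

Calling fetch_all(DATASETS) would populate a local data directory. Pinning a date in the manifest is what would give reproducibility, since a pinned entry always resolves to the same file.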

victorlin commented 2 years ago

Note that the PR review apps allow us to test on live nextstrain.org data

jameshadfield commented 2 years ago

> Note that the PR review apps allow us to test on live nextstrain.org data

That's true (and a big positive), but it doesn't help with local development or, at some point in the future, with actual tests within auspice.

I'll take a crack at this issue today - it seems simple on the surface!