nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

Can't download dataset in the docker of nextstrain/nextclade:2.0.0-alpha.3-debian #836

Closed xzhub closed 2 years ago

xzhub commented 2 years ago

I downloaded the docker image nextstrain/nextclade:2.0.0-alpha.3-debian, but I can't download the latest dataset:

docker run -it nextstrain/nextclade:2.0.0-alpha.3-debian bash
root@4474d94696be:/# RUST_BACKTRACE=full COLORBT_SHOW_HIDDEN=1 RUST_BACKTRACE=1 nextclade dataset get --name='sars-cov-2' --output-dir='/tmp/sars-cov-2'
Error:
   0: When parsing dataset index
   1: When parsing JSON
   2: expected value at line 1 column 1

Location:
   /workdir/packages_rs/nextclade/src/io/json.rs:9

━━━━━━━━━ BACKTRACE ━━━━━━━━━━
   1: __libc_start_main<unknown>
      at <unknown source file>:<unknown line>

Run with COLORBT_SHOW_HIDDEN=1 environment variable to disable frame filtering.
Run with RUST_BACKTRACE=full to include source snippets.

xzhub commented 2 years ago

nextclade dataset list reports a similar error message.

ivan-aksamentov commented 2 years ago

@xzhub This is expected.

Nextclade v2 introduces some breaking changes to the dataset index file. However, the alphas are still configured to download data from the production v1 dataset server (https://data.clades.nextstrain.org), which corresponds to the release branch of the https://github.com/nextstrain/nextclade_data/ repo, and that server does not have the new dataset files yet. We will push the new dataset files simultaneously with the Nextclade v2 release.
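As an illustration of the error message in the report, here is a minimal Python sketch of what happens when an index parser receives a body that is not JSON. It assumes (the thread does not confirm this) that the production server answered the request for the v2 index with a non-JSON body such as an error page. The function name `parse_dataset_index` and the sample body are hypothetical, and Nextclade itself is written in Rust, so this only mirrors the failure mode and is not its actual code.

```python
import json


def parse_dataset_index(body: str) -> dict:
    """Parse a dataset index body; on failure, report where parsing stopped."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno are 1-based positions of the first bad character.
        raise ValueError(
            f"When parsing dataset index: {err.msg} "
            f"at line {err.lineno} column {err.colno}"
        ) from err


# Hypothetical non-JSON response body (assumption: an error page was returned
# in place of the missing index file).
not_json = "<Error>NoSuchKey</Error>"

try:
    parse_dataset_index(not_json)
except ValueError as err:
    print(err)
```

Because the very first character is not valid JSON, the parser stops at line 1, column 1, which matches the position reported in the original error.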

Meanwhile, I released alpha.4 and alpha.5 with some fixes, and you can add a flag

--server="https://data.master.clades.nextstrain.org" 

to switch to the master dataset server (https://data.master.clades.nextstrain.org), which corresponds to the master branch of https://github.com/nextstrain/nextclade_data/.
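For scripted use, the branch-to-server mapping described above could be wrapped in a small helper. This is a sketch only: the `SERVERS` table and the `dataset_get_args` helper are hypothetical names, while the two URLs and the `--name`, `--output-dir` and `--server` flags are the ones used in this thread.

```python
import shlex

# Dataset servers named in this thread, keyed by the nextclade_data branch
# each one corresponds to.
SERVERS = {
    "release": "https://data.clades.nextstrain.org",
    "master": "https://data.master.clades.nextstrain.org",
}


def dataset_get_args(name: str, output_dir: str, branch: str = "release") -> list:
    """Build the argument list for `nextclade dataset get`.

    The --server flag is added only when a non-default branch is requested,
    so that the CLI's built-in default server is used otherwise.
    """
    args = [
        "nextclade", "dataset", "get",
        f"--name={name}",
        f"--output-dir={output_dir}",
    ]
    if branch != "release":
        args.append(f"--server={SERVERS[branch]}")
    return args


# Print a copy-pasteable command line for the master dataset server.
print(shlex.join(dataset_get_args("sars-cov-2", "/tmp/sars-cov-2", branch="master")))
```

The returned list can be passed directly to `subprocess.run` without going through a shell.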

I tested with this command:

docker run -it nextstrain/nextclade:2.0.0-alpha.5-debian bash -c 'nextclade dataset get --name="sars-cov-2" --server="https://data.master.clades.nextstrain.org" --output-dir="/tmp/sars-cov-2" -v && ls -al "/tmp/sars-cov-2"'

Output:

```
[2022-06-07 06:01:47.322][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/index_v2.json'
[2022-06-07 06:01:47.558][nextclade][INFO ] packages_rs/nextclade-cli/src/cli/nextclade_dataset_get.rs:97: Searching for datasets having attributes: name='sars-cov-2', reference='default', tag='latest'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/sequences.fasta'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/genemap.gff'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/tree.json'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/tag.json'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/virus_properties.json'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/qc.json'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/reference.fasta'
[2022-06-07 06:01:47.559][nextclade][INFO ] packages_rs/nextclade-cli/src/io/http_client.rs:92: HTTP 'GET' request to 'https://data.master.clades.nextstrain.org/datasets/sars-cov-2/references/MN908947/versions/2022-06-03T12:00:00Z/files/primers.csv'
total 8964
drwxr-xr-x 2 root root    4096 Jun 7 06:01 .
drwxrwxrwt 1 root root    4096 Jun 7 06:01 ..
-rw-r--r-- 1 root root     826 Jun 7 06:01 genemap.gff
-rw-r--r-- 1 root root    2280 Jun 7 06:01 primers.csv
-rw-r--r-- 1 root root    7000 Jun 7 06:01 qc.json
-rw-r--r-- 1 root root   30430 Jun 7 06:01 reference.fasta
-rw-r--r-- 1 root root 3233191 Jun 7 06:01 sequences.fasta
-rw-r--r-- 1 root root     900 Jun 7 06:01 tag.json
-rw-r--r-- 1 root root 5849190 Jun 7 06:01 tree.json
-rw-r--r-- 1 root root   26632 Jun 7 06:01 virus_properties.json
```

and this command:

docker run -it nextstrain/nextclade:2.0.0-alpha.5-debian bash -c 'nextclade dataset list --server="https://data.master.clades.nextstrain.org"'

Output:

```
Showing latest dataset(s) compatible with this version of Nextclade (2.0.0-alpha.5), having attributes: reference='default', tag='latest':

+-----------------------+-----------------------+-----------------------+-----------------------+----------------------+
| name                  | reference             | tag                   | attributes            | comment              |
+======================================================================================================================+
| sars-cov-2 (*)        | MN908947 (*)          | 2022-06-03T12:00:00Z  | name=sars-cov-2 (*)   | Data update          |
| 'SARS-CoV-2'          | 'Wuhan-Hu-1/2019'     | (*)                   | reference=MN908947    |                      |
|                       |                       |                       | (*)                   |                      |
|                       |                       |                       | tag=2022-06-03T12:00: |                      |
|                       |                       |                       | 00Z (*)               |                      |
|-----------------------+-----------------------+-----------------------+-----------------------+----------------------|
| sars-cov-2-no-recomb  | MN908947 (*)          | 2022-04-28T12:00:00Z  | name=sars-cov-2-no-re | New pango lineages   |
| 'SARS-CoV-2 without   | 'Wuhan-Hu-1/2019'     | (*)                   | comb                  | included,            |
| recombinants'         |                       |                       | reference=MN908947    | pango-designation    |
|                       |                       |                       | (*)                   | release v1.8         |
|                       |                       |                       | tag=2022-04-28T12:00: |                      |
|                       |                       |                       | 00Z (*)               |                      |
|-----------------------+-----------------------+-----------------------+-----------------------+----------------------|
| monkeypox             | MT903344.1 (*)        | 2022-05-20T12:00:00Z  | name=monkeypox        | First Monkeypox      |
| 'Monkeypox'           | 'MPXV-UK_P2/2018'     | (*)                   | reference=MT903344.1  | dataset,             |
|                       |                       |                       | (*)                   | experimental         |
|                       |                       |                       | tag=2022-05-20T12:00: |                      |
|                       |                       |                       | 00Z (*)               |                      |
|-----------------------+-----------------------+-----------------------+-----------------------+----------------------|
| flu_h1n1pdm_ha        | CY121680 (*)          | 2022-01-18T12:00:00Z  | name=flu_h1n1pdm_ha   | feat: enables        |
| 'Influenza A H1N1pdm  | 'A/California/07/2009 | (*)                   | reference=CY121680    | reversion and        |
| HA'                   | '                     |                       | (*)                   | labeled mutations    |
|                       |                       |                       | tag=2022-01-18T12:00: | highlighted          |
|                       |                       |                       | 00Z (*)               |                      |
|-----------------------+-----------------------+-----------------------+-----------------------+----------------------|
| flu_h3n2_ha           | CY163680 (*)          | 2022-01-18T12:00:00Z  | name=flu_h3n2_ha      | Enables reversion    |
| 'Influenza A H3N2 HA' | 'A/Wisconsin/67/2005' | (*)                   | reference=CY163680    | and labeled mutation |
|                       |                       |                       | (*)                   | highlighting         |
|                       |                       |                       | tag=2022-01-18T12:00: |                      |
|                       |                       |                       | 00Z (*)               |                      |
|-----------------------+-----------------------+-----------------------+-----------------------+----------------------|
| flu_vic_ha            | KX058884 (*)          | 2022-01-18T12:00:00Z  | name=flu_vic_ha       | Enables reversion    |
| 'Influenza B Victoria | 'B/Brisbane/60/2008'  | (*)                   | reference=KX058884    | and labeled mutation |
| HA'                   |                       |                       | (*)                   | highlighting         |
|                       |                       |                       | tag=2022-01-18T12:00: |                      |
|                       |                       |                       | 00Z (*)               |                      |
|-----------------------+-----------------------+-----------------------+-----------------------+----------------------|
| flu_yam_ha            | JN993010 (*)          | 2022-01-18T12:00:00Z  | name=flu_yam_ha       | Enables reversion    |
| 'Influenza B Yamagata | 'B/Wisconsin/01/2010' | (*)                   | reference=JN993010    | and labeled mutation |
| HA'                   |                       |                       | (*)                   | highlighting         |
|                       |                       |                       | tag=2022-01-18T12:00: |                      |
|                       |                       |                       | 00Z (*)               |                      |
+-----------------------+-----------------------+-----------------------+-----------------------+----------------------+

Asterisk (*) marks default values
```

ivan-aksamentov commented 2 years ago

Further improvements:

Later we may configure CI so that pre-releases download from the master server by default.

The default server is set by the DATA_FULL_DOMAIN variable in the .env file, for both the web app and the CLI, in case you want to build from source and modify it.
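The idea of having pre-releases default to the master server could be sketched as follows. This is a hypothetical illustration, not the project's CI configuration: `is_prerelease` and `env_line_for` are made-up helpers, the pre-release test follows the semver convention of a hyphen after the core version, and the assumption that DATA_FULL_DOMAIN takes a full URL as a plain KEY=VALUE line is not confirmed in this thread.

```python
PRODUCTION_SERVER = "https://data.clades.nextstrain.org"
MASTER_SERVER = "https://data.master.clades.nextstrain.org"


def is_prerelease(version: str) -> bool:
    """True for semver-style versions with a pre-release part, e.g. 2.0.0-alpha.5."""
    core_and_pre = version.split("+", 1)[0]  # drop build metadata, if any
    return "-" in core_and_pre


def env_line_for(version: str) -> str:
    """Return the .env line a build script could write for this version.

    Assumption: DATA_FULL_DOMAIN holds the dataset server URL as KEY=VALUE.
    """
    server = MASTER_SERVER if is_prerelease(version) else PRODUCTION_SERVER
    return f"DATA_FULL_DOMAIN={server}"


print(env_line_for("2.0.0-alpha.5"))
print(env_line_for("2.0.0"))
```

With such a rule, an alpha build would point at the master dataset server without the user passing --server, while a final release would keep the production server.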

xzhub commented 2 years ago

Resolved by adding '--server' argument in 2.0.0-alpha.5, thanks!