nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License
1.35k stars 403 forks source link

doc: Workflow broken for non-Docker runtime following nextclade2/nextalign2 update #968

Closed victorlin closed 2 years ago

victorlin commented 2 years ago

Current Behavior

The Snakemake workflow now fails when not using the Docker runtime:

/bin/bash: line 1: nextalign2: command not found

and similar for nextclade2.

This is because the commands nextalign2/nextclade2 are only provided into the nextstrain/base Docker image after nextstrain/docker-base#54 and exclusively used by this workflow following merge of #963.

Expected behavior

Workflow should work for all runtimes.

How to reproduce

  1. Set up a native environment following installation docs (simpler method proposed by #967):

    mamba create -n nextstrain \
    -c conda-forge -c bioconda \
    nextstrain-cli augur auspice nextalign nextclade snakemake git epiweeks pangolin pangolearn \
    --yes
  2. Run example data tutorial.

  3. See error in logs/align_reference_data.txt

Possible solutions

cc @ivan-aksamentov

ivan-aksamentov commented 2 years ago

The concern was that someone may need Nextclade v1, and so the default symlink still points to v1. Or at lest there was concern before the release and the internal upgrade (we could not just break master branch by swapping nextclade in the image to v2)

I don't know much about where the base image is used and by whom, so I trust you folks to make a call here.

One thing I can tell for sure, v1 is gone, and we try to encourage switching to v2 for everyone who asks.

cc @tsibley @rneher

victorlin commented 2 years ago

For other tools such as Augur and Nextstrain CLI, I think we've just bumped major versions in the Docker image directly without keeping two versions around such as nextclade vs. nextclade2.

My suggestion is to drop nextclade2 in the Docker image and make nextclade point to v2. If we want to allow users to still have access to v1, maybe have a nextclade1 command available similar to how nextclade2 is currently available.

This would make the experience of Docker runtime users more similar to the experience of users running nextclade installed directly on their machine.

ivan-aksamentov commented 2 years ago

Yeah, I agree. And if someone still needs a base container with v1 they can simply go back at a previous docker tag.

ivan-aksamentov commented 2 years ago

I submitted PRs which should make the transition in places I know:

ncov-ingest is not affected, because it downloads fresh version from GitHub every time

What do you think?

Note how monkeypox readme advises to download and rename to nextclade2. This won't work with conda though, I don't think so.

We should probably consult with the team more broadly before committing to that.

And there is a certain amount of synchronization required between merges in the different repos.

victorlin commented 2 years ago

This discussion forum post was made because of this issue. Copying the error output here for visibility:

nextstrain build . --cores all --use-conda --configfile ncov-tutorial/example-data.yaml
(base) swyman@igi-biotite:~/projects/ncov/logs$ more align_reference_data.txt
/bin/bash: line 1: nextalign2: command not found
(base) swyman@igi-biotite:~/projects/ncov/logs$ more sanitize_sequences_reference_data.txt
Traceback (most recent call last):
  File "scripts/sanitize_sequences.py", line 130, in <module>
    write_sequences(sequence, output_handle)
  File "/home/swyman/miniconda3/envs/nextstrain/lib/python3.7/site-packages/augur/io/sequences.py", line 71, in write_sequences
    format
  File "/home/swyman/miniconda3/envs/nextstrain/lib/python3.7/site-packages/Bio/SeqIO/__init__.py", line 535, in write
    fp.write(format_function(record))
BrokenPipeError: [Errno 32] Broken pipe
huddlej commented 2 years ago

This issue is closed by @ivan-aksamentov's PRs listed above.