nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2
https://nextstrain.org/ncov
MIT License

/bin/bash: nextclade2: command not found #1110

Closed llk578496 closed 3 months ago

llk578496 commented 3 months ago

Current Behavior

Hello! We would like to use Nextstrain to perform phylogenetic analysis of SARS-CoV-2 genomes. We have been following the SARS-CoV-2 Workflow tutorial and have finished the Setup and installation section. However, when we followed the instructions in Run using example data to do a test run with the provided example data, we got the error below:

(base) gilman_siu2@gilmansiu2-Z490-VISION-D:/mnt/data6/COVID/phylogenetic_trees/20230213_latest_local_paper_2022_whole_year/nextstrain-cli-v8.2.0/example-data/ncov$ nextstrain build . --configfile ncov-tutorial/example-data.yaml
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1       adjust_metadata_regions
    1       align
    1       all
    1       ancestral
    1       annotate_metadata_with_index
    1       build_align
    1       build_description
    1       calculate_epiweeks
    1       clade_files
    1       clades
    1       colors
    1       combine_samples
    1       diagnostic
    1       distances
    1       emerging_lineages
    1       export
    1       filter
    1       finalize
    1       include_hcov19_prefix
    1       index
    1       join_metadata_and_nextclade_qc
    1       logistic_growth
    1       mask
    1       mutational_fitness
    1       prepare_nextclade
    1       recency
    1       refine
    1       sanitize_metadata
    1       subsample
    1       tip_frequencies
    1       traits
    1       translate
    1       tree
    33

[Thu May 30 04:46:15 2024]
rule clade_files:
    input: defaults/clades.tsv
    output: results/default-build/clades.tsv
    jobid: 22
    benchmark: benchmarks/clade_files_default-build.txt
    wildcards: build_name=default-build

    python3 scripts/rename_clades.py --input-clade-files defaults/clades.tsv             --name-mapping defaults/clade_display_names.yml             --output-clades results/default-build/clades.tsv

[Thu May 30 04:46:15 2024] Job 27: Downloading reference files for nextclade (used for alignment and qc).

    nextclade2 --version
    nextclade2 dataset get --name sars-cov-2 --output-zip data/sars-cov-2-nextclade-defaults.zip

[Thu May 30 04:46:15 2024]
rule sanitize_metadata:
    input: data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz
    output: results/sanitized_metadata_reference_data.tsv.xz
    log: logs/sanitize_metadata_reference_data.txt
    jobid: 31
    benchmark: benchmarks/sanitize_metadata_reference_data.txt
    wildcards: origin=reference_data
    resources: mem_mb=2000

/bin/bash: nextclade2: command not found

    python3 scripts/sanitize_metadata.py             --metadata data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz             --metadata-id-columns strain name 'Virus name'             --database-id-columns 'Accession ID' gisaid_epi_isl genbank_accession             --parse-location-field Location             --rename-fields 'Virus name=strain' Type=type 'Accession ID=gisaid_epi_isl' 'Collection date=date' 'Additional location information=additional_location_information' 'Sequence length=length' Host=host 'Patient age=patient_age' Gender=sex Clade=GISAID_clade 'Pango lineage=pango_lineage' pangolin_lineage=pango_lineage Lineage=pango_lineage 'Pangolin version=pangolin_version' Variant=variant 'AA Substitutions=aaSubstitutions' 'Submission date=date_submitted' 'Is reference?=is_reference' 'Is complete?=is_complete' 'Is high coverage?=is_high_coverage' 'Is low coverage?=is_low_coverage' N-Content=n_content GC-Content=gc_content             --strip-prefixes hCoV-19/ SARS-CoV-2/                          --output results/sanitized_metadata_reference_data.tsv.xz 2>&1 | tee logs/sanitize_metadata_reference_data.txt

[Thu May 30 04:46:15 2024]
Error in rule prepare_nextclade:
    jobid: 27
    output: data/sars-cov-2-nextclade-defaults.zip
    shell:

    nextclade2 --version
    nextclade2 dataset get --name sars-cov-2 --output-zip data/sars-cov-2-nextclade-defaults.zip

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Thu May 30 04:46:15 2024]
Finished job 22.
1 of 33 steps (3%) done
Downloading from remote: data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz
Finished download.

[Thu May 30 04:46:16 2024] Job 30: Aligning sequences to defaults/reference_seq.fasta

  • gaps relative to reference are considered real

    python3 scripts/sanitize_sequences.py             --sequences data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz             --strip-prefixes hCoV-19/ SARS-CoV-2/             --output /dev/stdout 2> logs/sanitize_sequences_reference_data.txt             | nextalign run             --jobs=8             --reference defaults/reference_seq.fasta             --genemap defaults/annotation.gff             --output-translations results/translations/seqs_reference_data.gene.{gene}.fasta             --output-fasta results/aligned_reference_data.fasta             --output-insertions results/insertions_reference_data.tsv > logs/align_reference_data.txt 2>&1;
    xz -2 -T 8 results/aligned_reference_data.fasta;
    xz -2 -T 8 results/translations/seqs_reference_data.gene.*.fasta

Downloading from remote: data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz
[Thu May 30 04:46:17 2024]
Finished job 31.
2 of 33 steps (6%) done
Finished download.

[Thu May 30 04:46:17 2024] Job 18: Templating build description for Auspice

Job counts:
    count   jobs
    1       build_description
    1
[Thu May 30 04:46:18 2024]
Finished job 18.
3 of 33 steps (9%) done
[Thu May 30 04:46:18 2024]
Error in rule align:
    jobid: 30
    output: results/aligned_reference_data.fasta.xz, results/insertions_reference_data.tsv, results/translations/seqs_reference_data.gene.ORF1a.fasta.xz, results/translations/seqs_reference_data.gene.ORF1b.fasta.xz, results/translations/seqs_reference_data.gene.S.fasta.xz, results/translations/seqs_reference_data.gene.ORF3a.fasta.xz, results/translations/seqs_reference_data.gene.E.fasta.xz, results/translations/seqs_reference_data.gene.M.fasta.xz, results/translations/seqs_reference_data.gene.ORF6.fasta.xz, results/translations/seqs_reference_data.gene.ORF7a.fasta.xz, results/translations/seqs_reference_data.gene.ORF7b.fasta.xz, results/translations/seqs_reference_data.gene.ORF8.fasta.xz, results/translations/seqs_reference_data.gene.N.fasta.xz, results/translations/seqs_reference_data.gene.ORF9b.fasta.xz
    log: logs/align_reference_data.txt (check log file(s) for error message)
    shell:

    python3 scripts/sanitize_sequences.py             --sequences data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz             --strip-prefixes hCoV-19/ SARS-CoV-2/             --output /dev/stdout 2> logs/sanitize_sequences_reference_data.txt             | nextalign run             --jobs=8             --reference defaults/reference_seq.fasta             --genemap defaults/annotation.gff             --output-translations results/translations/seqs_reference_data.gene.{gene}.fasta             --output-fasta results/aligned_reference_data.fasta             --output-insertions results/insertions_reference_data.tsv > logs/align_reference_data.txt 2>&1;
    xz -2 -T 8 results/aligned_reference_data.fasta;
    xz -2 -T 8 results/translations/seqs_reference_data.gene.*.fasta

    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job align since they might be corrupted:
data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /nextstrain/build/.snakemake/log/2024-05-30T044613.535304.snakemake.log

Expected behavior

The workflow should complete with no errors for the example data.

How to reproduce

Steps to reproduce the current behavior:

Possible solution

Your environment: if browsing Nextstrain online

Your environment: if running Nextstrain locally

Additional context

Thanks a lot.

Best regards, Eddie

joverlee521 commented 3 months ago

Hello @llk578496,

Looks like the runtime you are using for the build does not include the nextclade2 command.

Could you run

nextstrain version --verbose

and attach the output so we can check which runtime you are using?
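Because Snakemake runs each rule's shell block in bash strict mode, a single missing command is enough to fail the rule. As a rough sketch (not part of the workflow), a small helper like the one below could be run inside the runtime to see which tools are on `PATH`. The function name `check_commands` is hypothetical; `command -v` is the standard POSIX lookup.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Prints one "found:" or "missing:" line per command and returns 1 if
# any command is missing, 0 otherwise.
check_commands() {
    _missing=0
    for _cmd in "$@"; do
        if command -v "$_cmd" >/dev/null 2>&1; then
            echo "found: $_cmd"
        else
            echo "missing: $_cmd"
            _missing=1
        fi
    done
    return "$_missing"
}

# Example usage (run from a shell inside the runtime):
#   check_commands nextclade2 nextalign augur
```

In a runtime like the one in this issue, `nextclade2` would be expected to show up as missing.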

llk578496 commented 3 months ago

Hello @joverlee521,

Please find the output as below.

(base) gilman_siu2@gilmansiu2-Z490-VISION-D:/mnt/data6/COVID/phylogenetic_trees/20230213_latest_local_paper_2022_whole_year/nextstrain-cli-v8.2.0/example-data/ncov$ nextstrain version --verbose
nextstrain.cli 8.4.0

Python
  /home/gilman_siu2/.nextstrain/cli-standalone/nextstrain
  3.10.9 (main, Dec 21 2022, 04:02:04) [Clang 14.0.3 ]

Runners
  docker (default)
    nextstrain/base:build-20220523T233129Z (d34d7eab0283, 2022-05-24 07:45:58 +0800 HKT)
      augur 15.0.2
      auspice v2.37.1
      fauna d7e8eb2
      sacra not present

  conda
    nextstrain-base unknown

  singularity
    docker://nextstrain/base (not present)

  ambient
    unknown

  aws-batch
    unknown

joverlee521 commented 3 months ago

Thanks @llk578496! I see that the Docker image you are currently using (nextstrain/base:build-20220523T233129Z) is an older version that does not include the nextclade2 command.

You can update your runtime to the latest available version by running:

nextstrain update
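
The image tag reported by `nextstrain version --verbose` appears to encode the image's build time, so the age of an image can be read off the tag. A minimal sketch, assuming tags keep the `build-YYYYMMDDTHHMMSSZ` form seen above; the function name `image_build_date` is hypothetical:

```shell
# Hypothetical helper: print the build date (YYYY-MM-DD) encoded in a
# nextstrain/base image tag of the assumed form build-YYYYMMDDTHHMMSSZ.
# Prints nothing if the tag does not match that form.
image_build_date() {
    printf '%s\n' "$1" |
        sed -n 's/.*build-\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\)T.*/\1-\2-\3/p'
}

# Example usage:
#   image_build_date 'nextstrain/base:build-20220523T233129Z'
```

For the tag in this issue, the date part is 2022-05-23, roughly two years before the failing run on 2024-05-30.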

llk578496 commented 3 months ago

It works now! Thank you very much!