nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement
https://clades.nextstrain.org
MIT License

BUG/UX: `[ERROR] Nextclade: Error: File not found: ""` when specifying neither `--input-dataset` nor `--input-virus-properties` #703

Closed corneliusroemer closed 2 years ago

corneliusroemer commented 2 years ago

First encountered by @sidneymbell (so impacting actual users)

If you update Nextclade to v1.10.0 and run nextclade without the --input-dataset flag (which worked in the past), and you don't explicitly add --input-virus-properties, you get a rather non-specific error:

[ERROR] Nextclade: Error: File not found: "". Please verify correctness of input flags. If a dataset is used, check that the dataset is not corrupted and is compatible with this version of Nextclade CLI.

Repro:

nextclade run \
  --input-fasta dataset_old/sequences.fasta \
  --reference dataset_old/reference.fasta \
  --input-gene-map dataset_old/genemap.gff \
  --input-tree dataset_old/tree.json \
  --input-qc-config dataset_old/qc.json \
  --input-pcr-primers nextclade_dataset/primers.csv \
  --output-dir . \
  --output-fasta out.fa

When preparing for v1.10.0 we thought about people forgetting to update their datasets, but we didn't think about people who do not use --input-dataset and thus will have to manually add --input-virus-properties.

Since this issue will likely confuse quite a few users, I suggest we make the error more informative, mentioning the file missing explicitly and/or suggesting to use --input-dataset.
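
As a rough illustration of such a check (a sketch in Python for brevity, not Nextclade's actual implementation), the idea is to detect up front that neither flag was given and to name the missing input in the message. The parser, the function names and the message wording below are all hypothetical; only the two flag names come from this issue.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI parser: only the two flags
    # discussed in this issue are modelled, everything else is ignored.
    parser = argparse.ArgumentParser(prog="nextclade-sketch", allow_abbrev=False)
    parser.add_argument("--input-dataset", default=None)
    parser.add_argument("--input-virus-properties", default=None)
    return parser


def missing_input_message(args: argparse.Namespace) -> Optional[str]:
    """Return a descriptive error if no source of virus properties was given."""
    # An empty string counts as "not provided", mirroring the `""` path
    # that shows up in the original error message.
    if args.input_dataset or args.input_virus_properties:
        return None
    return (
        "Virus properties file is required but was not provided. "
        "Either pass --input-virus-properties explicitly, or pass "
        "--input-dataset pointing to an up-to-date dataset directory."
    )


def check(argv: List[str]) -> Optional[str]:
    # parse_known_args lets the other flags (--input-fasta etc.) pass through.
    args, _unknown = build_parser().parse_known_args(argv)
    return missing_input_message(args)
```

Calling `check(["--input-fasta", "sequences.fasta"])` returns the descriptive message, while adding either `--input-dataset <dir>` or `--input-virus-properties <file>` returns `None`.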

In addition, we could make the somewhat breaking change clearer in the changelog to make it easier for people to find out they have to add --input-virus-properties.

I quickly searched GitHub to check whether people run Nextclade without --input-dataset, and quite a few repos do. @fritzo has struggled with this, @dpark01 figured it out in the end, and there are more examples, so I think this is quite high priority.

Example code:
https://github.com/epi2me-labs/wf-artic/blob/master/main.nf#L354
https://github.com/broadinstitute/viral-pipelines/commit/97fd3397479366d7867a8fed6ee1a8cb3e0fa157#diff-3afbf685570ea9a80c1e5eae2b71b260d1c9d27bea5d31138224736974887704
https://github.com/broadinstitute/pyro-cov/commit/5a0abe803539343f8cfa0e17742f12c2177dbbf4
https://github.com/theiagen/public_health_viral_genomics/blob/main/tasks/task_taxonID.wdl

ivan-aksamentov commented 2 years ago

@corneliusroemer I forgot to add a check. Here is the fix: https://github.com/nextstrain/nextclade/pull/704. Feel free to tweak the message string.

ivan-aksamentov commented 2 years ago

Released the fix in 1.10.1