nf-core / viralrecon

Assembly and intrahost/low-frequency variant calling for viral samples
https://nf-co.re/viralrecon
MIT License

Argument input-fasta is missing in NFCORE_VIRALRECON:ILLUMINA:CONSENSUS_BCFTOOLS:CONSENSUS_QC:NEXTCLADE_RUN #398

Open CeliaRodrigues opened 8 months ago

CeliaRodrigues commented 8 months ago

Description of the bug

The pipeline ran correctly until it had to process a consensus FASTA file with Nextclade. The --input-fasta argument seems to be missing, and I do not know how to add it when running from a Singularity image. I installed Nextclade with conda, and in my input command I am specifying the most recent version I could find. I can see the FASTA file the pipeline is calling in the folders, so the file itself is not the issue; I would need to have --input-fasta or -i added before the file name.

Command used and terminal output

# Input command
conda activate env_nf

sudo nextflow run nf-core/viralrecon/master \
    --input samplesheet.csv -c custom.config \
    --outdir OUTDIR \
    --platform illumina \
    --protocol amplicon \
    --genome 'MN908947.3' \
    --primer_bed primers.bed \
    --primer_fasta primers.fasta \
    --assemblers 'spades' --spades_modes corona --max_memory 15GB \
    -profile singularity \
    --nextclade_dataset_tag 2023-09-21T12:00:00Z --save_reference \
    --min_mapped_reads 30

# Output message
-[nf-core/viralrecon] Pipeline completed with errors-
ERROR ~ Error executing process > 'NFCORE_VIRALRECON:ILLUMINA:CONSENSUS_BCFTOOLS:CONSENSUS_QC:NEXTCLADE_RUN (20230424-011-TPB-1_S12_L001)'

Caused by:
  Process `NFCORE_VIRALRECON:ILLUMINA:CONSENSUS_BCFTOOLS:CONSENSUS_QC:NEXTCLADE_RUN (20230424-011-TPB-1_S12_L001)` terminated with an error exit status (1)

Command executed:

  nextclade \
      run \
       \
      --jobs 2 \
      --input-dataset nextclade_sars-cov-2_MN908947_2022-06-14T12_00_00Z \
      --output-all ./ \
      --output-basename 20230424-011-TPB-1_S12_L001 \
      20230424-011-TPB-1_S12_L001.consensus.fa

  cat <<-END_VERSIONS > versions.yml
  "NFCORE_VIRALRECON:ILLUMINA:CONSENSUS_BCFTOOLS:CONSENSUS_QC:NEXTCLADE_RUN":
      nextclade: $(echo $(nextclade --version 2>&1) | sed 's/^.*nextclade //; s/ .*$//')
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  Nextclade: Clade assignment, mutation calling, and sequence quality checks

  Usage: nextclade [options]
         nextclade completion

  Commands:
    nextclade completion  Generate shell autocompletion script

  Options:
    --help                    Show help  [boolean]
    --version                 Show version number  [boolean]
    --jobs, -j                Number of CPU threads used by the algorithm. If not specified, using number of logical CPU cores, as detected by Node.js runtime  [number] [default: 12]
    --input-fasta, -i         Path to a .fasta or a .txt file with input sequences  [string] [required]
    --input-root-seq, -r      Path to plain text file containing custom root sequence  [string]
    --input-tree, -a          (optional) Path to Auspice JSON v2 file containing custom reference tree. See https://nextstrain.org/docs/bioinformatics/data-formats  [string]
    --input-qc-config, -q     (optional) Path to a JSON file containing custom configuration of Quality Control rules.
                              For an example format see: https://github.com/nextstrain/nextclade/blob/20a9fda5b8046ce26669de2023770790c650daae/packages/web/src/algorithms/defaults/sars-cov-2/qcRulesConfig.ts  [string]
    --input-gene-map, -g      (optional) Path to a JSON file containing custom gene map. Gene map (sometimes also called "gene annotations") is used to resolve aminoacid changes in genes.
                              For an example see https://github.com/nextstrain/nextclade/blob/20a9fda5b8046ce26669de2023770790c650daae/packages/web/src/algorithms/defaults/sars-cov-2/geneMap.json  [string]
    --input-pcr-primers, -p   (optional) Path to a CSV file containing a list of custom PCR primer sites. These are used to report mutations in these sites.
                              For an example see https://github.com/nextstrain/nextclade/blob/20a9fda5b8046ce26669de2023770790c650daae/packages/web/src/algorithms/defaults/sars-cov-2/pcrPrimers.csv  [string]
    --output-json, -o         Path to output JSON results file  [string]
    --output-csv, -c          Path to output CSV results file  [string]
    --output-tsv-clades-only  Path to output CSV clades-only file  [string]
    --output-tsv, -t          Path to output TSV results file  [string]
    --output-tree, -T         Path to output Auspice JSON V2 results file. See https://nextstrain.org/docs/bioinformatics/data-formats  [string]

  Missing required argument: input-fasta

Work dir:
  /home/wastewater/work/48/61f8e691d05149aba7c8ef3ec6d238

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Relevant files

No response

System information

Nextflow version: 23.04.4.5881
Container: Singularity 4.0.0
nf-core version: 2.10
nf-core/viralrecon: downloaded master with nf-core download, so it should be the latest release
Environment: created with miniconda
System: running in MobaXterm on a Windows computer, with Ubuntu on WSL2

CeliaRodrigues commented 8 months ago

I do see now in the genome folder that the Nextclade version that it downloaded is this one: nextclade_sars-cov-2_MN908947_2022-06-14T12_00_00Z so that is the one that it is using.

How can I make it use the most recent one? I thought that adding it on the command line would override the version set in the module.

CeliaRodrigues commented 8 months ago

Actually, I went into the modules and added -i before $fasta in the main.nf files that are in the nextclade folder. That error was fixed, and I got a different one:

Error: at least one of output path arguments required: --output-json, --output-csv, --output-tsv-clades-only, --output-tsv, --output-tree

So I will try to do the same for these and report back.

CeliaRodrigues commented 8 months ago

I added output lines to the main.nf in the nextclade/run folder. Could you correct them? I am not sure I am using $prefix correctly. This is how that section looks now:

nextclade \
    run \
    $args \
    --jobs $task.cpus \
    --input-dataset $dataset \
    --output-all ./ \
    --output-json ./${prefix} \
    --output-csv ./${prefix} \
    --output-tsv ./${prefix} \
    --output-tsv-clades-only ./${prefix} \
    --output-tree ./${prefix} \
    --output-all ./${prefix} \
    --output-basename ${prefix} \
    -i $fasta

svarona commented 1 month ago

Hi @CeliaRodrigues! That's odd, because for us this command is enough for it to work:

nextclade \
    run \
     \
    --jobs 2 \
    --input-dataset nextclade_sars-cov-2_MN908947_2024-05-08--11-39-52Z \
    --output-all ./ \
    --output-basename SAMPLE3_SE \
    SAMPLE3_SE.consensus.fa

It does not seem to need all the parameters you're showing. Might it be related to the Nextclade version? Which version are you using?
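The usage text in the error above mentions the Node.js runtime and lists no `run` subcommand, which would be consistent with an older Nextclade CLI than the one the pipeline's command expects. A minimal sketch for checking which binary and version are picked up, assuming it is run in the same environment as the failing task. The helper names `nextclade_version` and `nextclade_major` are made up for illustration; the sed expression is copied from the versions.yml step of the command shown in the error.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of viralrecon).

# Pull the version number out of `nextclade --version` output, using the
# same sed expression as the pipeline's versions.yml step.
nextclade_version() {
    echo "$1" | sed 's/^.*nextclade //; s/ .*$//'
}

# Major version = everything before the first dot.
nextclade_major() {
    local v
    v=$(nextclade_version "$1")
    echo "${v%%.*}"
}

# Only query the binary if one is actually on PATH.
if command -v nextclade >/dev/null 2>&1; then
    raw=$(nextclade --version 2>&1)
    echo "nextclade on PATH: $(command -v nextclade)"
    echo "reported version: $(nextclade_version "$raw") (major $(nextclade_major "$raw"))"
fi
```

If the reported path points into a conda environment rather than the container, that may explain why the `run` subcommand is not recognised.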

We use these settings to make Nextclade always run with the latest version for SARS-CoV-2:

params.yml:

nextclade_dataset_name: 'sars-cov-2'
nextclade_dataset: false
nextclade_dataset_tag: '2024-05-08--11-39-52Z'

custom.config:

// withName selectors must sit inside a process scope in a Nextflow config
process {
    withName: 'NEXTCLADE_DATASETGET|NEXTCLADE_RUN' {
        container = 'https://depot.galaxyproject.org/singularity/nextclade:3.4.0--h9ee0642_0'
    }
}

Are you using the latest Nextflow version and the latest Nextclade version? Please let me know if you manage to solve the problem 😄