replikation / poreCov

SARS-CoV-2 workflow for nanopore sequence data
https://case-group.github.io/
GNU General Public License v3.0

summary_report.py fails #241

Closed MarieLataretu closed 1 year ago

MarieLataretu commented 1 year ago

summary_report.py in create_summary_report_wf:summary_report still fails with --update on the master branch.

With the latest release (-r 1.7.1) we hit the error that is fixed by https://github.com/replikation/poreCov/commit/f52e3796e9233d25e73134d9139162316bad459e. So we tried the master branch, but summary_report.py still fails, now with a different error:

LOG: Started summary_report.py ...
usage: summary_report.py [-h] -v VERSION_CONFIG --variants_table
                         VARIANTS_TABLE --porecov_version PORECOV_VERSION
                         [--guppy_used GUPPY_USED] [--guppy_model GUPPY_MODEL]
                         [--medaka_model MEDAKA_MODEL] --nf_commandline
                         NF_COMMANDLINE --nextclade_docker NEXTCLADE_DOCKER
                         [--primer PRIMER] [-p PANGOLIN_RESULTS]
                         [-n NEXTCLADE_RESULTS] [-q PRESIDENT_RESULTS]
                         [-k KRAKEN2_RESULTS] [-c COVERAGE_PLOTS] [-s SAMPLES]
summary_report.py: error: unrecognized arguments: --scorpio_version scorpio 0.3.16 --scorpio_constellations_version constellations v0.1.3 --pangolin_docker nanozoo/pangolin:3.1.20--2022-02-28

from .command.sh:

#!/bin/bash -ue
echo 'sample,num_unclassified,num_sarscov2,num_human' > kraken2_results.csv
for KF in [xxx]; do
NUNCLASS=$(awk -v ORS= '$5=="0" {print $3}' $KF)
NSARS=$(awk -v ORS= '$5=="2697049" {print $3}' $KF)
NHUM=$(awk '$5=="9606" {print $3}' $KF)
echo "${KF%.kreport},${NUNCLASS:-0},${NSARS:-0},${NHUM:-0}" >> kraken2_results.csv
done

summary_report.py \
    -v container.config \
    --scorpio_version "scorpio 0.3.16" \
    --scorpio_constellations_version "constellations v0.1.3" \
    --variants_table SARSCoV2_variants_2022-10-19--10-29-55.csv \
    --porecov_version master:beab36afe9697a42cc7ee674e1cd3615e64c69e7:25b9ab6abb0ace4768d248c113e31f61 \
    --nextclade_docker nanozoo/nextclade:1.11.0--2022-06-14 \
    --guppy_used false \
    --guppy_model dna_r9.4.1_450bps_sup.cfg \
    --medaka_model r941_min_sup_g507 \
    --nf_commandline 'nextflow run replikation/poreCov -r master -profile slurm,singularity -w work --update --fastq_pass fastq_pass --samples Run22-356_samplesheet.csv --output results --cachedir [xxx]/poreCov/ --krakendb [xxx]/common/databases/kraken2/GRCh38.p13_SC2_2022-03-01.tar.gz --one_end false --medaka_model r941_min_sup_g507 --guppy_model dna_r9.4.1_450bps_sup.cfg --primerV V4.1 --screen_reads -resume' \
    --pangolin_docker nanozoo/pangolin:3.1.20--2022-02-28 \
    --primer V4.1 \
    -p pangolin_results.csv \
    -q president_results.tsv \
    -n nextclade_results.tsv \
    -k kraken2_results.csv \
    -c $(echo [XXX] | tr ' ' ',') \
    -s input.csv

Command used

nextflow run replikation/poreCov -r 1.7.1 -profile slurm,singularity -w work --update --fastq_pass fastq_pass --samples Run22-356_samplesheet.csv --output results --cachedir [xxx]/poreCov/ --krakendb [xxx]/common/databases/kraken2/GRCh38.p13_SC2_2022-03-01.tar.gz --one_end false --medaka_model r941_min_sup_g507 --guppy_model dna_r9.4.1_450bps_sup.cfg --primerV V4.1 --screen_reads

# then
nextflow run replikation/poreCov -r master -profile slurm,singularity -w work --update --fastq_pass fastq_pass --samples Run22-356_samplesheet.csv --output results --cachedir [xxx]/poreCov/ --krakendb [xxx]/common/databases/kraken2/GRCh38.p13_SC2_2022-03-01.tar.gz --one_end false --medaka_model r941_min_sup_g507 --guppy_model dna_r9.4.1_450bps_sup.cfg --primerV V4.1 --screen_reads -resume

DataSpott commented 1 year ago

For the moment I can only say that the .command.sh contains a flag, --pangolin_docker, with a value (after --nf_commandline and before --primer) that, according to the code, is not supposed to be there. I need to dig a bit deeper to find out why this unexpected flag appears.

replikation commented 1 year ago

The code you are running seems to differ fundamentally from the code base of the most recent poreCov version, @MarieLataretu.

e.g. "--pangolin_docker" is a flag that is nowhere to be found in the "summary_report.py" (see here) and neither is it called via the nextflow process (see here).
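To illustrate why a flag that the script does not define ends in "unrecognized arguments", here is a minimal sketch. It assumes a heavily reduced parser with only two of the options from the usage message; `build_parser` and `unknown_arguments` are hypothetical names, and this is not the actual summary_report.py code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Reduced stand-in parser (assumption): only two of the real options.
    parser = argparse.ArgumentParser(prog="summary_report.py")
    parser.add_argument("-v", "--version_config", required=True)
    parser.add_argument("--primer")
    return parser


def unknown_arguments(argv: list[str]) -> list[str]:
    # parse_known_args() returns the leftovers instead of exiting;
    # parse_args() would stop with "error: unrecognized arguments: ...".
    _, unknown = build_parser().parse_known_args(argv)
    return unknown


if __name__ == "__main__":
    call = [
        "-v", "container.config",
        "--pangolin_docker", "nanozoo/pangolin:3.1.20--2022-02-28",
        "--primer", "V4.1",
    ]
    print(unknown_arguments(call))
```

Any flag that the installed script does not declare ends up in the leftover list, which is what happens when the process definition and the script come from different revisions.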

Can you maybe run rm -rf ~/.nextflow so that the Nextflow-managed git clone is removed and pulled fresh? None of our current code passes this flag, so it cannot cause such an error. We only used it in a much older version of poreCov.

replikation commented 1 year ago

Another option would be to remove the singularity image raverjay/fastcov:0.1.3--ba8c8cf6ae19 so it's rebuilt. It could be an issue, but I don't see how.

replikation commented 1 year ago

Also, please copy-paste the full Nextflow terminal output, as it shows which "commit id" the Nextflow run is using.

replikation commented 1 year ago

the python call command looks btw like this:

      summary_report.py \
          -v !{version_config} \
          --variants_table !{variants_table} \
          --porecov_version !{workflow.revision}:!{workflow.commitId}:!{workflow.scriptId} \
          --nextclade_docker !{params.nextcladedocker} \
          --guppy_used !{guppyused} \
          --guppy_model !{params.guppy_model} \
          --medaka_model !{params.medaka_model} \
          --nf_commandline '!{workflow.commandLine}' \
          --primer !{params.primerV} \
          -p !{pangolin_results} \
          -q !{president_results} \
          -n !{nextclade_results} \
          -k kraken2_results.csv \
          -c $(echo !{coverage_plots} | tr ' ' ',') \
          -s !{samples_table}

so it looks different from your call:

summary_report.py \
    -v container.config \
    --scorpio_version "scorpio 0.3.16" \
    --scorpio_constellations_version "constellations v0.1.3" \
    --variants_table SARSCoV2_variants_2022-10-19--10-29-55.csv \
    --porecov_version master:beab36afe9697a42cc7ee674e1cd3615e64c69e7:25b9ab6abb0ace4768d248c113e31f61 \
    --nextclade_docker nanozoo/nextclade:1.11.0--2022-06-14 \
    --guppy_used false \
    --guppy_model dna_r9.4.1_450bps_sup.cfg \
    --medaka_model r941_min_sup_g507 \
    --nf_commandline 'nextflow run replikation/poreCov -r master -profile slurm,singularity -w work --update --fastq_pass fastq_pass --samples Run22-356_samplesheet.csv --output results --cachedir [xxx]/poreCov/ --krakendb [xxx]/common/databases/kraken2/GRCh38.p13_SC2_2022-03-01.tar.gz --one_end false --medaka_model r941_min_sup_g507 --guppy_model dna_r9.4.1_450bps_sup.cfg --primerV V4.1 --screen_reads -resume' \
    --pangolin_docker nanozoo/pangolin:3.1.20--2022-02-28 \
    --primer V4.1 \
    -p pangolin_results.csv \
    -q president_results.tsv \
    -n nextclade_results.tsv \
    -k kraken2_results.csv \
    -c $(echo [XXX] | tr ' ' ',') \
    -s input.csv
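
The difference between the two calls can also be found mechanically. A minimal sketch, assuming both commands are available as plain strings; the helper names `flags` and `unexpected_flags` are hypothetical and not part of poreCov, and the example commands are shortened versions of the ones above.

```python
import shlex


def flags(command: str) -> list[str]:
    # Split like a POSIX shell would, so quoted values stay one token,
    # then keep only the tokens that look like options.
    return [tok for tok in shlex.split(command) if tok.startswith("-")]


def unexpected_flags(expected_cmd: str, actual_cmd: str) -> list[str]:
    # Flags present in the actual call but absent from the expected one,
    # in the order they appear in the actual call.
    expected = set(flags(expected_cmd))
    return [flag for flag in flags(actual_cmd) if flag not in expected]


if __name__ == "__main__":
    expected = "summary_report.py -v cfg --primer V4.1 -p pangolin_results.csv"
    actual = (
        'summary_report.py -v cfg --scorpio_version "scorpio 0.3.16" '
        "--pangolin_docker nanozoo/pangolin:3.1.20--2022-02-28 "
        "--primer V4.1 -p pangolin_results.csv"
    )
    print(unexpected_flags(expected, actual))
```

This only compares option names. A quoted value that itself contains flags, such as the --nf_commandline string, stays a single token and is not inspected.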

MarieLataretu commented 1 year ago

Ah, I think I found the problem: nextflow pull replikation/poreCov and nextflow run -r master ... don't get you the latest master version.

But thanks for looking into the error! The commit id was in the summary_report call ;) master:beab36afe9697a42cc7ee674e1cd3615e64c69e7:25b9ab6abb0ace4768d248c113e31f61
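
Since the --porecov_version value is built as revision:commitId:scriptId in the process template quoted earlier, the commit id can be read straight from it. A small sketch, assuming that three-part layout holds and that none of the parts contains a colon; `PorecovVersion` and `parse_porecov_version` are hypothetical helper names, not poreCov code.

```python
from typing import NamedTuple


class PorecovVersion(NamedTuple):
    revision: str
    commit_id: str
    script_id: str


def parse_porecov_version(value: str) -> PorecovVersion:
    # Expected layout (assumption): "<revision>:<commitId>:<scriptId>"
    parts = value.split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 3 colon-separated fields, got {len(parts)}")
    return PorecovVersion(*parts)


if __name__ == "__main__":
    version = parse_porecov_version(
        "master:beab36afe9697a42cc7ee674e1cd3615e64c69e7:25b9ab6abb0ace4768d248c113e31f61"
    )
    print(version.commit_id)
```

Comparing that commit id with the head of the master branch on GitHub shows whether the local Nextflow copy is outdated.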

MarieLataretu commented 1 year ago

Yup, that solved the problem.

A new release with the bug fix would be nice then!