nf-core / eager

A fully reproducible and state-of-the-art ancient DNA analysis pipeline
https://nf-co.re/eager
MIT License
148 stars 82 forks source link

Wrong Flag or Flag typos do not throw an error, and erroneously runs through #153

Closed jfy133 closed 5 years ago

jfy133 commented 5 years ago

Describe the bug No error message when 'unknown' or typo flag is provided. The pipeline runs through 'ignoring' that flag which will produce incorrect output (or at least unexpected).

To Reproduce I just tried running the same command on -r 2.0.5 and -r dev,

However, when resubmitting with -dev keep the old --complexity_filter flag (where -r dev is now --complexity_filter_polyG).

nextflow run nf-core/eager \
-c /projects1/users/fellows/nextflow/nextflow_custom.config \
-profile shh_custom \
--reads '/projects1/users/fellows/nextflow/eager2/dev_testing/data/{ABM006.A0101,DJA002.A0101}/*_R{1,2}*fastq.gz' \
--pairedEnd \
--fasta '/projects1/users/fellows/nextflow/eager2/dev_testing/references/tannerella_forsythia_actual/Tannerella_forsythia_9212.fa' \
--outdir '/projects1/users/fellows/nextflow/eager2/dev_testing/eager2/01-actual_reference/tannerella_forsythia/Multiple' \
--name 'eager_nextflow_dev' \
--email fellows@shh.mpg.de \
--max_cpus 4 \
--max_mem '32.GB' \
--complexity_filter \
--skip_preseq \
--min_adap_overlap 1 \
--clip_readlength 30 \
--clip_min_read_quality 20 \
--bwaalnn 0.01 \
--bwalnl 32 \
--bam_discard_unmapped \
--bam_unmapped_type discard \
--dedupper dedup \
--dedup_all_merged \
-r 2.0.5 \
-resume

``

``
nextflow run nf-core/eager \
-c /projects1/users/fellows/nextflow/nextflow_custom.config \
-profile shh_custom \
--reads '/projects1/users/fellows/nextflow/eager2/dev_testing/data/{ABM006.A0101,DJA002.A0101}/*_R{1,2}*fastq.gz' \
--pairedEnd \
--fasta '/projects1/users/fellows/nextflow/eager2/dev_testing/references/tannerella_forsythia_actual/Tannerella_forsythia_9212.fa' \
--outdir '/projects1/users/fellows/nextflow/eager2/dev_testing/eager2/01-actual_reference/tannerella_forsythia/Multiple' \
--name 'eager_nextflow_dev' \
--email fellows@shh.mpg.de \
--max_cpus 4 \
--max_mem '32.GB' \
--complexity_filter \
--skip_preseq \
--min_adap_overlap 1 \
--clip_readlength 30 \
--clip_min_read_quality 20 \
--bwaalnn 0.01 \
--bwalnl 32 \
--bam_discard_unmapped \
--bam_unmapped_type discard \
--dedupper dedup \
--dedup_all_merged \
-r dev \
-resume

Expected behavior

If an unknown flag is provided, the pipeline should stop and throw an error.

apeltzer commented 5 years ago

I don't think this is possible at all. Nextflow provides the possibility to use arbitrary command line parameters in your code, e.g. everything that is supplied with double dashes --bla will be params.bla in your pipeline code. This is almost certainly impossible to tackle, though I'll ask around at the Nextflow gitter....

jfy133 commented 5 years ago

So how does it work if you have a typo in your flag, for example? It just runs through anyway ignoring that setting?

apeltzer commented 5 years ago

If you have code that checks that flag specifically (e.g. you have something in your main.nf that checks

if (bla) {
do_something
} else {
do_something_else
}

it will fail. If you have some typo, it will just pretend you didn't set bla and raise an error or use the default value for bla.

apeltzer commented 5 years ago

Not available at this time says Paolo

jfy133 commented 5 years ago

That seems extremely dangerous though? I'm very surprised that is the case.

Could we not add something like an if statement that you said?

I'm thinking more from my knowledge of R, but assuming somehow all input params.* are stored in a single place somewhere (like an array or vector), as input_params, and acceptable params in another one called acceptable_params :

for (input_params in params.*){
  if ( !input_param %in% acceptable_params ){
     cat(paste("Error:", input_param, "is not valid - please check"))
     quit()
  }
} 

or bash

for i in ${input_params[@]}; do
  if [[ "${input_params[*]}" != "$i" ]]; then
    print "Error " input_params "incorrect. Please check"
 fi
done

Could we not add something like that ourselves? Or is it not possible to store params in this way?

This is a very basic program functionality and if it just runs through with no warning is insane as in many of our cases something can run through and look 'normal' even if a setting is not on correctly.

apeltzer commented 5 years ago

If there are statements in the code that we definitely need to have set to a specific value, we need to ensure that this is the case by checking these explicitly. Iterating over the params.* isn't possible IIRC, we might be able to open an issue over at nextflow and propose a new feature for that....

jfy133 commented 5 years ago

I think that would be worth it. Like I said, this is very dangerous particularly for people who are not familiar with NGS or aDNA (so don't know what they are expecting to look). As aDNA is a niche field we will often have people like that...

I wonder why that is not already integrated somehow with nextflow. It's basic command-line tool sanity checking I thought...

Do you mind doing posting the nextflow issue, or would you rather me do this?

apeltzer commented 5 years ago

I think you can just describe it over at the nextflow-io/nextflow issue tracking system 👍