nebiolabs / EM-seq

Tools and Data related to Enzymatic Methylation Sequencing
GNU Affero General Public License v3.0

error while running the em.seq.nf #10

Open MaryamLabaf opened 1 year ago

MaryamLabaf commented 1 year ago

Hi, I tried to run the pipeline on the example samples, both in an AWS account and on an HPC cluster (with the conda environment activated), but I get the following error.

(nextflow) [ml@chimerahead EM-seq-master]$ ../nextflow run em-seq.nf --fastq_glob test_data/"*{1,2}.fastq" --genome methylation_controls.fa --flowcell "HCVHLDMXX" --cpus 8
N E X T F L O W  ~  version 23.04.2
Launching `em-seq.nf` [awesome_spence] DSL2 - revision: 72ddd5a529
Processing HCVHLDMXX... => output
ERROR ~ No such variable: md_bams

 -- Check script 'em-seq.nf' at line: 81 or see '.nextflow.log' file for more details

Thanks for any help figuring out the error.

mattsoup commented 1 year ago

Launching em-seq.nf [awesome_spence] DSL2 - revision: 72ddd5a529

Seems your nextflow is running DSL2, but the em-seq.nf script is still written in DSL1, which I suspect is the problem. You may need to downgrade your nextflow version, as the most recent versions no longer support DSL1 scripts.

bwlang commented 1 year ago

One easy way to do this: NXF_VER=21.10.6 nextflow run ...
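For repeated runs, the version pin above can be wrapped in a small shell function. This is only a sketch: `run_dsl1` and the `NXF_BIN` override are hypothetical names introduced here, and the default of 21.10.6 is simply the version suggested in this thread. `NXF_VER` itself is the environment variable the Nextflow launcher reads to pick a release.

```shell
# Hypothetical helper: run a DSL1 pipeline with a pinned Nextflow release.
# NXF_VER is read by the Nextflow launcher; 21.10.6 is the version suggested above.
# NXF_BIN is an assumed override for the launcher path, not a Nextflow setting.
run_dsl1() {
    nxf_bin="${NXF_BIN:-nextflow}"
    # Keep a caller-supplied NXF_VER if one is set, otherwise fall back to the pin.
    NXF_VER="${NXF_VER:-21.10.6}" "$nxf_bin" run "$@"
}

# Example call (commented out so that sourcing this file has no side effects):
# run_dsl1 em-seq.nf --fastq_glob test_data/"*{1,2}.fastq" --genome methylation_controls.fa --flowcell "HCVHLDMXX" --cpus 8
```

Setting the variable per command, rather than exporting it globally, leaves other pipelines on the same machine free to use a newer Nextflow.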

ghost commented 1 year ago

Thank you for the reply. Adding "NXF_VER=21.10.6" at the beginning of the nextflow run command fixed the previous error. But I got a new error:

(nextflow) [ml@chimerahead EMseq_pipline]$ NXF_VER=21.10.6 nextflow run EM-seq-master/em-seq.nf --fastq_glob EM-seq-master/test_data/"*{1,2}.fastq" --genome EM-seq-master/grch38_core+bs_controls.fa --flowcell "HCVHLDMXX" --cpus 8
N E X T F L O W  ~  version 21.10.6
Launching `EM-seq-master/em-seq.nf` [cheesy_sinoussi] - revision: 8de6bd6bb3
Processing HCVHLDMXX... => output
executor >  local (2)
[87/ffcf38] process > mapping ([HCVHLDMXX, 200ng_em-seq.ds]) [ 50%] 1 of 2, failed: 1, retries: 1
[-        ] process > mergeAndMarkDuplicates                 -
[-        ] process > methylDackel_mbias                     -
executor >  local (2)
[87/ffcf38] process > mapping ([HCVHLDMXX, 200ng_em-seq.ds]) [100%] 2 of 2, failed: 2, retries: 1 ✘
[-        ] process > mergeAndMarkDuplicates                 -
[-        ] process > methylDackel_mbias                     -
[-        ] process > methylDackel_extract                   -
[-        ] process > select_human_reads                     -
[-        ] process > runFastQC                              -
[-        ] process > sum_nonconverted_reads                 -
[-        ] process > combine_nonconversion                  -
[-        ] process > samtools_flagstats                     -
[-        ] process > samtools_stats                         -
[-        ] process > picard_gc_bias                         -
[-        ] process > picard_stats                           -
[-        ] process > human_gc_bias                          -
[-        ] process > human_insert_size                      -
[-        ] process > goleft                                 -
[-        ] process > multiqc                                -
[-        ] process > combine_mbias_tsv                      -
[-        ] process > combine_mbias_svg                      -
[07/e6d31a] NOTE: Missing output file(s) `*_fastp.json` expected by process `mapping ([HCVHLDMXX, 200ng_em-seq.ds])` -- Execution is retried (1)
Error executing process > 'mapping ([HCVHLDMXX, 200ng_em-seq.ds])'

Caused by:
  Missing output file(s) `*_fastp.json` expected by process `mapping ([HCVHLDMXX, 200ng_em-seq.ds])`

Command executed:

  inst_name=$(zcat -f '/mathspace/data01/ml/EMseq_pipline/EM-seq-master/test_data/200ng_em-seq.ds.1.fastq' | head -n 1 | cut -f 1 -d ':' | sed 's/^@//')
  fastq_barcode=$(zcat -f '/mathspace/data01/ml/EMseq_pipline/EM-seq-master/test_data/200ng_em-seq.ds.1.fastq' | head -n 1 | sed -r 's/.*://')

  if [[ "${inst_name:0:2}" == 'A0' ]] || [[ "${inst_name:0:2}" == 'NS' ]] ||        [[ "${inst_name:0:2}" == 'NB' ]] || [[ "${inst_name:0:2}" == 'VH' ]] ; then
     trim_polyg='--trim_poly_g'
     echo '2-color instrument: poly-g trim mode on'
  else
     trim_polyg=''
  fi
  seqtk mergepe <(zcat -f "/mathspace/data01/ml/EMseq_pipline/EM-seq-master/test_data/200ng_em-seq.ds.1.fastq") <(zcat -f "/mathspace/data01/ml/EMseq_pipline/EM-seq-master/test_data/200ng_em-seq.ds.2.fastq")     | fastp --stdin --stdout -l 2 -Q ${trim_polyg} --interleaved_in --overrepresentation_analysis             -j "200ng_em-seq.ds_fastp.json" 2> fastp.stderr     | bwameth.py -p -t 16 --read-group "@RG\tID:${fastq_barcode}\tSM:200ng_em-seq.ds" --reference EM-seq-master/grch38_core+bs_controls.fa /dev/stdin                  2>  "200ng_em-seq.ds_${fastq_barcode}HCVHLDMXX_all_all.log.bwamem"     | mark-nonconverted-reads.py 2> "200ng_em-seq.ds_${fastq_barcode}_HCVHLDMXX_all_all.nonconverted.tsv"     | sambamba view -t 2 -S -f bam -o "200ng_em-seq.ds_${fastq_barcode}_HCVHLDMXX_all_all.aln.bam" /dev/stdin 2> sambamba.stderr;

Command exit status:
  0

Command output:
  2-color instrument: poly-g trim mode on

Work dir:
  /mathspace/data01/ml/EMseq_pipline/work/87/ffcf388b81436ef1fb639c80c5875e

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

The test data path is: /mathspace/data01/ml/EMseq_pipline/EM-seq-master/test_data/
The genome path is: /mathspace/data01/ml/EMseq_pipline/EM-seq-master/grch38_core+bs_controls.fa

Thank you for any guidance that can resolve the error.
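The mapping command above redirects each tool's stderr into files in the task work directory (fastp.stderr, the *.log.bwamem file, sambamba.stderr), so the real failure message is likely to be there rather than in the Nextflow summary. A minimal sketch for dumping whichever of those files are non-empty follows. The function name `show_task_logs` is hypothetical, and `.command.err` is the stderr file Nextflow writes for every task.

```shell
# Hypothetical helper: print every non-empty stderr/log file in a Nextflow work dir.
# File patterns are taken from the redirections in the mapping command above.
show_task_logs() {
    workdir="$1"
    for f in "$workdir"/.command.err "$workdir"/*.stderr "$workdir"/*.log.bwamem; do
        # Unmatched globs stay literal and fail the -s test, so they are skipped.
        if [ -s "$f" ]; then
            printf '==> %s <==\n' "$f"
            cat "$f"
        fi
    done
}

# Example call, using the work dir reported in the error above:
# show_task_logs /mathspace/data01/ml/EMseq_pipline/work/87/ffcf388b81436ef1fb639c80c5875e
```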

bobermayer commented 11 months ago

I got the same error. In my case bwameth had failed because I hadn't indexed the genome (__main__.BWAMethException: first run bwameth.py index ../index/grch38_core+bs_controls.fa), so I had to build the index first. I also gzipped the input FASTQ files because the seqtk mergepe step reads them through zcat.
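A pre-flight check along these lines could catch the missing index before the pipeline is launched. It is a sketch under assumptions: `has_bwameth_index` is a hypothetical name, and the file names assume the default bwa backend, where `bwameth.py index` writes a converted reference `<ref>.bwameth.c2t` plus bwa index files such as `.amb` and `.sa` next to it. Other backends (for example bwa-mem2) produce different files.

```shell
# Hypothetical check: does a bwameth index already sit next to the reference?
# Assumes the default bwa backend: <ref>.bwameth.c2t plus bwa index files (.amb, .sa, ...).
has_bwameth_index() {
    ref="$1"
    [ -s "${ref}.bwameth.c2t" ] &&
    [ -s "${ref}.bwameth.c2t.amb" ] &&
    [ -s "${ref}.bwameth.c2t.sa" ]
}

# Example use (commented out; paths are the ones from this thread):
# ref=EM-seq-master/grch38_core+bs_controls.fa
# has_bwameth_index "$ref" || bwameth.py index "$ref"
#
# Compressing the inputs as described above; the --fastq_glob pattern would then
# presumably need to match the new .fastq.gz names:
# gzip EM-seq-master/test_data/*.fastq
```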