nebiolabs / EM-seq

Tools and Data related to Enzymatic Methylation Sequencing
GNU Affero General Public License v3.0

em-seq.nf Error: Missing output file(s) *_fastp.json #9

Open kythol opened 1 year ago

kythol commented 1 year ago

Hello,

Thank you for adding the usage examples. I am trying to run your em-seq.nf script, and it halts at the mapping step because an expected output file is missing. The command:

nextflow run ../em-seq.nf --fastq_glob "*{1,2}.fastq" --genome "grch38_core+bs_controls.fa" --flowcell "HCVHLDMXX"

And this is the error:

[ec/0fed11] NOTE: Missing output file(s) *_fastp.json expected by process mapping ([HCVHLDMXX, 200ng_em-seq.ds]) -- Execution is retried (1)
Error executing process > 'mapping ([HCVHLDMXX, 200ng_em-seq.ds])'

Caused by: Missing output file(s) *_fastp.json expected by process mapping ([HCVHLDMXX, 200ng_em-seq.ds])

Command executed:

inst_name=$(zcat -f '/data/Software/EM-seq/test_data/200ng_em-seq.ds.1.fastq' | head -n 1 | cut -f 1 -d ':' | sed 's/^@//')
fastq_barcode=$(zcat -f '/data/Software/EM-seq/test_data/200ng_em-seq.ds.1.fastq' | head -n 1 | sed -r 's/.*://')

if [[ "${inst_name:0:2}" == 'A0' ]] || [[ "${inst_name:0:2}" == 'NS' ]] || [[ "${inst_name:0:2}" == 'NB' ]] || [[ "${inst_name:0:2}" == 'VH' ]] ; then
    trim_polyg='--trim_poly_g'
    echo '2-color instrument: poly-g trim mode on'
else
    trim_polyg=''
fi
seqtk mergepe <(zcat -f "/data/Software/EM-seq/test_data/200ng_em-seq.ds.1.fastq") <(zcat -f "/data/Software/EM-seq/test_data/200ng_em-seq.ds.2.fastq") |
fastp --stdin --stdout -l 2 -Q ${trim_polyg} --interleaved_in --overrepresentation_analysis -j "200ng_em-seq.ds_fastp.json" 2> fastp.stderr |
bwameth.py -p -t 16 --read-group "@RG\tID:${fastq_barcode}\tSM:200ng_em-seq.ds" --reference grch38_core+bs_controls.fa /dev/stdin 2> "200ngem-seq.ds${fastq_barcode}HCVHLDMXX_all_all.log.bwamem" |
mark-nonconverted-reads.py 2> "200ngem-seq.ds${fastq_barcode}_HCVHLDMXX_all_all.nonconverted.tsv" |
sambamba view -t 2 -S -f bam -o "200ngem-seq.ds${fastq_barcode}_HCVHLDMXX_all_all.aln.bam" /dev/stdin 2> sambamba.stderr;

Command exit status: 0

Command output: 2-color instrument: poly-g trim mode on

Work dir: /data/Software/EM-seq/test_data/work/ac/6d2164b3edf32d955944527f8c2c06

Tip: view the complete command output by changing to the process work dir and entering the command cat .command.out

Also posting what the .command.out file says:

cat /data/Software/EM-seq/work/88/82b983bc0a37d6f108d8197bf0a03b/.command.out
2-color instrument: poly-g trim mode on

Thank you, Lisa
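For readers following the command above, the header-parsing step can be replayed on a single line to see why the "2-color instrument" message appears. This is only a sketch: the header string below is hypothetical (the instrument name and barcode are made up), and the `case` statement is an equivalent rewrite of the `if` chain rather than the pipeline's own code.

```shell
#!/usr/bin/env bash
# Hypothetical Illumina-style read header; not taken from the test data.
header='@A00123:45:HCVHLDMXX:1:1101:1000:2000 1:N:0:ACGTACGT'

# Same parsing as in the pipeline command:
# instrument name = first ':'-delimited field with the leading '@' removed,
# barcode         = everything after the last ':'.
inst_name=$(printf '%s\n' "$header" | head -n 1 | cut -f 1 -d ':' | sed 's/^@//')
fastq_barcode=$(printf '%s\n' "$header" | head -n 1 | sed -r 's/.*://')

# Equivalent to the chained [[ ... ]] tests: these prefixes enable poly-G trimming.
case "${inst_name:0:2}" in
    A0|NS|NB|VH) trim_polyg='--trim_poly_g' ;;
    *)           trim_polyg='' ;;
esac

echo "inst_name=${inst_name} barcode=${fastq_barcode} trim_polyg=${trim_polyg}"
```

Since the message is printed before the pipeline starts, seeing it in .command.out only shows that the first FASTQ header was readable, not that fastp ran successfully.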

bwlang commented 1 year ago

Hi Lisa, I wonder whether your input data and genome files are present and named as expected. It looks to me like fastp did not get any input data to process.
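One way to check this, sketched under assumptions: count the reads in each FASTQ the same way the pipeline reads them (`zcat -f`, which on GNU gzip passes uncompressed files through unchanged). The `count_reads` helper and the synthetic demo file are hypothetical and not part of the EM-seq repository; on real data the function would be pointed at the two test FASTQ files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of reads in a FASTQ file (plain or gzipped).
count_reads() {
    local fq="$1"
    # -s is true only if the file exists and is non-empty.
    [[ -s "$fq" ]] || { echo "missing or empty: $fq" >&2; return 1; }
    local lines
    lines=$(zcat -f "$fq" | wc -l)
    echo $(( lines / 4 ))   # four lines per FASTQ record
}

# Demo on a tiny synthetic file so the sketch runs anywhere.
tmp=$(mktemp -d)
printf '@A00123:45:HCVHLDMXX:1:1101:1000:2000 1:N:0:ACGT\nACGT\n+\nFFFF\n' > "$tmp/demo.1.fastq"
count_reads "$tmp/demo.1.fastq"
```

If both mates report the same non-zero read count, the inputs themselves are probably fine and the problem is more likely further down the pipeline.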

kythol commented 1 year ago

Thanks for reaching out. All three files are present in the folder where I start the script, and the tool seems to be detecting the fastq files (mapping ([HCVHLDMXX, 200ng_em-seq.ds])). Here is the ls output, just in case:

ls -all test_data/
-rw-r--r-- 1 root root 103490 Apr 25 04:01 200ng_em-seq.ds.1.fastq
-rw-r--r-- 1 root root 103490 Apr 25 04:01 200ng_em-seq.ds.2.fastq
-rw-r--r-- 1 root root 3144519036 Nov 26 2019 grch38_core+bs_controls.fa

I also tried putting them into the main folder where the em-seq.nf script is located (I had been running from test_data), but got the same error.
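A note on the "Command exit status: 0" line in the report: in bash, a pipeline's exit status is that of its last command unless `pipefail` is set, so an earlier stage can fail without changing the reported status. The sketch below illustrates that general behaviour with a made-up `failing_stage` function; it is not a diagnosis of this issue, and whether `pipefail` is active in a given Nextflow process depends on how the process shell is configured.

```shell
#!/usr/bin/env bash
set +o pipefail   # make bash's default explicit for the demo
tmp=$(mktemp -d)

# Hypothetical stand-in for a stage that fails: it writes a message to stderr
# and returns non-zero, with stderr redirected to a file as in the pipeline.
failing_stage() { echo 'stage failed: unrecognized option' >&2; return 1; }

# Without pipefail, the pipeline reports the status of the last command (cat).
failing_stage 2> "$tmp/stage.stderr" | cat > "$tmp/out.txt"
status_default=$?

# With pipefail (in a subshell so the option does not leak), the failure surfaces.
status_pipefail=0
( set -o pipefail; failing_stage 2> /dev/null | cat > /dev/null ) || status_pipefail=$?

echo "default: ${status_default}, pipefail: ${status_pipefail}"
echo "captured stderr: $(cat "$tmp/stage.stderr")"
```

By the same logic, the fastp.stderr file that the mapping command writes into the process work dir is a reasonable place to look for the reason the JSON report was never created.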