kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

dupmark not running? #78

Closed albertoriva closed 6 years ago

albertoriva commented 6 years ago

I'm running the atacqc pipeline on 3 paired-end fastq files, and I'm getting an error because samtools cannot find the dupmark.bam file. I'm looking in the log file to see what the problem could be, and this is the command that (if I'm not mistaken) should create this file:

| 14746 | running (RUNNING) | dedup_bam_PE_1 rep3 | | if [[ 0 > 0 ]]; then; samtools view -F 524 -f 2 -u /DATA/runs/miami/atacqcsplit.out/align/rep3/3-UMC25-Ctrl_R1.PE2SE.bam |; sambamba sort -t 5 -n /dev/stdin -o /DATA/runs/miami/atacqcsplit.out/align/rep3/3-UMC25-Ctrl_R1.PE2SE.dupmark.bam;; samtools view -h /DATA/runs/miami/atacqcsplit.out/align/rep3 |

I also see messages produced by picard MarkDuplicates in the log file, and it seems to run successfully, but indeed the dupmark.bam file is not there. How and where can I find more information to understand where the problem is?

Thank you!

leepc12 commented 6 years ago

The pipeline keeps the dupmark log (.dup.qc), but .dupmark.bam is an intermediate file that is deleted at the end of the filtering step, so it is expected that you cannot find it.

I think this error has another cause, such as a lack of space in your output directory or in /tmp. Can you check whether you have enough space in the output directory (./out) and in $TMP and $TMPDIR (/tmp)?

Also, can you post a full error log here?
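
The space check suggested above can be scripted. The following is a minimal Python sketch, not part of the pipeline: the helper names `candidate_dirs` and `free_space_report` are made up for illustration, the `./out` default is taken from the comment above, and `/tmp` is assumed as the fallback when `$TMP` or `$TMPDIR` is unset.

```python
import os
import shutil


def candidate_dirs(out_dir="./out"):
    """List the directories worth checking, without duplicates.

    out_dir is the pipeline output directory (assumed default: ./out).
    $TMP and $TMPDIR are read from the environment; /tmp is an assumed fallback.
    """
    dirs = [out_dir, os.environ.get("TMP"), os.environ.get("TMPDIR"), "/tmp"]
    unique = []
    for d in dirs:
        if d and d not in unique:
            unique.append(d)
    return unique


def free_space_report(paths):
    """Return {path: free space in GiB} for every path that is an existing directory."""
    report = {}
    for path in paths:
        if path and os.path.isdir(path):
            usage = shutil.disk_usage(path)
            report[path] = usage.free / 1024 ** 3
    return report


if __name__ == "__main__":
    for path, gib in free_space_report(candidate_dirs()).items():
        print(f"{path}: {gib:.1f} GiB free")
```

Directories that do not exist are skipped silently, so a missing entry in the output is itself worth a look.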


albertoriva commented 6 years ago

I changed both $TMP and $TMPDIR to point to a directory on the same filesystem where the data are, and there should be plenty of space there, so I don't think that's the problem. What do you mean by a "full error log"? I've redirected the pipeline's output to a file, and the first indication that something is wrong is these lines:

[E::hts_open] fail to open file '/DATA/runs/miami/atacqcsplit.out/qc/rep3/3-UMC25-Ctrl_R1.PE2SE.dupmark.ataqc.bam'
[E::hts_open] fail to open file '/DATA/runs/miami/atacqcsplit.out/qc/rep3/3-UMC25-Ctrl_R1.PE2SE.dupmark.ataqc.bam'
samtools: failed to open "/DATA/runs/miami/atacqcsplit.out/qc/rep3/3-UMC25-Ctrl_R1.PE2SE.dupmark.ataqc.bam" for reading: No such file or directory

This is followed by a python traceback, and then by this block:

===================
Task failed:
    Program & line : '/DATA/apps/atac_dnase_pipelines/modules/ataqc.bds', line 109
    Task Name : 'ataqc rep2'
    Task ID : 'atac.bds.20171028_182127_489_parallel_239/task.ataqc.ataqc_rep2.line_109.id_129'
    Task PID : '20100'
    Task hint : 'if [ "${TMPDIR}" != "" ] && [ -d "${TMPDIR}" ]; then; fi; cd /DATA/runs/miami/atacqcsplit.out/qc/rep2; /DATA/apps/atac_dnase_pipelines/ataqc/run_ataqc.py; --workdir /DATA/runs/miami/atacqcsplit.out/qc/rep2; --outdir /DATA/runs/miami/atacqcsplit.out/qc/rep2; --outprefix 2-UMC25-Ctrl_R1.PE2SE; --genom'
    Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
    State : 'ERROR'
    Dependency state : 'ERROR'
    Retries available : '1'
    Input files : '[/DATA/runs/miami/2-UMC25-Ctrl_R1.fastq.gz,
        /DATA/runs/miami/2-UMC25-Ctrl_R2.fastq.gz,
        /DATA/runs/miami/atacqcsplit.out/align/rep2/2-UMC25-Ctrl_R1.PE2SE.bam,
        /DATA/runs/miami/atacqcsplit.out/qc/rep2/2-UMC25-Ctrl_R1.PE2SE.align.log,
        /DATA/runs/miami/atacqcsplit.out/qc/rep2/2-UMC25-Ctrl_R1.PE2SE.nodup.pbc.qc,
        /DATA/runs/miami/atacqcsplit.out/qc/rep2/2-UMC25-Ctrl_R1.PE2SE.dup.qc,
        /DATA/runs/miami/atacqcsplit.out/align/rep2/2-UMC25-Ctrl_R1.PE2SE.nodup.bam,
        /DATA/runs/miami/atacqcsplit.out/align/rep2/2-UMC25-Ctrl_R1.PE2SE.nodup.tn5.tagAlign.gz,
        /DATA/runs/miami/atacqcsplit.out/signal/macs2/rep2/2-UMC25-Ctrl_R1.PE2SE.nodup.tn5.pf.pval.signal.bigwig,
        /DATA/runs/miami/atacqcsplit.out/peak/macs2/overlap/1-UMC25-Ctrl_R1.PE2SE.nodup.tn5_pooled.pf.500K.pval0.01.naive_overlap.filt.narrowPeak.gz]'
    Output files : '[/DATA/runs/miami/atacqcsplit.out/qc/rep2/2-UMC25-Ctrl_R1.PE2SE_qc.html,
        /DATA/runs/miami/atacqcsplit.out/qc/rep2/2-UMC25-Ctrl_R1.PE2SE_qc.txt]'
    Script file : '/DATA/runs/miami/atac.bds.20171028_182127_489_parallel_239/task.ataqc.ataqc_rep2.line_109.id_129.sh'
    Exit status : '1'
===================

I can upload the entire file if necessary. Or should I look somewhere else?

Thank you!
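
To pull every file that samtools reported as unopenable out of a redirected pipeline log, something like the following Python sketch may help. It is not part of the pipeline: the function name `reported_missing_files` is hypothetical, and the regular expression only covers the two message shapes quoted above, so other error formats would need their own patterns.

```python
import os
import re
import sys

# Covers only the two message shapes quoted in this thread (an assumption,
# not an exhaustive list of samtools/htslib error formats).
OPEN_ERROR = re.compile(
    r"\[E::hts_open\] fail to open file '([^']+)'"
    r'|samtools: failed to open "([^"]+)" for reading'
)


def reported_missing_files(log_text):
    """Return the unique paths named in open-failure messages, in order of first appearance."""
    paths = []
    for match in OPEN_ERROR.finditer(log_text):
        path = match.group(1) or match.group(2)
        if path not in paths:
            paths.append(path)
    return paths


if __name__ == "__main__":
    # Usage: python this_script.py pipeline.log
    if len(sys.argv) == 2:
        with open(sys.argv[1], errors="replace") as handle:
            for path in reported_missing_files(handle.read()):
                status = "exists now" if os.path.exists(path) else "missing"
                print(f"{status}\t{path}")
```

Because the pattern matches anywhere in the text, it also works when several messages were written on one line, as in the excerpt above.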

leepc12 commented 6 years ago

Please send me (leepc12 at gmail com) a full error log. I need to look at it for debugging.

Thanks,

Jin


leepc12 commented 6 years ago

Closing this due to long inactivity.