Closed jonasungerback closed 6 years ago
Maybe something is wrong with my installation, but I have re-installed it with its dependencies and I still get an error when I use the -no_dup_removal option, whether I start from a fastq file or a raw bam file. Any idea how I can get it to run with the -no_dup_removal option activated?
My full command was:
atac.bds -species hg19 -enable_idr -auto_detect_adapter -no_dup_removal -rm_chr_from_tag mito -multimapping 4 -nth 8 -se -fastq1 file1.fastq.gz -fastq2 file2.fastq.gz -fastq3 file3.fastq.gz
`shuf: '/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep2/NALM6_index_3_trimmed.filt.tn5.tagAlign.gz': end of file
Task failed:
Program & line : '/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/modules/postalign_bed.bds', line 152
Task Name : 'spr rep2'
Task ID : 'atac.bds.20180109_114003_949_parallel_45/task.postalign_bed.spr_rep2.line_152.id_21'
Task PID : '32873'
Task hint : 'if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/mnt/data/bio'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep2/NALM6_index_3_trimmed.filt.tn5.tagAlign.gz]'
Output files : '[/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/pseudo_reps/rep2/pr1/NALM6_index_3_trimmed.filt.tn5.pr1.tagAlign.gz, /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/pseudo_reps/rep2/pr2/NALM6_index_3_trimmed.filt.tn5.pr2.tagAlign.gz]'
Script file : '/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/atac.bds.20180109_114003_949_parallel_45/task.postalign_bed.spr_rep2.line_152.id_21.sh'
Exit status : '1'
StdErr (10 lines) :
shuf: '/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep2/NALM6_index_3_trimmed.filt.tn5.tagAlign.gz': end of file
Fatal error: /mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/atac.bds, line 662, pos 4. Task/s failed`
Please run the following for more debugging info:
$ OUT_DIR=/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out
$ ls -l $OUT_DIR/align/rep*
$ ls -l $OUT_DIR/qc/rep*
Thanks,
Jin
Thanks. I have now done that both for a run that fails with -no_dup_removal and for a successful run without it. It is clear that files are missing in the failed run, but I am not sure why:
FAILED DUE TO -no_dup_removal:
bds /mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/atac.bds -species hg19 -no_xcor -enable_idr -auto_detect_adapter -out_dir out_NALM6_no_dup_rem_se -no_dup_removal -rm_chr_from_tag mito -multimapping 4 -nth 8 -se -fastq1 NALM6_index_1.fastq.gz -fastq2 NALM6_index_3.fastq.gz -fastq3 NALM6_index_5.fastq.gz
ls -l $OUT_DIR/align/rep*
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//align/rep1:
total 20G
-rw-rw-r-- 1 jonas sigvardsson 12G Jan 10 21:55 NALM6_index_1.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.4M Jan 10 22:12 NALM6_index_1.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 4.7G Jan 10 15:01 NALM6_index_1.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 2.7G Jan 11 01:08 NALM6_index_1.trim.filt.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 11 01:08 NALM6_index_1.trim.filt.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 11 01:16 NALM6_index_1.trim.filt.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 11 01:18 NALM6_index_1.trim.filt.tn5.tagAlign.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//align/rep2:
total 23G
-rw-rw-r-- 1 jonas sigvardsson 16G Jan 10 22:37 NALM6_index_3.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.6M Jan 10 22:44 NALM6_index_3.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 6.7G Jan 10 15:19 NALM6_index_3.trim.fastq.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//align/rep3:
total 24G
-rw-rw-r-- 1 jonas sigvardsson 15G Jan 10 21:59 NALM6_index_5.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.5M Jan 10 22:18 NALM6_index_5.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 5.9G Jan 10 15:13 NALM6_index_5.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 3.2G Jan 11 01:06 NALM6_index_5.trim.filt.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 11 01:06 NALM6_index_5.trim.filt.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 177M Jan 11 01:15 NALM6_index_5.trim.filt.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 178M Jan 11 01:17 NALM6_index_5.trim.filt.tn5.tagAlign.gz
$ ls -l $OUT_DIR/qc/rep*
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//qc/rep1:
total 16K
-rw-rw-r-- 1 jonas sigvardsson 407 Jan 10 14:17 NALM6_index_1.adapter.txt
-rw-rw-r-- 1 jonas sigvardsson 224 Jan 10 20:46 NALM6_index_1.trim.align.log
-rw-rw-r-- 1 jonas sigvardsson 392 Jan 10 22:23 NALM6_index_1.trim.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 3 Jan 10 15:01 NALM6_index_1.trim.read_length.txt
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//qc/rep2:
total 16K
-rw-rw-r-- 1 jonas sigvardsson 407 Jan 10 14:18 NALM6_index_3.adapter.txt
-rw-rw-r-- 1 jonas sigvardsson 225 Jan 10 20:59 NALM6_index_3.trim.align.log
-rw-rw-r-- 1 jonas sigvardsson 392 Jan 10 22:59 NALM6_index_3.trim.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 3 Jan 10 15:20 NALM6_index_3.trim.read_length.txt
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//qc/rep3:
total 16K
-rw-rw-r-- 1 jonas sigvardsson 407 Jan 10 14:18 NALM6_index_5.adapter.txt
-rw-rw-r-- 1 jonas sigvardsson 225 Jan 10 20:35 NALM6_index_5.trim.align.log
-rw-rw-r-- 1 jonas sigvardsson 392 Jan 10 22:32 NALM6_index_5.trim.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 3 Jan 10 15:13 NALM6_index_5.trim.read_length.txt
SUCCESSFUL COMMAND:
bds /mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/atac.bds -species hg19 -no_xcor -enable_idr -auto_detect_adapter -out_dir out_NALM6_standard_se -rm_chr_from_tag mito -multimapping 4 -nth 8 -se -fastq1 NALM6_index_1.fastq.gz -fastq2 NALM6_index_3.fastq.gz -fastq3 NALM6_index_5.fastq.gz
ls -l $OUT_DIR/align/rep*
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_standard_se/align/rep1:
total 18G
-rw-rw-r-- 1 jonas sigvardsson 12G Jan 10 22:02 NALM6_index_1.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.4M Jan 10 22:17 NALM6_index_1.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 4.7G Jan 10 15:01 NALM6_index_1.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 1022M Jan 11 01:38 NALM6_index_1.trim.nodup.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 11 01:38 NALM6_index_1.trim.nodup.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 114M Jan 11 01:47 NALM6_index_1.trim.nodup.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 114M Jan 11 01:48 NALM6_index_1.trim.nodup.tn5.tagAlign.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_standard_se/align/rep2:
total 24G
-rw-rw-r-- 1 jonas sigvardsson 16G Jan 10 22:33 NALM6_index_3.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.6M Jan 10 22:40 NALM6_index_3.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 6.7G Jan 10 15:20 NALM6_index_3.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 997M Jan 11 10:45 NALM6_index_3.trim.nodup.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 11 10:45 NALM6_index_3.trim.nodup.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 113M Jan 11 10:56 NALM6_index_3.trim.nodup.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 112M Jan 11 10:57 NALM6_index_3.trim.nodup.tn5.tagAlign.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_standard_se/align/rep3:
total 22G
-rw-rw-r-- 1 jonas sigvardsson 15G Jan 10 21:54 NALM6_index_5.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.5M Jan 10 22:17 NALM6_index_5.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 5.9G Jan 10 15:14 NALM6_index_5.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 1.1G Jan 11 01:38 NALM6_index_5.trim.nodup.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 11 01:38 NALM6_index_5.trim.nodup.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 121M Jan 11 01:49 NALM6_index_5.trim.nodup.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 121M Jan 11 01:50 NALM6_index_5.trim.nodup.tn5.tagAlign.gz
ls -l $OUT_DIR/qc/rep*
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_standard_se/qc/rep1:
total 47M
-rw-rw-r-- 1 jonas sigvardsson 407 Jan 10 14:19 NALM6_index_1.adapter.txt
-rw-rw-r-- 1 jonas sigvardsson 224 Jan 10 20:51 NALM6_index_1.trim.align.log
-rw-rw-r-- 1 jonas sigvardsson 1.4K Jan 11 01:32 NALM6_index_1.trim.dup.qc
-rw-rw-r-- 1 jonas sigvardsson 2.3M Jan 11 15:52 NALM6_index_1.trim.dupmark.ataqc.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 392 Jan 10 22:28 NALM6_index_1.trim.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 383 Jan 11 01:38 NALM6_index_1.trim.nodup.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 62 Jan 11 01:44 NALM6_index_1.trim.nodup.pbc.qc
-rw-rw-r-- 1 jonas sigvardsson 766 Jan 11 12:28 NALM6_index_1.trim.picardcomplexity.qc
-rw-rw-r-- 1 jonas sigvardsson 462K Jan 11 14:33 NALM6_index_1.trim.preseq.dat
-rw-rw-r-- 1 jonas sigvardsson 47K Jan 11 14:33 NALM6_index_1.trim.preseq.log
-rw-rw-r-- 1 jonas sigvardsson 3 Jan 10 15:01 NALM6_index_1.trim.read_length.txt
-rw-rw-r-- 1 jonas sigvardsson 44M Jan 11 16:10 NALM6_index_1.trim.signal
-rw-rw-r-- 1 jonas sigvardsson 4.9K Jan 11 12:16 NALM6_index_1.trim_gc.txt
-rw-rw-r-- 1 jonas sigvardsson 8.8K Jan 11 12:16 NALM6_index_1.trim_gcPlot.pdf
-rw-rw-r-- 1 jonas sigvardsson 1.2K Jan 11 12:16 NALM6_index_1.trim_gcSummary.txt
-rw-rw-r-- 1 jonas sigvardsson 228K Jan 11 16:07 NALM6_index_1.trim_large_tss-enrich.png
-rw-rw-r-- 1 jonas sigvardsson 658K Jan 11 16:10 NALM6_index_1.trim_qc.html
-rw-rw-r-- 1 jonas sigvardsson 1.8K Jan 11 16:10 NALM6_index_1.trim_qc.txt
-rw-rw-r-- 1 jonas sigvardsson 43K Jan 11 16:06 NALM6_index_1.trim_tss-enrich.png
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_standard_se/qc/rep2:
total 48M
-rw-rw-r-- 1 jonas sigvardsson 407 Jan 10 14:19 NALM6_index_3.adapter.txt
-rw-rw-r-- 1 jonas sigvardsson 225 Jan 10 20:56 NALM6_index_3.trim.align.log
-rw-rw-r-- 1 jonas sigvardsson 1.4K Jan 11 10:39 NALM6_index_3.trim.dup.qc
-rw-rw-r-- 1 jonas sigvardsson 2.4M Jan 11 17:17 NALM6_index_3.trim.dupmark.ataqc.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 392 Jan 10 22:56 NALM6_index_3.trim.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 383 Jan 11 10:45 NALM6_index_3.trim.nodup.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 61 Jan 11 10:54 NALM6_index_3.trim.nodup.pbc.qc
-rw-rw-r-- 1 jonas sigvardsson 766 Jan 11 12:32 NALM6_index_3.trim.picardcomplexity.qc
-rw-rw-r-- 1 jonas sigvardsson 459K Jan 11 15:28 NALM6_index_3.trim.preseq.dat
-rw-rw-r-- 1 jonas sigvardsson 62K Jan 11 15:28 NALM6_index_3.trim.preseq.log
-rw-rw-r-- 1 jonas sigvardsson 3 Jan 10 15:21 NALM6_index_3.trim.read_length.txt
-rw-rw-r-- 1 jonas sigvardsson 44M Jan 11 17:39 NALM6_index_3.trim.signal
-rw-rw-r-- 1 jonas sigvardsson 4.9K Jan 11 12:16 NALM6_index_3.trim_gc.txt
-rw-rw-r-- 1 jonas sigvardsson 8.8K Jan 11 12:16 NALM6_index_3.trim_gcPlot.pdf
-rw-rw-r-- 1 jonas sigvardsson 1.2K Jan 11 12:16 NALM6_index_3.trim_gcSummary.txt
-rw-rw-r-- 1 jonas sigvardsson 225K Jan 11 17:36 NALM6_index_3.trim_large_tss-enrich.png
-rw-rw-r-- 1 jonas sigvardsson 647K Jan 11 17:40 NALM6_index_3.trim_qc.html
-rw-rw-r-- 1 jonas sigvardsson 1.8K Jan 11 17:40 NALM6_index_3.trim_qc.txt
-rw-rw-r-- 1 jonas sigvardsson 43K Jan 11 17:36 NALM6_index_3.trim_tss-enrich.png
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_standard_se/qc/rep3:
total 48M
-rw-rw-r-- 1 jonas sigvardsson 407 Jan 10 14:19 NALM6_index_5.adapter.txt
-rw-rw-r-- 1 jonas sigvardsson 225 Jan 10 20:30 NALM6_index_5.trim.align.log
-rw-rw-r-- 1 jonas sigvardsson 1.4K Jan 11 01:32 NALM6_index_5.trim.dup.qc
-rw-rw-r-- 1 jonas sigvardsson 2.4M Jan 11 16:44 NALM6_index_5.trim.dupmark.ataqc.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 392 Jan 10 22:31 NALM6_index_5.trim.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 383 Jan 11 01:39 NALM6_index_5.trim.nodup.flagstat.qc
-rw-rw-r-- 1 jonas sigvardsson 62 Jan 11 01:46 NALM6_index_5.trim.nodup.pbc.qc
-rw-rw-r-- 1 jonas sigvardsson 766 Jan 11 12:31 NALM6_index_5.trim.picardcomplexity.qc
-rw-rw-r-- 1 jonas sigvardsson 460K Jan 11 15:06 NALM6_index_5.trim.preseq.dat
-rw-rw-r-- 1 jonas sigvardsson 57K Jan 11 15:06 NALM6_index_5.trim.preseq.log
-rw-rw-r-- 1 jonas sigvardsson 3 Jan 10 15:14 NALM6_index_5.trim.read_length.txt
-rw-rw-r-- 1 jonas sigvardsson 44M Jan 11 17:04 NALM6_index_5.trim.signal
-rw-rw-r-- 1 jonas sigvardsson 4.9K Jan 11 12:16 NALM6_index_5.trim_gc.txt
-rw-rw-r-- 1 jonas sigvardsson 8.8K Jan 11 12:16 NALM6_index_5.trim_gcPlot.pdf
-rw-rw-r-- 1 jonas sigvardsson 1.2K Jan 11 12:16 NALM6_index_5.trim_gcSummary.txt
-rw-rw-r-- 1 jonas sigvardsson 227K Jan 11 17:01 NALM6_index_5.trim_large_tss-enrich.png
-rw-rw-r-- 1 jonas sigvardsson 652K Jan 11 17:05 NALM6_index_5.trim_qc.html
-rw-rw-r-- 1 jonas sigvardsson 1.8K Jan 11 17:05 NALM6_index_5.trim_qc.txt
-rw-rw-r-- 1 jonas sigvardsson 43K Jan 11 17:01 NALM6_index_5.trim_tss-enrich.png
I wanted to look at 'NALM6_index_3_trimmed.filt.tn5.tagAlign.gz' in your directory '/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep2/', but that file doesn't seem to exist. Did you list files for a different sample?
Please give me full debugging information about one specific sample.
Jin
Thanks, I attach it here. It seems like deduplication fails even though I specifically gave the option not to deduplicate, but there may be more going on that I do not understand.
1) error in deduplication:
sambamba-sort: Unable to write to stream
This means that you don't have enough memory or space in $TMP, $TMPDIR or /tmp.
2) error in spr: again out of memory or disk space? Please run the following for more debugging info.
$ ls -l /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
$ free -h
$ df -h
$ df -h $TMP
$ df -h $TMPDIR
$ df -h /tmp
The pipeline needs more than 20G of memory. If you are working on a laptop/desktop, you will need to add the parameter -nth 1 to disable parallelization and multi-threading.
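A minimal sketch of the disk-space part of that check, for convenience. The helper name `avail_gib` is made up for illustration and is not part of the pipeline. It skips variables that are unset, because an unquoted empty variable would make `df` list every filesystem instead of just one.

```shell
#!/bin/sh
# Print the free space, in whole GiB, of the filesystem that holds a directory.
# avail_gib is a hypothetical helper name, not something the pipeline provides.
avail_gib() {
    # -P gives one POSIX-format line per filesystem, -k reports 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1048576 }'
}

# Check the three locations mentioned above, skipping unset or missing ones.
for d in "${TMP:-}" "${TMPDIR:-}" /tmp; do
    if [ -n "$d" ] && [ -d "$d" ]; then
        echo "$d: $(avail_gib "$d") GiB free"
    fi
done
```

Memory still has to be checked separately with `free -h`.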
Yes, I understood that from previous threads, and I ran into that problem myself, but then Ubuntu issued a warning. However, that was before I changed the tmpdir, and this problem was only discovered afterwards. I have not set a $TMP variable, however.
free -h
              total        used        free      shared  buff/cache   available
Mem:           251G        2.7G         41G        110M        207G        247G
Swap:           10G          0B         10G
df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            126G     0  126G   0% /dev
tmpfs            26G   18M   26G   1% /run
/dev/sda2       213G   39G  174G  19% /
tmpfs           126G   46M  126G   1% /dev/shm
tmpfs           5.0M   12K  5.0M   1% /run/lock
tmpfs           126G     0  126G   0% /sys/fs/cgroup
/dev/sda1       922M  487M  372M  57% /boot
/dev/sdb2       7.3T  143G  7.2T   2% /home
/dev/sdb1        22T  2.3T   20T  11% /mnt/data
tmpfs            26G   40K   26G   1% /run/user/1002
df -h $TMP
Filesystem      Size  Used Avail Use% Mounted on
udev            126G     0  126G   0% /dev
tmpfs            26G   18M   26G   1% /run
/dev/sda2       213G   39G  174G  19% /
tmpfs           126G   49M  126G   1% /dev/shm
tmpfs           5.0M   12K  5.0M   1% /run/lock
tmpfs           126G     0  126G   0% /sys/fs/cgroup
/dev/sda1       922M  487M  372M  57% /boot
/dev/sdb2       7.3T  143G  7.2T   2% /home
/dev/sdb1        22T  2.3T   20T  11% /mnt/data
tmpfs            26G   40K   26G   1% /run/user/1002
df -h $TMPDIR
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb1        22T  2.3T   20T  11% /mnt/data
df -h /tmp
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2       213G   39G  174G  19% /
Is it $TMP itself that is the problem perhaps?
Jonas
Can you run the following?
$ ls -l /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
Yeah, that sambamba error usually occurs when you don't have enough space in $TMP, $TMPDIR or /tmp. If you run multiple pipelines or jobs at the same time, your temp directory can quickly fill up.
Here it comes, but if it is a memory problem, why does it happen specifically with the -no_dup_removal option activated? That makes little sense to me.
ls -l /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
total 20G
-rw-rw-r-- 1 jonas sigvardsson 12G Jan 10 21:55 NALM6_index_1.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.4M Jan 10 22:12 NALM6_index_1.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 4.7G Jan 10 15:01 NALM6_index_1.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 2.7G Jan 11 01:08 NALM6_index_1.trim.filt.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 11 01:08 NALM6_index_1.trim.filt.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 11 01:16 NALM6_index_1.trim.filt.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 11 01:18 NALM6_index_1.trim.filt.tn5.tagAlign.gz
-no_dup_removal was actually added for our lab's internal use, so we cannot guarantee that the pipeline works with it for your sample. If you tried it because you got an error in the deduping step, please remove it. The error in the deduping step is clearly a disk space (or memory) problem. Please see https://github.com/biod/sambamba/issues/218
Please define $TMP and $TMPDIR in your ~/.bashrc to point to fast, local, large storage and try again. About the second error (spr), I am not sure... Can you remove all *.tagAlign.gz files in the output directory and try again? Please run only one pipeline, when your cluster is idle.
$ cd [WORK_DIR]
$ find -name '*.tagAlign.gz' -delete
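For the $TMP and $TMPDIR advice, lines along the following could go into ~/.bashrc. This is only a sketch: the `SCRATCH` variable and its default path are placeholders invented here, and should be replaced by a directory on whatever fast, local, large disk is actually available.

```shell
# Hypothetical scratch location; SCRATCH and its default are placeholders,
# not names the pipeline knows about. Point it at a fast, local, large disk.
SCRATCH="${SCRATCH:-$HOME/scratch/tmp}"

# Create the directory if it does not exist yet.
mkdir -p "$SCRATCH"

# Export both variables so that tools reading either one use the same place.
export TMP="$SCRATCH"
export TMPDIR="$SCRATCH"
```

After opening a new shell, `df -h "$TMP"` and `df -h "$TMPDIR"` should both report the intended filesystem.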
Thanks, I'll try that and let you know how it worked out. It is very much appreciated that you take this time to help out.
Jonas
I understand that it may be a bit tricky to get -no_dup_removal to work if it was specifically designed for your lab, but I think it would be a great feature to have, especially when running single-end data, which is where it seems to break down with that option. It does seem to work well with paired-end data and the option enabled, however. I have set both $TMP and $TMPDIR, so I am fairly sure memory isn't the problem, but when I run the command with the following option.
Anyway, I ran $ find -name '*.tagAlign.gz' -delete, tried to re-run, and got the same error:
`shuf: ‘/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz’: end of file
Task failed:
Program & line : '/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/modules/postalign_bed.bds', line 152
Task Name : 'spr rep1'
Task ID : 'atac.bds.20180126_084255_226_parallel_44/task.postalign_bed.spr_rep1.line_152.id_16'
Task PID : '41253'
Task hint : 'if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/mnt/data/bio'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz]'
Output files : '[/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.pr1.tagAlign.gz, /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr2/NALM6_index_1.trim.filt.tn5.pr2.tagAlign.gz]'
Script file : '/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/atac.bds.20180126_084255_226_parallel_44/task.postalign_bed.spr_rep1.line_152.id_16.sh'
Exit status : '1'
StdErr (10 lines) :
shuf: ‘/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz’: end of file
Fatal error: /mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/atac.bds, line 662, pos 4. Task/s failed.`
It looks to me like the file starts to be generated but is then terminated prematurely, which I assume is where the EOF error comes from. I would expect the tagAlign files to be 2-3 times this size when the duplicates are not removed.
$ ls -l rep1/
total 23G
-rw-rw-r-- 1 jonas sigvardsson 12G Jan 25 23:25 NALM6_index_1.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.4M Jan 25 23:29 NALM6_index_1.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 2.8G Jan 26 08:34 NALM6_index_1.trim.dupmark.bam
-rw-rw-r-- 1 jonas sigvardsson 4.7G Jan 25 17:27 NALM6_index_1.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 2.7G Jan 26 01:52 NALM6_index_1.trim.filt.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 26 01:52 NALM6_index_1.trim.filt.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 26 08:49 NALM6_index_1.trim.filt.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 26 08:52 NALM6_index_1.trim.filt.tn5.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 26 08:40 NALM6_index_1.trim.nodup.bam.bai
Let's check if that tagAlign file is good.
$ OUT_DIR=/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se
$ TA=$OUT_DIR/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz
$ zcat $TA | wc -l
$ zcat $TA | head
$ zcat $TA | tail
$ tail -n10000 $OUT_DIR/qc/*/*.qc
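In addition to the line count and head/tail, `gzip -t` can tell whether a .gz file was cut off mid-write. A minimal sketch on throwaway files; the file names and the single tagAlign-style line are made up for the demo, not taken from the real output directory.

```shell
# Build a tiny, valid gzip file and a deliberately truncated copy of it.
work=$(mktemp -d)
printf 'chr1\t10\t20\tN\t1000\t+\n' | gzip -nc > "$work/good.tagAlign.gz"
head -c 10 "$work/good.tagAlign.gz" > "$work/bad.tagAlign.gz"  # header only, no data

# gzip -t decompresses to nowhere and reports corruption via its exit status.
gzip -t "$work/good.tagAlign.gz" && good_status=0 || good_status=$?
gzip -t "$work/bad.tagAlign.gz" 2>/dev/null && bad_status=0 || bad_status=$?

echo "good=$good_status bad=$bad_status"
```

A zero status for the real `$TA` file would suggest the file itself is intact and the problem lies in a later step.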
Here is one of the samples in the triplicate, but the others look very similar.
`jonas@atlas /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
$ zcat $TA | wc -l
69938309
jonas@atlas /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
$ zcat $TA | head
chr1 10456 10526 N 1000 +
chr1 11409 11446 N 1000 +
chr1 15185 15250 N 1000 -
chr1 26256 26290 N 1000 +
chr1 40876 40947 N 1000 +
chr1 40884 40955 N 1000 +
chr1 40930 41001 N 1000 -
chr1 41047 41116 N 1000 -
chr1 41110 41182 N 1000 +
chr1 41381 41426 N 1000 +
jonas@atlas /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
$ zcat $TA | tail
chrM 16539 16571 N 1000 +
chrM 16539 16571 N 1000 +
chrM 16539 16571 N 1000 +
chrM 16539 16569 N 1000 +
chrM 16539 16571 N 1000 +
chrM 16539 16571 N 1000 +
chrM 16539 16571 N 1000 +
chrM 16539 16571 N 1000 +
chrM 16539 16571 N 1000 +
chrM 16539 16571 N 1000 +
jonas@atlas /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
$ tail -n10000 $OUT_DIR/qc/*/*.qc
tail: cannot open '/qc/*/*.qc' for reading: No such file or directory
jonas@atlas /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
$ OUT_DIR=/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/
jonas@atlas /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1
$ tail -n10000 $OUT_DIR/qc/*/*.qc
==> /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//qc/rep1/NALM6_index_1.trim.flagstat.qc <==
310113958 + 0 in total (QC-passed reads + QC-failed reads)
191843245 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
307957552 + 0 mapped (99.30%:N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A:N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A:N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
==> /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//qc/rep2/NALM6_index_3.trim.flagstat.qc <==
433469132 + 0 in total (QC-passed reads + QC-failed reads)
270786919 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
431225345 + 0 mapped (99.48%:N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A:N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A:N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
==> /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se//qc/rep3/NALM6_index_5.trim.flagstat.qc <==
392661156 + 0 in total (QC-passed reads + QC-failed reads)
244931921 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
390512774 + 0 mapped (99.45%:N/A)
0 + 0 paired in sequencing
0 + 0 read1
0 + 0 read2
0 + 0 properly paired (N/A:N/A)
0 + 0 with itself and mate mapped
0 + 0 singletons (N/A:N/A)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)`
I know this is outside the scope of this thread, but I have a more mundane and hopefully easier question. I have been playing around with different `-mapq_thresh` scores. Even if I set `-mapq_thresh 10`, I still get a QC report based on a value of 30, and it seems like the BAM file is filtered on this value as well. Am I doing something wrong when giving it this option? Here is an example of that command:
$ bds /mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/atac.bds -species mm10 -enable_idr -no_xcor -auto_detect_adapter -mapq_thresh 10 -multimapping 4 -rm_chr_from_tag mito -nth 24 -fastq1_1 R11.fastq.gz -fastq1_2 R12.fastq.gz -fastq2_1 R21.fastq.gz -fastq2_2 R22.fastq.gz
For multimapping reads, mapq_thresh is fixed at 30. Sorry, I will add this to README. For the EOF error, can you extract the actual command line for the `shuf` command? You can get it in the log file where you found the following error message.
`shuf: ‘/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz’: end of file`
Thanks,
Jin
Jin,
If you provide a MAPQ threshold parameter, you should use the value the user provides for filtering and QC.
Anshul.
`# SYS command. line 154
if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/.:/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/modules:/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
nlines=$( zcat /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz | wc -l )
nlines=$(( (nlines + 1) / 2 ))
zcat /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz | shuf --random-source=/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz | split -d -l $((nlines)) - /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.
gzip -nc /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.00 > /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.pr1.tagAlign.gz
rm -f /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.00
gzip -nc /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.01 > /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr2/NALM6_index_1.trim.filt.tn5.pr2.tagAlign.gz
rm -f /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.01
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0`
Input files:
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz
Output files:
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.pr1.tagAlign.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr2/NALM6_index_1.trim.filt.tn5.pr2.tagAlign.gz
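The splitting logic in the script above (count lines, take the rounded-up half, shuffle, split into two pseudo-replicates) can be tried on a toy file. This is only a sketch under assumptions: the five-line input is invented, `gzip -dc` stands in for `zcat`, and `shuf` is called without `--random-source`, so the shuffle is not reproducible. GNU `shuf` and `split -d` are assumed.

```shell
# Toy stand-in for a tagAlign.gz file: five made-up reads.
work=$(mktemp -d)
printf 'chr1\t%s\t%s\tN\t1000\t+\n' 10 20 30 40 50 60 70 80 90 100 \
  | gzip -nc > "$work/toy.tagAlign.gz"

# Same arithmetic as the pipeline script: half the line count, rounded up.
nlines=$(gzip -dc "$work/toy.tagAlign.gz" | wc -l)
nlines=$(( (nlines + 1) / 2 ))

# Shuffle and split into toy.00 (first half) and toy.01 (remainder).
gzip -dc "$work/toy.tagAlign.gz" | shuf | split -d -l "$nlines" - "$work/toy."

pr1_lines=$(wc -l < "$work/toy.00")
pr2_lines=$(wc -l < "$work/toy.01")
echo "half=$nlines pr1=$pr1_lines pr2=$pr2_lines"
```

With five input lines the two halves hold three and two lines. The same formula applied to the 69938309 lines reported earlier gives (69938309 + 1) / 2 = 34969155 lines per pseudo-replicate.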
@akundaje sorry, we already discussed this issue in the ataqc channel on Slack. @jonasungerback let me correct this: when multimapping>0, mapq_thresh is ignored (not fixed at 30).
Please run the following and post outputs here.
if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/.:/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/modules:/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
nlines=$( zcat /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz | wc -l )
nlines=$(( (nlines + 1) / 2 ))
zcat /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz | shuf --random-source=/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz | split -d -l $((nlines)) - /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps/rep1/pr1/NALM6_index_1.trim.filt.tn5.
Thanks, then I know that those two cannot be combined.
When I run the above commands, I get the following:
shuf: ‘/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz’: end of file
Please run the following:
if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/.:/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/modules:/mnt/data/bioinfo_tools_and_refs/bioinfo_tools/kundajelab_ENCODE_atac_dnase_pipeline/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
nlines=$( zcat /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz | wc -l )
nlines=$(( (nlines + 1) / 2 ))
echo $nlines
ls -l /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz
ls -l /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/*
Again, thank you for your effort. What are we looking for? Maybe I can be of more assistance. Here's the output from the commands:
$ echo $nlines
34969155
`$ ls -l /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 30 12:41 /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz
(bds_atac)
jonas@atlas ~
$ ls -l /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/*
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/pseudo_reps:
total 0
drwxrwxr-x 4 jonas sigvardsson 40 Jan 30 12:41 rep1
drwxrwxr-x 4 jonas sigvardsson 40 Jan 30 14:06 rep2
drwxrwxr-x 4 jonas sigvardsson 40 Jan 30 13:34 rep3
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1:
total 20G
-rw-rw-r-- 1 jonas sigvardsson 12G Jan 30 11:25 NALM6_index_1.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.4M Jan 30 11:29 NALM6_index_1.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 4.7G Jan 30 08:32 NALM6_index_1.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 2.7G Jan 30 12:31 NALM6_index_1.trim.filt.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 30 12:31 NALM6_index_1.trim.filt.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 30 12:39 NALM6_index_1.trim.filt.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 161M Jan 30 12:41 NALM6_index_1.trim.filt.tn5.tagAlign.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep2:
total 27G
-rw-rw-r-- 1 jonas sigvardsson 16G Jan 30 12:31 NALM6_index_3.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.6M Jan 30 12:37 NALM6_index_3.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 6.7G Jan 30 08:49 NALM6_index_3.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 3.6G Jan 30 13:53 NALM6_index_3.trim.filt.bam
-rw-rw-r-- 1 jonas sigvardsson 6.3M Jan 30 13:53 NALM6_index_3.trim.filt.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 170M Jan 30 14:03 NALM6_index_3.trim.filt.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 171M Jan 30 14:06 NALM6_index_3.trim.filt.tn5.tagAlign.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep3:
total 24G
-rw-rw-r-- 1 jonas sigvardsson 15G Jan 30 12:06 NALM6_index_5.trim.bam
-rw-rw-r-- 1 jonas sigvardsson 2.5M Jan 30 12:12 NALM6_index_5.trim.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 5.9G Jan 30 08:42 NALM6_index_5.trim.fastq.gz
-rw-rw-r-- 1 jonas sigvardsson 3.2G Jan 30 13:22 NALM6_index_5.trim.filt.bam
-rw-rw-r-- 1 jonas sigvardsson 6.2M Jan 30 13:22 NALM6_index_5.trim.filt.bam.bai
-rw-rw-r-- 1 jonas sigvardsson 177M Jan 30 13:31 NALM6_index_5.trim.filt.tagAlign.gz
-rw-rw-r-- 1 jonas sigvardsson 178M Jan 30 13:34 NALM6_index_5.trim.filt.tn5.tagAlign.gz`
This is weird. The `end of file` error in `shuf` occurs only when the file used for `--random-source` is too short (< several bytes). Can you send me /mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out_NALM6_no_dup_rem_se/align/rep1/NALM6_index_1.trim.filt.tn5.tagAlign.gz? You can upload it to your Google Drive or Dropbox and share it with us.
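A small reproduction of the `end of file` message, assuming GNU coreutils `shuf`. `shuf` stops with this error when the file given to `--random-source` runs out of bytes before the shuffle is finished, and a longer input consumes more random bytes. The 16-byte source file and the line counts below are made up for the demo.

```shell
# A deliberately tiny random source: 16 bytes (demo file, not pipeline data).
src=$(mktemp)
head -c 16 /dev/zero > "$src"

# Shuffling 4 lines needs only a few random bytes, so this should succeed.
seq 1 4 | shuf --random-source="$src" > /dev/null \
  && small_status=0 || small_status=$?

# Shuffling 100000 lines needs far more than 16 bytes, so shuf should
# give up with "end of file" and a non-zero exit status.
seq 1 100000 | shuf --random-source="$src" > /dev/null 2>&1 \
  && large_status=0 || large_status=$?

echo "small=$small_status large=$large_status"
```

If the same holds in the pipeline, a 161M tagAlign.gz used as its own random source might simply be too small for roughly 70 million lines. That is a guess based on the demo, not something confirmed in this thread.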
I apologize for the late answer, it's been a busy week. A folder has been created on Dropbox and the files uploaded.
@jonasungerback sorry for the long delay. Please get the latest pipeline and try again with the `-no_random_source` flag.
Thank you! Sadly the same problem persists.
@jonasungerback please check if your `modules/postalign_bed.bds` has a `no_random_source` variable.
It is not there. Is that good or bad?
@jonasungerback did you `git pull` the latest fix commit? Did you run with `-no_random_source`?
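One way to check whether a checkout already contains the fix is to search the module for the variable name. A sketch on a stand-in file: the one-line contents below are invented for the demo and are not the real `postalign_bed.bds`.

```shell
# Stand-in for modules/postalign_bed.bds with a hypothetical line in it.
work=$(mktemp -d)
printf 'no_random_source := false\n' > "$work/postalign_bed.bds"

# grep -q is silent; its exit status says whether the name was found.
if grep -q 'no_random_source' "$work/postalign_bed.bds"; then
  has_flag=yes
else
  has_flag=no
fi
echo "no_random_source present: $has_flag"
```

In the real pipeline directory the equivalent would be `git pull` followed by `grep -n no_random_source modules/postalign_bed.bds`. No match would mean the checkout predates the fix.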
Now I have and it looks like it is working. I will run some more thorough tests but thank you so very much for all your kind help.
Hi, first and foremost, thank you so much for this pipeline. Standardization is much needed in the ATAC-seq world. I am, however, encountering an issue in the QC step that I am too much of a beginner to locate. I did install the pipeline with the --recursive option, and as long as I give the -no_xcor option it starts and runs into the ataqc steps (without that option it fails, and I am not sure why), but there I encounter the error in the title. The relevant part of the error message looks like this:
I am also attaching the zipped report.
Thank you so much in advance! Best, Jonas
ERROR ERROR
2018-01-05 09:36:39 2018-01-05 09:36:39 00:00:00 00:00:-1 100 days
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep1/NALM6_index_1.trim.fastq.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep1/NALM6_index_1.trim.bam
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/qc/rep1/NALM6_index_1.trim.align.log
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/qc/rep1/NALM6_index_1.trim.nodup.pbc.qc
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/qc/rep1/NALM6_index_1.trim.dup.qc
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep1/NALM6_index_1.trim.nodup.bam
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/align/rep1/NALM6_index_1.trim.nodup.tn5.tagAlign.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/signal/macs2/rep1/NALM6_index_1.trim.nodup.tn5.pf.pval.signal.bigwig
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/peak/macs2/idr/pseudo_reps/rep1/NALM6_ATAC_ENCODE_pipeline_rep1-pr.IDR0.1.filt.narrowPeak.gz
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/qc/rep1/NALM6_index_1.trim_qc.html
/mnt/data/common/jonas/atacseq/NALM6_ATAC_ENCODE_pipeline/out/qc/rep1/NALM6_index_1.trim_qc.txt
atac.bds.20180104_113531_124.report.html.zip