kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

Unknown error while launching Atac-Seq pipeline #92

Closed lpalomerol closed 6 years ago

lpalomerol commented 6 years ago

Hello,

This is my first time using this pipeline and I am stuck launching my first run. The code is probably fine and the problem is on my side, but I cannot understand what is happening because, as far as I can see, there is no specific error trace.

Could you help me, please? I have been looking for the cause for two days and I still do not understand what is happening.

I would appreciate any help!

This is the launching trace:

`

== git info
Latest git commit : a9389cb01c47c4479101f1963018515412a77166 (Thu Jan 11 04:23:30 2018)
Reading parameters from section (default) in file(/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/default.env)...

== configuration file info
Hostname : lpalomero.idibelll.org
Configuration file :
Environment file : /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/default.env

== parallelization info
No parallel jobs : false
Maximum # threads : 32

== cluster/system info
Walltime (general) : 5h50m
Max. memory (general) : 7G
Force to use a system : local
Process priority (niceness) : 0
Retiral for failed tasks : 0
Submit tasks to a cluster queue :
Unlimited cluster mem./walltime : false
Java temporary directory : ${TMPDIR}

Info: Environments module not found on your system (e.g. /etc/profile.d/modules.sh). Ignoring shell env. parameters like '-mod'.

== shell environment info
Conda env. : bds_atac
Conda env. for python3 : bds_atac_py3
Conda bin. directory :

Shell cmd. for init. : if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/.:/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/modules:/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for init.(py3) : if [[ -f $(which conda) && $(conda env list | grep bds_atac_py3 | wc -l) != "0" ]]; then source activate bds_atac_py3; sleep 5; fi; export PATH=/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/.:/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/modules:/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for fin. : TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0

Cluster task min. len. : 60

Cluster task delay : 0

== output directory/title info
Output dir. : /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/out
Title (prefix) : 20180206.run.atacseq
Reading parameters from section (default) in file(/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/default.env)...
Reading parameters from section (hg38) in file(/home/idibell/Documentos/data/genome/bds_atac_species.conf)...

== species settings
Species : hg38
Species file : /home/idibell/Documentos/data/genome/bds_atac_species.conf
Species name (WashU browser) : hg38
Ref. genome seq. fasta : /home/idibell/Documentos/data/genome/hg38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta
Chr. sizes file : /home/idibell/Documentos/data/genome/hg38/hg38.chrom.sizes
Black list bed : /home/idibell/Documentos/data/genome/hg38/hg38.blacklist.bed.gz
Ref. genome seq. dir. :

== ENCODE accession settings
ENCODE experiment accession :
ENCODE award RFA :
ENCODE assay category :
ENCODE assay title :
ENCODE award :
ENCODE lab :
ENCODE assembly genome :
ENCODE alias prefix : KLAB_PIPELINE
ENCODE alias suffix :

== report settings
URL root for output directory :
Genome coord. for browser tracks :

== align multimapping settings
alignments reported for multimapping : 0

== align bowtie2 settings
Bowtie2 index : /home/idibell/Documentos/data/genome/hg38/bowtie2_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta
Replacement --score-min for bowtie2 :
Walltime (bowtie2) : 47h
Max. memory (bowtie2) : 12G
Extra param. (bowtie2) :

== adapter trimmer settings
Maximum allowed error rate for cutadapt : 0.10
Minimum trim. length for cutadapt -m : 5
Walltime (adapter trimming) : 23h
Max. memory (adapter trimming) : 12G

== postalign bam settings
MAPQ reads rm thresh. : 30
Rm. tag reads with str. : chrM
No dupe removal in filtering raw bam : false
Walltime (bam filter) : 23h
Max. memory (bam filter) : 12G
Dup marker : picard
Use sambamba markdup (instead of picard) : false

== postalign bed/tagalign settings
Max. memory for UNIX shuf : 12G

== postalign cross-corr. analysis settings
Max. memory for UNIX shuf : 12G
User-defined cross-corr. peak strandshift : -1
Extra parameters for cross-corr. analysis :
Max. memory for cross-corr. analysis : 15G

== callpeak macs2 settings
Genome size (hs,mm) : hs
Walltime (macs2) : 23h
Max. memory (macs2) : 15G
Cap number of peaks (macs2) : 300K
Extra parameters for macs2 callpeak :

== callpeak naiver overlap settings
Bedtools intersect -nonamecheck : false

== IDR settings
Append IDR threshold to IDR out_dir : false

== ATAQC settings
TSS enrichment bed : /home/idibell/Documentos/data/genome/hg38/ataqc/hg38_gencode_tss_unique.bed.gz
DNase bed for ataqc : /home/idibell/Documentos/data/genome/hg38/ataqc/reg2map_honeybadger2_dnase_all_p10_ucsc.hg19_to_hg38.bed.gz
Promoter bed for ataqc : /home/idibell/Documentos/data/genome/hg38/ataqc/reg2map_honeybadger2_dnase_prom_p2.hg19_to_hg38.bed.gz
Enhancer bed for ataqc : /home/idibell/Documentos/data/genome/hg38/ataqc/reg2map_honeybadger2_dnase_enh_p2.hg19_to_hg38.bed.gz
Reg2map for ataqc : /home/idibell/Documentos/data/genome/hg38/ataqc/hg38_dnase_avg_fseq_signal_formatted.txt.gz
Reg2map_bed for ataqc : /home/idibell/Documentos/data/genome/hg38/ataqc/hg38_celltype_compare_subsample.bed.gz
Roadmap metadata for ataqc : /home/idibell/Documentos/data/genome/hg38/ataqc/hg38_dnase_avg_fseq_signal_metadata.txt
Max. memory for ATAQC : 20G
Walltime for ATAQC : 47h

== atac pipeline settings
Type of pipeline : atac-seq
Align only : false
reads to subsample replicates (0 if no subsampling) : 0
reads to subsample for cross-corr. analysis : 25000000
No pseudo replicates : false
No ATAQC (advanced QC report) : false
No Cross-corr. analysis : false
Use CSEM for alignment : false
Smoothing window for MACS2 : 150
DNase Seq : false
IDR threshold : 0.1
Force to use ENCODE3 parameter set : false
Force to use ENCODE parameter set : false
Disable genome browser tracks : false
p-val thresh. for overlapped peaks : 0.01
MACS2 p-val thresh. for peaks : 0.01
MACS2 p-val thresh. for BIGWIGs : 0.01
Enable IDR on called peaks : false
Automatically find/trim adapters : false

== checking atac parameters ...
Checking parameters and data files for ATAQC.

== checking adapters to be trimmed ...
Rep1 R1 adapters (PE) : 00: no adapter specified.
Rep1 R2 adapters (PE) : 00: no adapter specified.

== checking input files ...
Rep1 fastq (PE) :
    /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../data/datasets/fastq.hakemann/468_plus_Crisper_169_S7_L002_R1_001.fastq.gz
    /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../data/datasets/fastq.hakemann/468_plus_Crisper_S8_L002_R1_001.fastq.gz
Distributing 32 to ...

`

And here is the error trace:

`

Task failed:
Program & line : '../../../../../../opt/atac_dnase_pipelines/modules/align_bowtie2.bds', line 144
Task Name : 'bowtie2_PE rep1'
Task ID : 'atac.bds.20180207_150019_769_parallel_41/task.align_bowtie2.bowtie2_PE_rep1.line_144.id_10'
Task PID : '6837'
Task hint : 'bowtie2 -X2000 --mm --local --threads 32 -x /home/idibell/Documentos/data/genome/hg38/bowtie2_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta; -1 /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../data/datasets/fastq.hakemann/468_plu'
Task resources : 'cpus: 32 mem: -1,0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../data/datasets/fastq.hakemann/468_plus_Crisper_169_S7_L002_R1_001.fastq.gz, /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../data/datasets/fastq.hakemann/468_plus_Crisper_S8_L002_R1_001.fastq.gz]'
Output files : '[/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/out/align/rep1/468_plus_Crisper_169_S7_L002_R1_001.PE2SE.bam, /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/out/qc/rep1/468_plus_Crisper_169_S7_L002_R1_001.PE2SE.align.log]'
Script file : '/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/atac.bds.20180207_150019_769_parallel_41/task.align_bowtie2.bowtie2_PE_rep1.line_144.id_10.sh'
Exit status : '1'
Program :

            # SYS command. line 146

             if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi;  export PATH=/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/.:/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/modules:/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../../opt/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

            # SYS command. line 151

             bowtie2    -X2000 --mm --local --threads 32 -x /home/idibell/Documentos/data/genome/hg38/bowtie2_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta \
                                    -1 /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../data/datasets/fastq.hakemann/468_plus_Crisper_169_S7_L002_R1_001.fastq.gz -2 /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/../../../../../data/datasets/fastq.hakemann/468_plus_Crisper_S8_L002_R1_001.fastq.gz 2>/home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/out/qc/rep1/468_plus_Crisper_169_S7_L002_R1_001.PE2SE.align.log | \
                                    samtools view -Su /dev/stdin | samtools sort - /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/out/align/rep1/468_plus_Crisper_169_S7_L002_R1_001.PE2SE

            # SYS command. line 154

             cat /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/out/qc/rep1/468_plus_Crisper_169_S7_L002_R1_001.PE2SE.align.log

            # SYS command. line 155

             samtools index /home/idibell/Documentos/pipelines/8.pujana/20180206.anne.atac/results/20180206.run.atacseq/out/align/rep1/468_plus_Crisper_169_S7_L002_R1_001.PE2SE.bam

            # SYS command. line 157

             TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
    StdErr (100000000 lines)  :
            [bam_sort_core] merging from 49 files...

Fatal error: ../../../../../../opt/atac_dnase_pipelines/atac.bds, line 789, pos 3. Task/s failed.

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Fatal error: ../../../../../../opt/atac_dnase_pipelines/atac.bds, line 426, pos 2. Task/s failed.

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
`

leepc12 commented 6 years ago

Did you run with -nth 32 (32 threads for the pipeline run)? Does your system have that many CPUs? Do you also have enough disk space (>30G) and memory (>16G) on your system? Please post a full log.

$ df -h
$ free -h

Thanks,

Jin


lpalomerol commented 6 years ago

Thank you. Here are the logs.

Output of df -h:

`
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       437G  4,6G  433G   2% /
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G     0   63G   0% /dev/shm
tmpfs            63G  642M   63G   1% /run
tmpfs            63G     0   63G   0% /sys/fs/cgroup
/dev/sda2       2,0G  218M  1,8G  11% /boot
/dev/sda1       200M  9,8M  191M   5% /boot/efi
/dev/sdb1        11T   74G   11T   1% /home
tmpfs            13G   32K   13G   1% /run/user/1000
tmpfs            13G   36K   13G   1% /run/user/0
`

Output of free -h:

`
              total        used        free      shared  buff/cache   available
Mem:           125G        2,3G         46G        637M         76G        121G
Swap:          8,0G        5,7M        8,0G
`

Also, the server has 56 cores (nproc output is 56)

leepc12 commented 6 years ago

Please reduce -nth to 3~5 and try again. Make sure that you have enough space (>50G) on your working directory and /tmp (or $TMP, $TMPDIR).
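A minimal sketch of how the free space in the working directory and in the temporary directory could be checked before relaunching. The helper name `free_gb` is illustrative and not part of the pipeline; the 50G threshold is the one mentioned above, and `df -P -k` is used so that the output has the same column layout regardless of locale.

```shell
#!/usr/bin/env bash

# Print the free space (in whole GiB) of the filesystem holding a directory.
free_gb() {
    # -P: POSIX one-line-per-filesystem format, -k: sizes in 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -P -k "$1" | awk 'NR == 2 { print int($4 / 1024 / 1024) }'
}

# Check the current working directory and the temporary directory
# (falls back to /tmp when TMPDIR is unset).
for dir in "$PWD" "${TMPDIR:-/tmp}"; do
    gb=$(free_gb "$dir")
    if [ "$gb" -lt 50 ]; then
        echo "WARNING: only ${gb}G free in ${dir}"
    else
        echo "OK: ${gb}G free in ${dir}"
    fi
done
```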


lpalomerol commented 6 years ago

Hello Jin, and sorry for the delay, and thank you again for your support.

I tried to launch it again with three cores and the run got stuck again at the same point.

Then I ran the bowtie2 and samtools commands manually, first all together as a single piped command and then step by step (with temporary files). The first attempt failed (with an error trace located in a log file) and the second, step-by-step attempt worked.

This is the first trace, which looks like a lack-of-memory error (https://www.biostars.org/p/251496/) raised by the pipeline:

Error, fewer reads in file specified with -2 than in file specified with -1
terminate called after throwing an instance of 'int'
(ERR): bowtie2-align died with signal 6 (ABRT) (core dumped)

The second attempt generated these three files: the first is the bowtie2 output, the second comes from samtools view and the last from samtools sort.

`
-rw-rw-r--. 1 idibell idibell  26G feb 15 15:00 alignment.btwie
-rw-rw-r--. 1 idibell idibell  21G feb 15 15:13 alignment.btwie.sam
-rw-rw-r--. 1 idibell idibell 5,0G feb 15 16:56 alignment.btwie.sam.sort.bam
`

Do you think it would be a good idea to change the pipeline code and replace the piped commands with three separate ones?
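For reference, a sketch of what three separate steps might look like. This is not the pipeline's own code: the function name `align_stepwise` and its arguments are hypothetical, and the `samtools view -o` and `samtools sort -o` forms assume samtools 1.3 or later, whereas the task script above uses the older `samtools sort - PREFIX` form.

```shell
#!/usr/bin/env bash
set -euo pipefail

# align_stepwise INDEX R1 R2 PREFIX [THREADS]
# Runs bowtie2, samtools view and samtools sort as separate commands,
# writing an intermediate file after each step instead of piping.
align_stepwise() {
    local index=$1 r1=$2 r2=$3 prefix=$4 threads=${5:-4}

    # Step 1: align; SAM goes to a file, bowtie2 messages go to a log.
    bowtie2 -X2000 --mm --local --threads "$threads" -x "$index" \
        -1 "$r1" -2 "$r2" -S "${prefix}.sam" 2> "${prefix}.align.log"

    # Step 2: convert SAM to BAM.
    samtools view -b -o "${prefix}.unsorted.bam" "${prefix}.sam"

    # Step 3: coordinate-sort the BAM and index it.
    samtools sort -o "${prefix}.bam" "${prefix}.unsorted.bam"
    samtools index "${prefix}.bam"
}

# Example call (paths are placeholders):
# align_stepwise /path/to/bowtie2_index rep1_R1.fastq.gz rep1_R2.fastq.gz out/rep1 4
```

Because of `set -e`, a failure in any step stops the script at that step, so a bowtie2 error stays visible in the `.align.log` file rather than being followed by messages from later commands. The cost is the disk space taken by the intermediate files.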

Cheers, Luis.

leepc12 commented 6 years ago

Does your cluster enforce a memory limit on submitted jobs? Can you try with -mem_bwt2 30G? Is there any particular reason for running pipelines in your $HOME? $HOME is usually for storing code (not for big data). What is your cluster's policy on this?

Also, does your paired end sample have correctly paired FASTQs (R1 and R2) for each replicate?

Thanks,

Jin


lpalomerol commented 6 years ago

Hello again Jin,

Currently we do not have a fixed policy on this at the cluster. I appreciate the observation.

About the error: I copied and pasted the "problematic" code into a custom bash file and executed it again. I got the same error: "Error, fewer reads in file specified with -2 than in file specified with -1".

The first file has 241,957,476 lines and the second one has 240,175,504. In this case, AFAIK, I cannot align the files directly, right? (I am now running Trimmomatic before launching the code again.) So the code seems fine, and the problem is in the data (and between the computer and the chair :sweat_smile:).
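The mismatch described above can be checked before alignment with a short script. This is only a sketch: the helper names `count_reads` and `check_pairs` and the file names in the commented call are placeholders, and it assumes standard four-line FASTQ records in gzipped files.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the number of reads in a gzipped FASTQ file,
# assuming four lines per record.
count_reads() {
    local lines
    lines=$(gzip -dc "$1" | wc -l)
    echo $(( lines / 4 ))
}

# Compare the read counts of two mate files; returns non-zero on mismatch.
check_pairs() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    echo "R1: ${n1} reads, R2: ${n2} reads"
    [ "$n1" -eq "$n2" ]
}

# Example call (file names are placeholders):
# check_pairs sample_R1.fastq.gz sample_R2.fastq.gz || echo "FASTQ files are not paired"
```

With the line counts reported above, this corresponds to 241,957,476 / 4 = 60,489,369 reads in the first file and 240,175,504 / 4 = 60,043,876 in the second, which is why bowtie2 reports fewer reads in the `-2` file than in the `-1` file.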

Thank you very much again. Luis

lpalomerol commented 6 years ago

After relaunching with the trimmed FASTQ files, the pipeline worked.

Closing issue.