Closed easwaranramamurthy closed 6 years ago
First of all, -type atac-seq
must be replaced with -type atac
. Please post a full log. I need more info for debugging.
BTW, did you install conda and dependencies correctly?
Thank you. The confusion arose because the documentation says that it should be atac-seq instead of atac.
$ bds atac.bds
== atac pipeline settings
-type <string> : Type of the pipeline. atac-seq or dnase-seq (default: atac-seq).
Yes, conda and dependencies are installed correctly. Here is the full standard error:
Traceback (most recent call last):
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 70, in <module>
main()
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 66, in main
best_adapter = detect_most_likely_adapter(sys.argv[1])
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 40, in detect_most_likely_adapter
observed_adapters, adapter_cnts, n_obs_adapters = detect_adapters_and_cnts(fname)
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 25, in detect_adapters_and_cnts
for seq_index, line in enumerate(fp):
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 372, in readline
return self._buffer.readline(size)
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 409, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'@S')
Traceback (most recent call last):
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 70, in <module>
main()
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 66, in main
best_adapter = detect_most_likely_adapter(sys.argv[1])
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 40, in detect_most_likely_adapter
observed_adapters, adapter_cnts, n_obs_adapters = detect_adapters_and_cnts(fname)
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 25, in detect_adapters_and_cnts
for seq_index, line in enumerate(fp):
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 372, in readline
return self._buffer.readline(size)
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 409, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'@S')
Task failed:
Program & line : '/path/to/atac/atac_dnase_pipelines/modules/align_trim_adapter.bds', line 48
Task Name : 'detect_adapter rep1'
Task ID : 'atac.bds.20180417_154414_913/task.align_trim_adapter.detect_adapter_rep1.line_48.id_10'
Task PID : '112435'
Task hint : 'python3 $(which detect_adapter.py) file_1.fastq > processed_atac/qc/rep1/file_1.adapter.txt; TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[file_1.fastq]'
Output files : '[processed_atac/qc/rep1/file_1.adapter.txt]'
Script file : '/path/to/atac.bds.20180417_154414_913/task.align_trim_adapter.detect_adapter_rep1.line_48.id_10.sh'
Exit status : '1'
Program :
# SYS command. line 50
if [[ -f $(which conda) && $(conda env list | grep bds_atac_py3 | wc -l) != "0" ]]; then source activate bds_atac_py3; sleep 5; fi; export PATH=/path/to/atac/atac_dnase_pipelines/.:/path/to/atac/atac_dnase_pipelines/modules:/path/to/atac/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
# SYS command. line 52
python3 $(which detect_adapter.py) file_1.fastq > processed_atac/qc/rep1/file_1.adapter.txt
# SYS command. line 54
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
StdErr (100000000 lines) :
Traceback (most recent call last):
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 70, in <module>
main()
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 66, in main
best_adapter = detect_most_likely_adapter(sys.argv[1])
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 40, in detect_most_likely_adapter
observed_adapters, adapter_cnts, n_obs_adapters = detect_adapters_and_cnts(fname)
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 25, in detect_adapters_and_cnts
for seq_index, line in enumerate(fp):
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 372, in readline
return self._buffer.readline(size)
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 409, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'@S')
Task failed:
Program & line : '/path/to/atac/atac_dnase_pipelines/modules/align_trim_adapter.bds', line 48
Task Name : 'detect_adapter rep1'
Task ID : 'atac.bds.20180417_154414_913/task.align_trim_adapter.detect_adapter_rep1.line_48.id_11'
Task PID : '112441'
Task hint : 'python3 $(which detect_adapter.py) file_2.fastq > processed_atac/qc/rep1/file_2.adapter.txt; TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[file_2.fastq]'
Output files : '[processed_atac/qc/rep1/file_2.adapter.txt]'
Script file : '/path/to/atac.bds.20180417_154414_913/task.align_trim_adapter.detect_adapter_rep1.line_48.id_11.sh'
Exit status : '1'
Program :
# SYS command. line 50
if [[ -f $(which conda) && $(conda env list | grep bds_atac_py3 | wc -l) != "0" ]]; then source activate bds_atac_py3; sleep 5; fi; export PATH=/path/to/atac/atac_dnase_pipelines/.:/path/to/atac/atac_dnase_pipelines/modules:/path/to/atac/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
# SYS command. line 52
python3 $(which detect_adapter.py) file_2.fastq > processed_atac/qc/rep1/file_2.adapter.txt
# SYS command. line 54
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
StdErr (100000000 lines) :
Traceback (most recent call last):
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 70, in <module>
main()
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 66, in main
best_adapter = detect_most_likely_adapter(sys.argv[1])
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 40, in detect_most_likely_adapter
observed_adapters, adapter_cnts, n_obs_adapters = detect_adapters_and_cnts(fname)
File "/path/to/atac/atac_dnase_pipelines/utils/detect_adapter.py", line 25, in detect_adapters_and_cnts
for seq_index, line in enumerate(fp):
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 372, in readline
return self._buffer.readline(size)
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 461, in read
if not self._read_gzip_header():
File "/path/to/miniconda3/envs/bds_atac_py3/lib/python3.5/gzip.py", line 409, in _read_gzip_header
raise OSError('Not a gzipped file (%r)' % magic)
OSError: Not a gzipped file (b'@S')
Fatal error: /path/to/atac/atac_dnase_pipelines/atac.bds, line 737, pos 6. Task/s failed.
atac.bds, line 82 : main()
atac.bds, line 85 : void main() { // atac pipeline starts here
atac.bds, line 95 : do_align()
atac.bds, line 394 : void do_align() {
atac.bds, line 420 : for (int rep=1; rep<=get_num_rep(); rep++) {
atac.bds, line 422 : if ( no_par ) do_align( rep, nth_rep{rep} )
atac.bds, line 431 : void do_align( int rep, int nth_rep ) {
atac.bds, line 433 : if ( is_se( rep ) ) align_SE( rep, nth_rep )
atac.bds, line 434 : else align_PE( rep, nth_rep )
atac.bds, line 695 : void align_PE( int rep, int nth_rep ) {
atac.bds, line 708 : if ( is_input_fastq( rep ) ) {
atac.bds, line 722 : for ( string pool_id : fastqs_pair1.keys() ) {
atac.bds, line 724 : if ( !(adapters_pair1.hasKey(pool_id) && adapters_pair2.hasKey(pool_id)) \
atac.bds, line 730 : else {
atac.bds, line 732 : if ( !(adapters_pair1.hasKey(pool_id) && adapters_pair2.hasKey(pool_id)) \
atac.bds, line 733 : && auto_detect_adapter ) {
atac.bds, line 737 : wait [tid1, tid2]
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
The standard out is pasted below:
== git info
Latest git commit : cb35850349faa552054ae453077881e06a87b1be (Mon Apr 16 16:58:56 2018)
Reading parameters from section (default) in file(/path/to/atac/atac_dnase_pipelines/default.env)...
== configuration file info
Hostname : compute-1-13.local
Configuration file :
Environment file : /path/to/atac/atac_dnase_pipelines/default.env
Warning: Maximum # threads (-nth) for a pipeline is <= 1. Turning off parallelization... (-no_par)
== parallelization info
No parallel jobs : true
Maximum # threads : 1
== cluster/system info
Walltime (general) : 5h50m
Max. memory (general) : 7G
Force to use a system : local
Process priority (niceness) : 0
Retiral for failed tasks : 0
Submit tasks to a cluster queue :
Unlimited cluster mem./walltime : false
Use --acount instead of SLURM partition : false
Java temporary directory : ${TMPDIR}
== shell environment info
Conda env. : bds_atac
Conda env. for python3 : bds_atac_py3
Conda bin. directory :
Shell cmd. for init. : if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/path/to/atac/atac_dnase_pipelines/.:/path/to/atac/atac_dnase_pipelines/modules:/path/to/atac/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
Shell cmd. for init.(py3) : if [[ -f $(which conda) && $(conda env list | grep bds_atac_py3 | wc -l) != "0" ]]; then source activate bds_atac_py3; sleep 5; fi; export PATH=/path/to/atac/atac_dnase_pipelines/.:/path/to/atac/atac_dnase_pipelines/modules:/path/to/atac/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
Shell cmd. for fin. : TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
Cluster task min. len. : 60
Cluster task delay : 0
== output directory/title info
Output dir. : processed_atac
Title (prefix) : atac_process
Reading parameters from section (default) in file(/path/to/atac/atac_dnase_pipelines/default.env)...
Reading parameters from section (hg19) in file(/path/to/genomes/bds_atac_species.conf)...
== species settings
Species : hg19
Species file : /path/to/genomes/bds_atac_species.conf
Species name (WashU browser) : hg19
Ref. genome seq. fasta : /path/to/genomes/hg19/male.hg19.fa
Chr. sizes file : /path/to/genomes/hg19/hg19.chrom.sizes
Black list bed : /path/to/genomes/hg19/wgEncodeDacMapabilityConsensusExcludable.bed.gz
Ref. genome seq. dir. :
== ENCODE accession settings
ENCODE experiment accession :
ENCODE award RFA :
ENCODE assay category :
ENCODE assay title :
ENCODE award :
ENCODE lab :
ENCODE assembly genome :
ENCODE alias prefix : KLAB_PIPELINE
ENCODE alias suffix :
== report settings
URL root for output directory :
Genome coord. for browser tracks :
== align multimapping settings
# alignments reported for multimapping : 0
== align bowtie2 settings
Bowtie2 index : /path/to/genomes/hg19/bowtie2_index/male.hg19.fa
Replacement --score-min for bowtie2 :
Walltime (bowtie2) : 47h
Max. memory (bowtie2) : 12G
Extra param. (bowtie2) :
Disable index on memory (bowtie2) : false
== adapter trimmer settings
Maximum allowed error rate for cutadapt : 0.10
Minimum trim. length for cutadapt -m : 5
Walltime (adapter trimming) : 23h
Max. memory (adapter trimming) : 12G
== postalign bam settings
MAPQ reads rm thresh. : 30
Rm. tag reads with str. : chrM
No dupe removal in filtering raw bam : false
Walltime (bam filter) : 23h
Max. memory (bam filter) : 12G
Dup marker : picard
Use sambamba markdup (instead of picard) : false
== postalign bed/tagalign settings
Max. memory for UNIX shuf : 12G
No --random-source for UNIX shuf : false
== postalign cross-corr. analysis settings
Max. memory for UNIX shuf : 12G
User-defined cross-corr. peak strandshift : -1
Extra parameters for cross-corr. analysis :
Max. memory for cross-corr. analysis : 15G
== callpeak macs2 settings
Genome size (hs,mm) : hs
Walltime (macs2) : 23h
Max. memory (macs2) : 15G
Cap number of peaks (macs2) : 300K
Extra parameters for macs2 callpeak :
== callpeak naiver overlap settings
Bedtools intersect -nonamecheck : false
== IDR settings
Append IDR threshold to IDR out_dir : false
== ATAQC settings
TSS enrichment bed : /path/to/genomes/hg19/ataqc/hg19_gencode_tss_unique.bed.gz
DNase bed for ataqc : /path/to/genomes/hg19/ataqc/reg2map_honeybadger2_dnase_all_p10_ucsc.bed.gz
Promoter bed for ataqc : /path/to/genomes/hg19/ataqc/reg2map_honeybadger2_dnase_prom_p2.bed.gz
Enhancer bed for ataqc : /path/to/genomes/hg19/ataqc/reg2map_honeybadger2_dnase_enh_p2.bed.gz
Reg2map for ataqc : /path/to/genomes/hg19/ataqc/dnase_avgs_reg2map_p10_merged_named.pvals.gz
Reg2map_bed for ataqc : /path/to/genomes/hg19/ataqc/reg2map_honeybadger2_dnase_all_p10_ucsc.bed.gz
Roadmap metadata for ataqc : /path/to/genomes/hg19/ataqc/eid_to_mnemonic.txt
Max. memory for ATAQC : 20G
Walltime for ATAQC : 47h
== atac pipeline settings
Type of pipeline : atac-seq
Align only : false
# reads to subsample replicates (0 if no subsampling) : 0
# reads to subsample for cross-corr. analysis : 25000000
No pseudo replicates : false
No ATAQC (advanced QC report) : false
No Cross-corr. analysis : false
Use CSEM for alignment : false
Smoothing window for MACS2 : 150
DNase Seq : false
IDR threshold : 0.1
Force to use ENCODE3 parameter set : false
Force to use ENCODE parameter set : false
Disable genome browser tracks : false
p-val thresh. for overlapped peaks : 0.01
MACS2 p-val thresh. for peaks : 0.01
MACS2 p-val thresh. for BIGWIGs : 0.01
Enable IDR on called peaks : false
Automatically find/trim adapters : true
== checking atac parameters ...
Checking parameters and data files for ATAQC.
== checking adapters to be trimmed ...
Rep1 R1 adapters (PE) :
00: automatically detected and trimmed.
Rep1 R2 adapters (PE) :
00: automatically detected and trimmed.
== checking input files ...
Rep1 fastq (PE) :
file_1.fastq
file_2.fastq
Distributing 1 to ...
{1=1}
Oops, sorry about that. Pipeline actually works with both atac
or atac-seq
(also for dnase
and dnase-seq
). So README is correct.
Fastqs must be gzipped first. Please gzip it and try again.
Thanks a lot! Gzipping the fastqs worked.
Hi,
When I run the following command:
bds atac.bds -nth 1 -type atac-seq -species hg19 -title atac_process -auto_detect_adapter -fastq1_1 file_1.fastq -fastq1_2 file_2.fastq -out_dir processed_atac
I get the following error:
Any help with debugging this would be appreciated