Closed rehamFatima closed 6 years ago
It looks like you didn't install Conda. Please follow the instructions in the README: first install Conda, then install the dependencies (with install_dependencies.sh).
Thanks for the reply. But I do have Conda installed: when I run $ conda it displays the conda help. I have also run install_dependencies.sh. Both bds_atac and bds_atac_py3 have already been created.
Possible conflict between pyenv and Conda? I don't actually know much about pyenv. Can you remove pyenv from your ~/.bashrc and try again?
Thanks Jin. Pyenv is a Python version management tool.
I have commented it out of my ~/.bash_profile (that is where it was), but I am still getting the same output.
Did you log in again after removing that from ~/.bashrc? Please try $ which pyenv to check that it has indeed been removed.
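A minimal sketch of that check, assuming a bash shell opened after the edit to ~/.bashrc or ~/.bash_profile. It reports whether pyenv and conda still resolve on PATH and whether the bds_atac environments are listed. The helper name check_cmd is hypothetical and is not part of the pipeline.

```shell
# Hypothetical helper (not part of the pipeline): report whether a command
# still resolves on PATH in the current shell.
check_cmd() {
  local path
  if path=$(command -v "$1" 2>/dev/null); then
    echo "$1: found at $path"
  else
    echo "$1: not found"
  fi
}

check_cmd pyenv   # should report "not found" once pyenv has been removed
check_cmd conda   # should still be found

# List the Conda environments created by install_dependencies.sh, if conda is available.
if command -v conda >/dev/null 2>&1; then
  conda env list | grep bds_atac || echo "no bds_atac environments listed"
fi
```

If pyenv is still reported as found, the old shell configuration is probably still loaded, which is why logging in again matters.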
After my first two fastq read pairs ran successfully, this third read pair gave me a very similar issue to the one mentioned above (from what I can tell):
Fatal error: /home/ahorning/software/atac_dnase_pipelines/atac.bds, line 789, pos 3. Task/s failed.
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Fatal error: /home/ahorning/software/atac_dnase_pipelines/atac.bds, line 426, pos 2. Task/s failed.
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Should I also try reinstalling conda, or is this a different issue? I don't think it's a memory issue because:
Filesystem Size Used Avail Use% Mounted on
gsfs0 4.4P 3.0P 1.5P 67% /srv/gsfs0
but I've been known to be wrong.
Here are the log files too: NBM_40-1stderror.txt NBM_40-1stdout.txt
@ahorn720 please try adding -mem_bwt2 30G to the command line and see if it works.
@rehamFatima sorry about that, but this is obviously a conflict between pyenv and conda. Did you check $ which pyenv?
Thanks Jin. Yes, I removed pyenv from the path. It has gone past that, but it is still generating this error:
Distributing 8 to ...
{1=8}
Specified adapter for rep1:00 (SE) : None
Task has finished (6 seconds).
Task failed:
Program & line : 'modules/align_bowtie2.bds', line 65
Task Name : 'bowtie2 rep1'
Task ID : 'atac.bds.20180702_120934_437_parallel_28/task.align_bowtie2.bowtie2_rep1.line_65.id_11'
Task PID : 'null'
Task hint : 'bowtie2 --local -x /nfs/nobackup/ensembl/reham_ens/genome/mm10/bowtie2_index/mm10_no_alt_analysis_set_ENCODE.fasta --threads 8 -U <(zcat -f /homes/reham/ATAC-Seq/atac_dnase_pipelines/../genomes/ENCFF124LBK.fastq.gz) 2> /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/ENCFF124LBK.align.log |'
Task resources : 'cpus: 8 mem: -1.0 B wall-timeout: 8640000'
State : 'START_FAILED'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[/homes/reham/ATAC-Seq/atac_dnase_pipelines/../genomes/ENCFF124LBK.fastq.gz]'
Output files : '[/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/ENCFF124LBK.bam, /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/ENCFF124LBK.align.log]'
Script file : '/homes/reham/ATAC-Seq/atac_dnase_pipelines/atac.bds.20180702_120934_437_parallel_28/task.align_bowtie2.bowtie2_rep1.line_65.id_11.sh'
Error message : 'Not enough resources to execute task: cpus: 8 mem: -1.0 B wall-timeout: 8640000'
Exit status : '1'
Program :
# SYS command. line 67
if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/homes/reham/ATAC-Seq/atac_dnase_pipelines/.:/homes/reham/ATAC-Seq/atac_dnase_pipelines/modules:/homes/reham/ATAC-Seq/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
# SYS command. line 71
bowtie2 --local -x /nfs/nobackup/ensembl/reham_ens/genome/mm10/bowtie2_index/mm10_no_alt_analysis_set_ENCODE.fasta --threads 8 -U <(zcat -f /homes/reham/ATAC-Seq/atac_dnase_pipelines/../genomes/ENCFF124LBK.fastq.gz) 2> /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/ENCFF124LBK.align.log | \
samtools view -Su /dev/stdin | samtools sort - /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/ENCFF124LBK
# SYS command. line 73
cat /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/ENCFF124LBK.align.log
# SYS command. line 74
samtools index /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/ENCFF124LBK.bam
# SYS command. line 76
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
Fatal error: atac.bds, line 512, pos 3. Task/s failed.
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Fatal error: atac.bds, line 426, pos 2. Task/s failed.
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
Did you run it on your personal computer? Please check if you have enough memory (>30G) to run this pipeline.
$ free -h
No, I am running it on the EBI cluster. The above produces the following:
              total        used        free      shared  buff/cache   available
Mem:           250G         52G        853M         49M        197G        193G
Swap:          4.0G        394M        3.6G
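The same memory check can be scripted. This is a minimal sketch, assuming a Linux node whose /proc/meminfo exposes MemAvailable. The helper name enough_mem is hypothetical, and the 30G threshold is the figure quoted in this thread.

```shell
# Hypothetical helper: succeed if the available memory (in kB) is at least
# the required amount (in GB). 1 GB is taken as 1024*1024 kB here.
enough_mem() {
  local avail_kb=$1 need_gb=$2
  [ "$avail_kb" -ge $(( need_gb * 1024 * 1024 )) ]
}

# MemAvailable is reported in kB; fall back to an empty value if it cannot be read.
avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null || true)

if enough_mem "${avail_kb:-0}" 30; then
  echo "OK: ${avail_kb} kB available on this node"
else
  echo "Less than 30G available on this node"
fi
```

This only describes the node itself. On a cluster, the scheduler may enforce a lower per-job limit than the node reports, so a job can still be killed even when free -h looks healthy.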
Did you run it on a login node? If so, your job can get killed by the cluster. How does EBI cluster work for submitting/monitoring jobs?
No, I ran it with LSF; I bsub'ed in with -M and -R at 4000M.
It looks like your cluster killed the high-memory jobs (like bowtie2). Our pipeline does not support automatic task submission/monitoring for LSF.
Please get on an interactive node with enough memory, walltime, and CPUs, and then run the pipeline with -system local -no_par.
Sorry, that was 40000MB (so that's 40G). I have gone in with an interactive node.
Shall I include those arguments in the command that I am running?
Yes, include -system local -no_par in your command, not the -M -R.
Thanks Jin, but I have gone over the list of possible arguments for the bsub command, and there is no -system argument. It also gives an error:
ystem: Illegal signal value. Job not submitted.
when I try to run:
bsub -system local -no_par -Is bash
No, I don't actually know LSF, but the command looks similar to SGE. The basic idea is that 1) you first get on an interactive node with bsub (there should be some parameter for an interactive node), and then 2) on that node you run the pipeline command bds atac.bds ... -system local -no_par.
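The two steps can be sketched as follows. This is an illustration under assumptions, not a tested recipe for the EBI cluster: the bsub memory flags and their units are site-specific, the wrapper name run_pipeline_local is hypothetical, and /path/to/input.bam is a placeholder. The point is that -system local and -no_par belong to the bds command, not to bsub.

```shell
# Step 1 (on the login node): request an interactive shell through LSF.
# The memory request below is an assumption; check your site's documentation
# for the correct flags and units before using it.
#   bsub -Is -M 40000 -R "rusage[mem=40000]" bash

# Step 2 (inside that interactive shell): run the pipeline locally.
# Hypothetical wrapper: it only prints the command so it can be reviewed first.
run_pipeline_local() {
  echo bds atac.bds "$@" -system local -no_par
}

run_pipeline_local -species mm10 -bam1 /path/to/input.bam
```

Once the printed command looks right, it can be pasted into the interactive shell and run directly.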
Running :
bds atac.bds -species mm10 -gensz mm -bam1 /nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam -system local -no_par
gives :
Distributing 8 to ...
{1=8}
Task failed:
Program & line : 'modules/postalign_bam.bds', line 278
Task Name : 'dedup_bam_PE_1 rep1'
Task ID : 'atac.bds.20180702_164535_557/task.postalign_bam.dedup_bam_PE_1_rep1.line_278.id_10'
Task PID : 'null'
Task hint : 'if [[ 0 > 0 ]]; then; samtools view -F 524 -f 2 -u /nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam |; sambamba sort -t 8 -n /dev/stdin -o /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam;; samtools view -h /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/'
Task resources : 'cpus: 8 mem: -1.0 B wall-timeout: 8640000'
State : 'START_FAILED'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[/nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam]'
Output files : '[/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.filt.bam]'
Script file : '/homes/reham/ATAC-Seq/atac_dnase_pipelines/atac.bds.20180702_164535_557/task.postalign_bam.dedup_bam_PE_1_rep1.line_278.id_10.sh'
Error message : 'Not enough resources to execute task: cpus: 8 mem: -1.0 B wall-timeout: 8640000'
Exit status : '1'
Program :
# SYS command. line 280
if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/homes/reham/ATAC-Seq/atac_dnase_pipelines/.:/homes/reham/ATAC-Seq/atac_dnase_pipelines/modules:/homes/reham/ATAC-Seq/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
# SYS command. line 296
if [[ 0 > 0 ]]; then \
samtools view -F 524 -f 2 -u /nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam | \
sambamba sort -t 8 -n /dev/stdin -o /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam; \
samtools view -h /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam | \
$(which assign_multimappers.py) -k 0 --paired-end | \
samtools fixmate -r /dev/stdin /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.fixmate.bam; \
else \
samtools view -F 1804 -f 2 -q 30 -u /nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam | \
sambamba sort -t 8 -n /dev/stdin -o /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam; \
samtools fixmate -r /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.fixmate.bam; \
fi
# SYS command. line 308
samtools view -F 1804 -f 2 -u /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.fixmate.bam | sambamba sort -t 8 /dev/stdin -o /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.filt.bam
# SYS command. line 310
rm -f /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.fixmate.bam
# SYS command. line 311
rm -f /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam
# SYS command. line 313
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
Fatal error: modules/postalign_bam.bds, line 318, pos 3. Task/s failed.
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
The error message says: Not enough resources to execute task: cpus: 8 mem: -1.0 B wall-timeout: 8640000.
So you need to reduce the number of CPUs. Please try adding -nth 4 (using 4 CPUs instead of 8) to the command line. If that doesn't work, then try -nth 2 or -nth 1.
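One way to choose the value for -nth is to cap it at the number of CPUs the current shell can see. This is a minimal sketch: the helper name pick_nth is hypothetical, and nproc reports the CPUs visible to the process, which may differ from what the scheduler actually granted.

```shell
# Hypothetical helper: never ask for more threads than are available.
pick_nth() {
  local avail=$1 want=$2
  if [ "$avail" -lt "$want" ]; then
    echo "$avail"
  else
    echo "$want"
  fi
}

avail=$(nproc 2>/dev/null || echo 1)   # CPUs visible to this shell
nth=$(pick_nth "$avail" 4)             # start from the suggested -nth 4
echo "suggested option: -nth $nth"
```

If the task still fails to start with the suggested value, lowering it further to 2 or 1 follows the advice above.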
Running
bds atac.bds -species mm10 -gensz mm -nth 1 -bam1 /nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam -system local -no_par
gives :
== checking input files ...
Rep1 bam (PE) :
/nfs/nobackup/ensembl/reham_ens/genome/mm10/mm10_enh_dhs.bam
Distributing 1 to ...
{1=1}
awk: cmd. line:1: fatal: division by zero attempted
Task failed:
Program & line : 'modules/postalign_bam.bds', line 341
Task Name : 'dedup_bam_PE_2 rep1'
Task ID : 'atac.bds.20180704_170300_909/task.postalign_bam.dedup_bam_PE_2_rep1.line_341.id_10'
Task PID : '135358'
Task hint : 'samtools view -F 1804 -f 2 -b /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam; sambamba index -t 1 /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam; s'
Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000'
State : 'ERROR'
Dependency state : 'ERROR'
Retries available : '1'
Input files : '[/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam]'
Output files : '[/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam, /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.flagstat.qc, /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.pbc.qc]'
Script file : '/homes/reham/ATAC-Seq/atac_dnase_pipelines/atac.bds.20180704_170300_909/task.postalign_bam.dedup_bam_PE_2_rep1.line_341.id_10.sh'
Exit status : '1'
Program :
# SYS command. line 343
if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/homes/reham/ATAC-Seq/atac_dnase_pipelines/.:/homes/reham/ATAC-Seq/atac_dnase_pipelines/modules:/homes/reham/ATAC-Seq/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
# SYS command. line 350
samtools view -F 1804 -f 2 -b /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam
# SYS command. line 352
sambamba index -t 1 /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam
# SYS command. line 354
sambamba flagstat -t 1 /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.nodup.bam > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.flagstat.qc
# SYS command. line 365
sambamba sort -t 1 -n /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam -o /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.tmp.bam
# SYS command. line 367
bedtools bamtobed -bedpe -i /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.tmp.bam | \
awk 'BEGIN{OFS="\t"}{print $1,$2,$4,$6,$9,$10}' | \
grep -v 'chrM' | sort | uniq -c | \
awk 'BEGIN{mt=0;m0=0;m1=0;m2=0} ($1==1){m1=m1+1} ($1==2){m2=m2+1} {m0=m0+1} {mt=mt+$1} END{m1_m2=-1.0; if(m2>0) m1_m2=m1/m2; printf "%d\t%d\t%d\t%d\t%f\t%f\t%f\n",mt,m0,m1,m2,m0/mt,m1/m0,m1_m2}' > /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.nodup.pbc.qc
# SYS command. line 371
rm -f /homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam.tmp.bam
# SYS command. line 373
TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
StdErr (100000000 lines) :
awk: cmd. line:1: fatal: division by zero attempted
Fatal error: modules/postalign_bam.bds, line 378, pos 4. Task/s failed.
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.
It is actually a bit confusing, as this same command gave slightly different output a day ago. That output was erased when I logged out, but it contained something like:
Distributing 1 to ...
{1=1}
Task finished
and then some trailing lines, but it ended at pretty much the same Creating checkpoint file error. I have not been able to reproduce that output with this command, though.
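The awk: division by zero message in the log above comes from the last awk command of the PBC step, which divides by the total count mt. That total stays at zero when no lines reach awk. The sketch below reproduces the same summary with a guard for that case. It assumes the failure is caused by empty input, which this thread does not confirm, and the function name pbc_stats is hypothetical.

```shell
# Sketch of the PBC summary with a guard for empty input.
# Input: lines of "count key..." as produced by `sort | uniq -c`.
pbc_stats() {
  awk 'BEGIN{mt=0;m0=0;m1=0;m2=0}
       ($1==1){m1=m1+1} ($1==2){m2=m2+1} {m0=m0+1} {mt=mt+$1}
       END{
         if (mt==0) {   # nothing reached this step, so skip the divisions
           print "PBC: no input records" > "/dev/stderr"
           exit 2
         }
         m1_m2=-1.0; if(m2>0) m1_m2=m1/m2
         printf "%d\t%d\t%d\t%d\t%f\t%f\t%f\n",mt,m0,m1,m2,m0/mt,m1/m0,m1_m2
       }'
}

# Example input: one position seen once and one position seen twice.
printf '1 chr1 100 200\n2 chr1 300 400\n' | pbc_stats
```

To see whether the input is really empty, counting the reads that pass the same filters with samtools view -c -F 1804 -f 2 on the dupmark BAM is a quick check. A count of zero would point at the input BAM rather than at the pipeline.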
Closing this issue because it's a duplicate of https://github.com/kundajelab/atac_dnase_pipelines/issues/128
@leepc12 the issue has not been resolved. Why would you close it?
@rehamFatima sorry, the duplicate issue was #126, not this one (#128).
Hi,
I have been trying to run your pipeline with the mm10 data installed by install_genome_data. My output looks like the structure in the file "myOutStruct", compared with the structure given on the ATAC-seq GitHub page (screenshot in "originalOutStruct").
I am running the command:
../genomes/mm10_enh_dhs.bam -chrsz ../genomes/mm10/mm10.chrom.sizes
I generated the BAM file using the BED files within the mm10 download/install.
It also generates an error file ("task.postalign_bam.markdup_bam_picard_rep1.line_409.id_10.stderr"), which looks something like this:
Picked up _JAVA_OPTIONS: -Xms256M -Xmx12G -XX:ParallelGCThreads=1
[Wed Jun 27 17:07:04 BST 2018] picard.sam.markduplicates.MarkDuplicates INPUT=[/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.filt.bam] OUTPUT=/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/align/rep1/mm10_enh_dhs.dupmark.bam METRICS_FILE=/homes/reham/ATAC-Seq/atac_dnase_pipelines/out/qc/rep1/mm10_enh_dhs.dup.qc REMOVE_DUPLICATES=false ASSUME_SORTED=true VALIDATION_STRINGENCY=LENIENT MAX_SEQUENCES_FOR_DISK_READ_ENDS_MAP=50000 MAX_FILE_HANDLES_FOR_READ_ENDS_MAP=8000 SORTING_COLLECTION_SIZE_RATIO=0.25 PROGRAM_RECORD_ID=MarkDuplicates PROGRAM_GROUP_NAME=MarkDuplicates DUPLICATE_SCORING_STRATEGY=SUM_OF_BASE_QUALITIES READ_NAME_REGEX=[a-zA-Z0-9]+:[0-9]:([0-9]+):([0-9]+):([0-9]+).* OPTICAL_DUPLICATE_PIXEL_DISTANCE=100 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false
[Wed Jun 27 17:07:04 BST 2018] Executing as reham@hx-noah-08-10.ebi.ac.uk on Linux 3.10.0-514.16.1.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_131-b11; Picard version: 1.126(4691ee611ac205d4afe2a1b7a2ea975a6f997426_1417447214) IntelDeflater
INFO 2018-06-27 17:07:04 MarkDuplicates Start of doWork freeMemory: 256939704; totalMemory: 259588096; maxMemory: 12455444480
INFO 2018-06-27 17:07:04 MarkDuplicates Reading input file and constructing read end information.
INFO 2018-06-27 17:07:04 MarkDuplicates Will retain up to 47905555 data points before spilling to disk.
INFO 2018-06-27 17:07:07 MarkDuplicates Read 0 records. 0 pairs never matched.
INFO 2018-06-27 17:07:07 MarkDuplicates After buildSortedReadEndLists freeMemory: 539996048; totalMemory: 929882112; maxMemory: 12455444480
INFO 2018-06-27 17:07:07 MarkDuplicates Will retain up to 389232640 duplicate indices before spilling to disk.
The output I am presently getting concludes with the following :
Could you please help me generate the right output?
Many thanks. Kind Regards, Reham