kundajelab / atac_dnase_pipelines

ATAC-seq and DNase-seq processing pipeline
BSD 3-Clause "New" or "Revised" License

Error: hg19 bowtie2 index #31

Closed rleylek closed 7 years ago

rleylek commented 7 years ago

Is there a way to tell it to just pick one? Thanks!

Error (modules/align_bowtie2.bds, line 36, pos 3): Bowtie2 index (-bwt2_idx) doesn't exists! (file: /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male.1.bt2 or /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male.1.bt2l)
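
Judging from the error text, the pipeline treats -bwt2_idx as a file prefix and accepts either the small-index (.1.bt2) or the large-index (.1.bt2l) form of the first index file. Nothing has to be picked: the check failed because neither file could be found. The sketch below approximates that check so a prefix can be tested by hand. It is an assumption based on the message, not the code in modules/align_bowtie2.bds, and bwt2_index_exists is a hypothetical name.

```shell
# Hypothetical helper approximating the index check behind the error above.
# A Bowtie2 index is given as a prefix; the first file of the index must exist
# with either the small-index (.1.bt2) or the large-index (.1.bt2l) extension.
bwt2_index_exists() {
  local prefix
  prefix=$1
  # -e is also false when a parent directory cannot be searched by the
  # current user, so "not found" may really mean "not visible to you".
  [ -e "${prefix}.1.bt2" ] || [ -e "${prefix}.1.bt2l" ]
}

# Example:
# bwt2_index_exists /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male \
#   || echo "index not found (or not visible to you)"
```

Because of that -e behaviour, a "doesn't exist" message can come from a permissions problem rather than a missing file.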

leepc12 commented 7 years ago

Can you post your command line input and the full error log or HTML report? I just tested the pipeline on Sherlock but got no error.

rleylek commented 7 years ago

Hi, thanks for the quick reply!

I tried:

bds atac.bds -species hg19 -fastq1_1 $SCRATCH/ATAC_Jan/BDCA1-1_S10_R1_001.fastq.gz -fastq1_2 $SCRATCH/ATAC_Jan/BDCA1-1_S10_R2_001.fastq.gz

and then tried to specify bwt2_idx:

bds atac.bds -species hg19 -fastq1_1 $SCRATCH/ATAC_Jan/BDCA1-1_S10_R1_001.fastq.gz -fastq1_2 $SCRATCH/ATAC_Jan/BDCA1-1_S10_R2_001.fastq.gz -bwt2_idx /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male

Here's the full transcript:

== git info
Latest git commit : 8a3b319340f2e0ff23f69e8fe7df7c56f75828c1 (Mon Jan 23 10:50:55 2017)
Reading parameters from section (sherlock*.stanford.edu) in file(/home/rleylek/atac_dnase_pipelines/default.env)...

== configuration file info
Hostname : sherlock-ln02.stanford.edu
Configuration file :
Environment file : /home/rleylek/atac_dnase_pipelines/default.env

== parallelization info
No parallel jobs : false
Maximum # threads : 4

== cluster/system info
Walltime (general) : 5h50m
Max. memory (general) : 7G
Force to use a system : slurm
Process priority (niceness) : 0
Retiral for failed tasks : 0
Submit tasks to a cluster queue :
Unlimited cluster mem./walltime : false

== shell environment info
Conda env. : bds_atac
Conda env. for python3 : bds_atac_py3

Shell cmd. for init. : if [[ -f $(which conda) && $(conda env list | grep bds_atac | wc -l) != "0" ]]; then source activate bds_atac; sleep 5; fi; export PATH=/home/rleylek/atac_dnase_pipelines/.:/home/rleylek/atac_dnase_pipelines/modules:/home/rleylek/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for init.(py3) : if [[ -f $(which conda) && $(conda env list | grep bds_atac_py3 | wc -l) != "0" ]]; then source activate bds_atac_py3; sleep 5; fi; export PATH=/home/rleylek/atac_dnase_pipelines/.:/home/rleylek/atac_dnase_pipelines/modules:/home/rleylek/atac_dnase_pipelines/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for fin. : TASKTIME=$[$(date +%s)-${STARTTIME}]; if [ ${TASKTIME} -lt 60 ]; then echo "Waiting for $[60-${TASKTIME}] seconds."; sleep $[60-${TASKTIME}]; fi

== output directory/title info
Output dir. : /home/rleylek/atac_dnase_pipelines/out
Title (prefix) : atac_dnase_pipelines
Reading parameters from section (default) in file(/home/rleylek/atac_dnase_pipelines/default.env)...
Reading parameters from section (hg19) in file(/home/rleylek/atac_dnase_pipelines/species/sherlock.conf)...

== species settings
Species : hg19
Species file : /home/rleylek/atac_dnase_pipelines/species/sherlock.conf
Species name (WashU browser) : hg19
Ref. genome seq. fasta : /share/PI/akundaje/data/hg19/ataqc/encodeHg19Male.fa
Chr. sizes file : /share/PI/akundaje/data/hg19/hg19.chrom.sizes
Black list bed : /share/PI/akundaje/data/hg19/wgEncodeDacMapabilityConsensusExcludable.bed.gz

== ENCODE accession settings
ENCODE experiment accession :
ENCODE award RFA :
ENCODE assay category :
ENCODE assay title :
ENCODE award :
ENCODE lab :
ENCODE assembly genome :
ENCODE alias prefix : KLAB_PIPELINE

== report settings
URL root for output directory :

== align multimapping settings
# alignments reported for multimapping : 0

== align bowtie2 settings
Bowtie2 index : /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male
Walltime (bowtie2) : 23h
Max. memory (bowtie2) : 12G

== adapter trimmer settings
Maximum allowed error rate for cutadapt : 0.20
Minimum trim. length for cutadapt -m : 5
Walltime (adapter trimming) : 23h
Max. memory (adapter trimming) : 12G

== postalign bam settings
MAPQ reads rm thresh. : 30
Rm. tag reads with str. :
Walltime (bam filter) : 23h
Max. memory (bam filter) : 12G
Use sambamba markdup (instead of picard) : false

== postalign bed/tagalign settings
Set initial fraglen. to 0 for cross-corr. (-speak=0) : false
Max. memory for UNIX shuf : 12G

== callpeak macs2 settings
Genome size (hs,mm) : hs
Walltime (macs2) : 23h
Max. memory (macs2) : 15G

== callpeak naiver overlap settings
Bedtools intersect -nonamecheck : false

== callpeak etc settings
# of top peaks to pick up in peak files : 500000

== IDR settings
Append IDR threshold to IDR out_dir : false

== ATAQC settings
TSS enrichment bed : /share/PI/akundaje/data/hg19/ataqc/hg19_RefSeq_stranded.bed.gz
DNase bed for ataqc : /share/PI/akundaje/data/hg19/ataqc/reg2map_honeybadger2_dnase_all_p10_ucsc.bed.gz
Promoter bed for ataqc : /share/PI/akundaje/data/hg19/ataqc/reg2map_honeybadger2_dnase_prom_p2.bed.gz
Enhancer bed for ataqc : /share/PI/akundaje/data/hg19/ataqc/reg2map_honeybadger2_dnase_enh_p2.bed.gz
Reg2map for ataqc : /share/PI/akundaje/data/hg19/ataqc/dnase_avgs_reg2map_p10_merged_named.pvals.gz
Reg2map_bed for ataqc : /share/PI/akundaje/data/hg19/ataqc/dnase_avgs_reg2map_p10_merged_named.pvals.gz
Roadmap metadata for ataqc : /share/PI/akundaje/data/hg19/ataqc/eid_to_mnemonic.txt
Max. memory for ATAQC : 15G
Walltime for ATAQC : 23h

== atac pipeline settings
Type of pipeline : atac-seq
Fastqs are trimmed? : false
Align only : false
# reads to subsample replicates (0 if no subsampling) : 0
# reads to subsample for cross-corr. analysis : 25000000
No pseudo replicates : false
No IDR analysis on peaks : false
No ATAQC (advanced QC report) : false
No Cross-corr. analysis : false
Use CSEM for alignment : false
Smoothing window for MACS2 : 150
DNase Seq : false
IDR threshold : 0.1
Use old trim adapters : false
Force to use ENCODE3 parameter set : false
Force to use ENCODE parameter set : false

== checking atac parameters ...
00:00:01.687 Error (modules/align_bowtie2.bds, line 36, pos 3): Bowtie2 index (-bwt2_idx) doesn't exists! (file: /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male.1.bt2 or /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male.1.bt2l)

leepc12 commented 7 years ago

Can you check if you have read permission on /share/PI/akundaje/data/hg19/bwt2_idx/*?
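
One way to check is to walk the path one directory at a time and report the first directory the current user cannot enter (no search/execute bit), since that is what makes every file beneath it unreadable. This is a minimal POSIX sh sketch; first_blocked_dir is a hypothetical helper. On Linux, namei -l PATH from util-linux prints the owner and mode of every path component and answers the same question without a script.

```shell
# Hypothetical helper: print the first directory on the way to a path that the
# current user cannot enter (no search/execute bit). Returns 1 and prints
# nothing if every existing directory along the path can be entered.
# Note: root can enter any directory, so this only finds problems for normal users.
first_blocked_dir() {
  local target dir rest part
  target=$1
  # Make relative paths absolute so the walk always starts at /.
  case $target in
    /*) ;;
    *) target=$PWD/$target ;;
  esac
  dir=""
  rest=${target#/}
  while [ -n "$rest" ]; do
    part=${rest%%/*}                  # next path component
    case $rest in
      */*) rest=${rest#*/} ;;         # drop it and the following slash
      *)   rest="" ;;                 # that was the last component
    esac
    [ -z "$part" ] && continue        # skip empty components from "//"
    dir=$dir/$part
    if [ -d "$dir" ] && [ ! -x "$dir" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done
  return 1
}

# Example:
# first_blocked_dir /share/PI/akundaje/data/hg19/bwt2_idx/ENCODEHg19_male.1.bt2
```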


rleylek commented 7 years ago

It looks like I don't have permission to view anything in the /share/PI/akundaje folder.

[rleylek@sherlock-ln02 login_node ~]$ cd /share/PI/akundaje/data/hg19
-bash: cd: /share/PI/akundaje/data/hg19: Permission denied

[rleylek@sherlock-ln02 login_node ~]$ cd /share/PI/akundaje
-bash: cd: /share/PI/akundaje: Permission denied

I'm new to Sherlock, so I'm not sure if I did that right. Is there an easy way to get around this?

leepc12 commented 7 years ago

I checked that /share/PI/akundaje is not accessible to non-kundaje-lab members. There is a workaround for this.

1) Install the genome data with ./install_genome_data hg19 [DESTINATION_DIR]. If you also want other genomes, keep using the same DESTINATION_DIR.
2) Open ./default.env and, in the section [sherlock*.stanford.edu, sh-*.local], change the species_file line to:
species_file = [SPECIES FILE CONF ON THE DESTINATION DIR]
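
Step 2 can also be done non-interactively. This sketch assumes default.env uses INI-style [section] headers at the start of a line and key = value entries, as the section name above suggests; set_species_file is a hypothetical helper, and if the header in your copy is spelled differently, nothing is changed.

```shell
# Hypothetical helper: rewrite the species_file line inside one [section] of an
# INI-style file such as default.env, leaving every other section untouched.
# Assumes section headers start at column 0 with "[" (anything after the
# closing "]", such as a comment, is ignored) and keys look like "key = value".
set_species_file() {
  local env_file section new_value tmp
  env_file=$1; section=$2; new_value=$3
  tmp=$(mktemp) || return 1
  awk -v sec="[$section]" -v val="$new_value" '
    # Each header line decides whether we are inside the target section.
    /^\[/ { in_sec = (index($0, sec) == 1) }
    # Replace species_file only while inside the target section.
    in_sec && /^[ \t]*species_file[ \t]*=/ { print "species_file = " val; next }
    { print }
  ' "$env_file" > "$tmp" || { rm -f "$tmp"; return 1; }
  # Copy back with cat so the original file keeps its permissions.
  cat "$tmp" > "$env_file"
  rm -f "$tmp"
}

# Example (the value is a placeholder for the conf file in DESTINATION_DIR):
# set_species_file ./default.env 'sherlock*.stanford.edu, sh-*.local' \
#   '[SPECIES FILE CONF ON THE DESTINATION DIR]'
```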

rleylek commented 7 years ago

OK, this makes sense - I thought I did the step to install genome data earlier, but looks like it didn't complete properly.

During the "Building bowtie2 index" step, it keeps getting killed at the ahead-of-time memory usage test. I'm already running it in sdev; any suggestions?

leepc12 commented 7 years ago

Did you try with enough memory > 15G (sdev --mem=15000)?

Can you also check whether you can access my scratch directory, /scratch/users/leepc12? If so, I can copy all the genome data there.

Jin
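
Before rerunning the index build, it can help to confirm how much memory the session was actually granted. This sketch assumes the session is a Slurm allocation that exports SLURM_MEM_PER_NODE in megabytes (salloc and sbatch set it when --mem is given; whether a particular sdev setup does is an assumption). check_slurm_mem is a hypothetical name, and its 15000 MB default simply mirrors the sdev --mem=15000 suggestion above.

```shell
# Hypothetical helper: compare the memory granted to the current Slurm job
# (SLURM_MEM_PER_NODE, in MB) with a required amount in MB.
# Exit status: 0 = enough, 1 = too little, 2 = variable not set.
check_slurm_mem() {
  local need_mb have_mb
  need_mb=${1:-15000}
  have_mb=${SLURM_MEM_PER_NODE:-}
  if [ -z "$have_mb" ]; then
    echo "SLURM_MEM_PER_NODE is not set; cannot tell how much memory this session has" >&2
    return 2
  fi
  if [ "$have_mb" -lt "$need_mb" ]; then
    echo "only ${have_mb} MB allocated; request at least ${need_mb} MB (e.g. sdev --mem=${need_mb})" >&2
    return 1
  fi
  return 0
}

# Example:
# check_slurm_mem 15000 && ./install_genome_data hg19 [DESTINATION_DIR]
```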


leepc12 commented 7 years ago

Please don't install the genome database. Just git pull the latest pipeline and try running it again. I moved the genome database to a scratch directory that is open to the public.

Jin


rleylek commented 7 years ago

That fixed it - thanks so much!