kundajelab / chipseq_pipeline

AQUAS TF and histone ChIP-seq pipeline
BSD 3-Clause "New" or "Revised" License
105 stars 40 forks source link

Fatal error: /TF_chipseq_pipeline/chipseq.bds, line 1529, pos 2. Task/s failed. #20

Closed yingsun-ucsd closed 7 years ago

yingsun-ucsd commented 7 years ago

I am running chipseq_pipeline on our data and the following two commands have been used. The first one has no problems but the second one cannot go through with the error message I attached below too. Please help/advise. Thanks a lot.

COMMANDS:

bds /home/TF_chipseq_pipeline/chipseq.bds -species hg38 -nth 4 -out_dir FOXA2GAII -se -fastq1 SRR074923.fastq.gz -fastq2 SRR074926.fastq.gz -ctl_fastq1 SRR074928.fastq.gz (GOOD)

bds /home/TF_chipseq_pipeline/chipseq.bds -species hg38 -nth 4 -out_dir FOXA2GAI -se -fastq1:1 SRR074921.fastq.gz -fastq1:2 SRR074922.fastq.gz -fastq2:1 SRR074924.fastq.gz -fastq2:2 SRR074925.fastq.gz -ctl_fastq1 SRR074927.fastq.gz -ctl_fastq2 SRR074929.fastq.gz (ERROR)

ERROR:

== git info Latest git commit : eb3e2369b0397856d616369acdd06f114bf75ff3 (Thu Aug 3 20:12:41 2017) Reading parameters from section (default) in file(/home/ysun/TF_chipseq_pipeline/default.env)...

== configuration file info Hostname : **** Configuration file : Environment file : /home/ysun/TF_chipseq_pipeline/default.env

== parallelization info No parallel jobs : false Maximum # threads : 4

== cluster/system info Walltime (general) : 5h50m Max. memory (general) : 7G Force to use a system : local Process priority (niceness) : 0 Retiral for failed tasks : 0 Submit tasks to a cluster queue : Unlimited cluster mem./walltime : false

== shell environment info Conda env. : aquas_chipseq Conda env. for python3 : aquas_chipseq_py3 Conda bin. directory :

Shell cmd. for init. : if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq | wc -l) != "0" ]]; then source activate aquas_chipseq; sleep 5; fi; export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysun/TF_chi pseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for init.(py3) : if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq_py3 | wc -l) != "0" ]]; then source activate aquas_chipseq_py3; sleep 5; fi; export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysu n/TF_chipseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

Shell cmd. for fin. : TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0

Cluster task min. len. : 60

Cluster task delay : 0

== output directory/title info Output dir. : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI Title (prefix) : FOXA2GAI Reading parameters from section (default) in file(/home/ysun/TF_chipseq_pipeline/default.env)... Reading parameters from section (hg38) in file(/home/ysun/bds_pipeline_genome_data/aquas_chipseq_species.conf)...

== species settings Species : hg38 Species file : /home/ysun/bds_pipeline_genome_data/aquas_chipseq_species.conf

Species name (WashU browser) : hg38 Ref. genome seq. fasta : /home/ysun/bds_pipeline_genome_data/hg38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta Chr. sizes file : /home/ysun/bds_pipeline_genome_data/hg38/hg38.chrom.sizes Black list bed : /home/ysun/bds_pipeline_genome_data/hg38/hg38.blacklist.bed.gz Ref. genome seq. dir. :

== ENCODE accession settings ENCODE experiment accession : ENCODE award RFA : ENCODE assay category : ENCODE assay title : ENCODE award : ENCODE lab : ENCODE assembly genome : ENCODE alias prefix : KLAB_PIPELINE

== report settings URL root for output directory : Genome coord. for browser tracks :

== align bwa settings Param. for bwa : -q 5 -l 32 -k 2 BWA index : /home/ysun/bds_pipeline_genome_data/hg38/bwa_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta Walltime (bwa) : 47h Max. memory (bwa) : 12G

== align multimapping settings

alignments reported for multimapping : 0

== postalign bam settings MAPQ reads rm thresh. : 30 Rm. tag reads with str. : No dupe removal in filtering raw bam : false Walltime (bam filter) : 23h Max. memory (bam filter) : 12G Dup marker : picard Use sambamba markdup (instead of picard) : false

== postalign bed/tagalign settings Max. memory for UNIX shuf : 12G

== postalign cross-corr. analysis settings Max. memory for UNIX shuf : 12G User-defined cross-corr. peak strandshift : -1 Extra parameters for cross-corr. analysis :

== callpeak spp settings Threshold for # peak : 300000 Walltime (spp) : 47h Max. memory (spp) : 12G Stack size for run_spp.R : Use-defined cross-corr. peak strandshift; if -1, use frag. len. :-1 Extra parameters for run_spp.R :

== callpeak gem settings Threshold for # peak in GEM : 300000 Min. length of k-mers in GEM : 6 Max. length of k-mers in GEM : 13 Q-value threshold for GEM : 0.0 Read distribution txt for GEM : /home/ysun/TF_chipseq_pipeline/etc/Read_Distribution_default.txt Extra parameters for GEM : Walltime (GEM) : 47h Max. memory (GEM) : 15G

== callpeak PeakSeq settings Target FDR for PeakSeq :0.05 Number of simulations for PeakSeq :10 Enrichment mapped frag. len. for PeakSeq :-1 Minimum interpeak distance for PeakSeq :-1 Mappability map file for PeakSeq : Maximum Q-value for PeakSeq :0.1 Background model for PeakSeq :Simulated Extra parameters for PeakSeq : Walltime (PeakSeq) : 47h Max. memory (PeakSeq) : 12G

== callpeak macs2 settings Genome size (hs,mm) : hs Walltime (macs2) : 23h Max. memory (macs2) : 15G P-value cutoff (macs2 callpeak) : 0.01 --keep-dup (macs2 callpeak) : all --extsize (macs2 callpeak); if -1 then use frag. len. : -1 --shift (macs2 callpeak) : 0 Extra parameters for macs2 callpeak :

== callpeak naiver overlap settings Bedtools intersect -nonamecheck : false

== IDR settings Append IDR threshold to IDR out_dir : false

== chipseq pipeline settings

replicates : 2

Type of ChIP-Seq pipeline : TF Final stage for ChIP-Seq : Signal tracks for pooled rep. only : false Aligner to map raw reads : bwa

reads to subsample for cross-corr. analysis : 15000000

reads to subsample exp. replicates (0: no subsampling): 0

reads to subsample controls (0: no subsampling) : 0

Generate anonymized filt. bam : false Peak caller for IDR analysis : spp Control rep. depth ratio : 1.2 Scoring column for IDR : signal.value IDR threshold : 0.05 Force to use pooled ctl : false Peak calling for true reps only : false No peak calling for self pseudo reps : false Disable cross-correlation analysis : false Disable g. peak filt. thru. n. peak : false Disable genome browser tracks : false

== checking chipseq parameters ...

== checking input files ...

Rep1 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074921.fastq.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074922.fastq.gz Rep2 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074924.fastq.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074925.fastq.gz Control Rep1 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074927.fastq.gz Control Rep2 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074929.fastq.gz Distributing 4 to ... {rep2=1, rep1=1, ctl1=1, ctl2=1}

== Done align()

== Done pool_tags()

Fewer reads in control 1 than experiment replicate 1. Using pooled controls for replicate 1.

Fewer reads in control 2 than experiment replicate 2. Using pooled controls for replicate 2. Distributing 4 to ... [1, 1, 1, 1]

== Done call_peaks()

== Done naive_overlap() Task failed: Program & line : '/home/ysun/TF_chipseq_pipeline/modules/callpeak_idr.bds', line 74 Task Name : 'idr2 rep2-pr' Task ID : 'chipseq.bds.20170920_084412_704/task.callpeak_idr.idr2_rep2_pr.line_74.id_38' Task PID : '30964' Task hint : 'idr --samples /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz /home/ysun/published/GEO/GSE25836_Soccio _MolEndocr/FOXA2GAI/peak/spp/pseudoreps/rep2/pr2/SRR074924' Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000' State : 'ERROR' Dependency state : 'ERROR' Retries available : '1' Input files : '[/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/F OXA2GAI/peak/spp/pseudo_reps/rep2/pr2/SRR074924_SRR074925.nodup.pr2.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/rep2/SRR074924_SRR074925.nodup.tagAlign_x_SRR074927.nodup_SRR07492 9.nodup.tagAlign.regionPeak.gz]' Output files : '[/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.narrowPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2- pr.unthresholded-peaks.txt.png, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.log.txt, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthres holded-peaks.txt.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.12-col.bed.gz]' Script file : '/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/chipseq.bds.20170920_084412_704/task.callpeak_idr.idr2_rep2_pr.line_74.id_38.sh' Exit status : '1' Program :

    # SYS command. line 76

     if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq_py3 | wc -l) != "0" ]]; then source activate aquas_chipseq_py3; sleep 5; fi;  export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysun/TF_chipseq_pipe

line/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)

    # SYS command. line 78

     idr --samples /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FO

XA2GAI/peak/spp/pseudo_reps/rep2/pr2/SRR074924_SRR074925.nodup.pr2.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz --peak-list /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/rep2/SRR074924_SRR074925.nodup.tagAlign_x_SRR074927.nodu p_SRR074929.nodup.tagAlign.regionPeak.gz --input-file-type narrowPeak \ --output-file /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.txt --rank signal.value --soft-idr-threshold 0.05 \ --plot --use-best-multisummit-IDR --log-output-file /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.log.txt

    # SYS command. line 82

     idr_thresh_transformed=$(awk -v p=0.05 'BEGIN{print -log(p)/log(10)}')

    # SYS command. line 85

     awk 'BEGIN{OFS="\t"} $12>='"${idr_thresh_transformed}"' {if ($2<0) $2=0; print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,"0"}' /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.tx

t \ | sort | uniq | sort -k7n,7n | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz

    # SYS command. line 88

     zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_Mo

lEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.narrowPeak.gz

    # SYS command. line 89

     zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | gzip -nc > /home/ysun/published/GEO/GSE25836_S

occio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.12-col.bed.gz

    # SYS command. line 91

     bedtools intersect -v -a <(zcat -f /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz) -b <(zcat -f /home/ysun/bds_pipeline_genome_data/hg38/hg38.blacklist.bed.gz) | grep -P '

chr[\dXY]+[ \t]' | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.13-col.bed.gz

    # SYS command. line 92

     zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10}' | gzip -nc > /home/ysun/published/GEO/GSE25836_Socc

io_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.narrowPeak.gz

    # SYS command. line 93

     zcat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.13-col.bed.gz | awk 'BEGIN{OFS="\t"} {print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12}' | gzip -nc > /home/ysun/published/GEO/GSE25

836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.12-col.bed.gz

    # SYS command. line 95

     gzip -f /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.txt

    # SYS command. line 96

     rm -f /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.fil

t.13-col.bed.gz

    # SYS command. line 98

     TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0

Fatal error: /home/ysun/TF_chipseq_pipeline/chipseq.bds, line 1529, pos 2. Task/s failed. chipseq.bds, line 79 : main() chipseq.bds, line 82 : void main() { // chipseq pipeline starts here chipseq.bds, line 102 : do_idr() // IDR chipseq.bds, line 1465 : void do_idr() { chipseq.bds, line 1529 : wait

Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.

leepc12 commented 7 years ago

Can you check how many peaks have passed IDR threshold of 0.05?

$ cat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.narrowPeak.gz | wc -l

$ cat  /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.txt | wc -l
yingsun-ucsd commented 7 years ago

0 and 207

If I stopped it at "peak" as the "final_stage" it worked as below: bds /TF_chipseq_pipeline/chipseq.bds -final_stage peak -no_pseudo_rep -peak_caller macs2 -species hg38 -nth 4 -out_dir FOXA2GAI -se -fastq1:1 SRR074921.fastq.gz -fastq1:2 SRR074922.fastq.gz -fastq2:1 SRR074924.fastq.gz -fastq2:2 SRR074925.fastq.gz -ctl_fastq1 SRR074927.fastq.gz -ctl_fastq2 SRR074929.fastq.gz

Would you also recommend some paper or readings or more detailed manual with examples for ChIPseq_pipeline and ATACseq_pipeline too? I only worked with RNAseq data before and don't have much experience on other sequencing analysis. Thanks.

leepc12 commented 7 years ago

Looks like no peaks (0 out of 207) have passed IDR threshold of 0.05. Can you try with more relaxed one like 0.1?

You need to take a look at the following README files in both pipeline repos. They has everything you want.

https://github.com/kundajelab/chipseq_pipeline/blob/master/README.md https://github.com/kundajelab/atac_dnase_pipelines/blob/master/README.md

yingsun-ucsd commented 7 years ago

I did read both. I will read more carefully. Thanks. Will try your suggestion too.

akundaje commented 7 years ago

This points to the data being low quality.

Anshul.

On Sep 20, 2017 9:28 AM, "yingsun-ucsd" notifications@github.com wrote:

0 and 207

If I stopped it at "peak" as the "final_stage" it worked.

Would you also recommend some paper or readings or more detailed manual with examples for ChIPseq_pipeline and ATACseq_pipeline too? I only worked with RNAseq data before and don't have much experience on other sequencing analysis. Thanks.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kundajelab/chipseq_pipeline/issues/20#issuecomment-330907136, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI7EZzPfqLVudngEeXoVTuRIGcJcwPJks5skT0RgaJpZM4PeGw5 .