Closed yingsun-ucsd closed 7 years ago
Can you check how many peaks have passed IDR threshold of 0.05?
$ cat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.narrowPeak.gz | wc -l
$ cat /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.txt | wc -l
0 and 207
If I stopped it at "peak" as the "final_stage" it worked as below: bds /TF_chipseq_pipeline/chipseq.bds -final_stage peak -no_pseudo_rep -peak_caller macs2 -species hg38 -nth 4 -out_dir FOXA2GAI -se -fastq1:1 SRR074921.fastq.gz -fastq1:2 SRR074922.fastq.gz -fastq2:1 SRR074924.fastq.gz -fastq2:2 SRR074925.fastq.gz -ctl_fastq1 SRR074927.fastq.gz -ctl_fastq2 SRR074929.fastq.gz
Would you also recommend some paper or readings or more detailed manual with examples for ChIPseq_pipeline and ATACseq_pipeline too? I only worked with RNAseq data before and don't have much experience on other sequencing analysis. Thanks.
Looks like no peaks (0 out of 207) have passed IDR threshold of 0.05. Can you try with more relaxed one like 0.1?
You need to take a look at the following README files in both pipeline repos. They has everything you want.
https://github.com/kundajelab/chipseq_pipeline/blob/master/README.md https://github.com/kundajelab/atac_dnase_pipelines/blob/master/README.md
I did read both. I will read more carefully. Thanks. Will try your suggestion too.
This points to the data being low quality.
Anshul.
On Sep 20, 2017 9:28 AM, "yingsun-ucsd" notifications@github.com wrote:
0 and 207
If I stopped it at "peak" as the "final_stage" it worked.
Would you also recommend some paper or readings or more detailed manual with examples for ChIPseq_pipeline and ATACseq_pipeline too? I only worked with RNAseq data before and don't have much experience on other sequencing analysis. Thanks.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kundajelab/chipseq_pipeline/issues/20#issuecomment-330907136, or mute the thread https://github.com/notifications/unsubscribe-auth/AAI7EZzPfqLVudngEeXoVTuRIGcJcwPJks5skT0RgaJpZM4PeGw5 .
I am running chipseq_pipeline on our data and the following two commands have been used. The first one has no problems but the second one cannot go through with the error message I attached below too. Please help/advise. Thanks a lot.
COMMANDS:
bds /home/TF_chipseq_pipeline/chipseq.bds -species hg38 -nth 4 -out_dir FOXA2GAII -se -fastq1 SRR074923.fastq.gz -fastq2 SRR074926.fastq.gz -ctl_fastq1 SRR074928.fastq.gz (GOOD)
bds /home/TF_chipseq_pipeline/chipseq.bds -species hg38 -nth 4 -out_dir FOXA2GAI -se -fastq1:1 SRR074921.fastq.gz -fastq1:2 SRR074922.fastq.gz -fastq2:1 SRR074924.fastq.gz -fastq2:2 SRR074925.fastq.gz -ctl_fastq1 SRR074927.fastq.gz -ctl_fastq2 SRR074929.fastq.gz (ERROR)
ERROR:
== git info Latest git commit : eb3e2369b0397856d616369acdd06f114bf75ff3 (Thu Aug 3 20:12:41 2017) Reading parameters from section (default) in file(/home/ysun/TF_chipseq_pipeline/default.env)...
== configuration file info Hostname : **** Configuration file : Environment file : /home/ysun/TF_chipseq_pipeline/default.env
== parallelization info No parallel jobs : false Maximum # threads : 4
== cluster/system info Walltime (general) : 5h50m Max. memory (general) : 7G Force to use a system : local Process priority (niceness) : 0 Retiral for failed tasks : 0 Submit tasks to a cluster queue : Unlimited cluster mem./walltime : false
== shell environment info Conda env. : aquas_chipseq Conda env. for python3 : aquas_chipseq_py3 Conda bin. directory :
Shell cmd. for init. : if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq | wc -l) != "0" ]]; then source activate aquas_chipseq; sleep 5; fi; export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysun/TF_chi pseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
Shell cmd. for init.(py3) : if [[ -f $(which conda) && $(conda env list | grep aquas_chipseq_py3 | wc -l) != "0" ]]; then source activate aquas_chipseq_py3; sleep 5; fi; export PATH=/home/ysun/TF_chipseq_pipeline/.:/home/ysun/TF_chipseq_pipeline/modules:/home/ysu n/TF_chipseq_pipeline/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
Shell cmd. for fin. : TASKTIME=$[$(date +%s)-${STARTTIME}]; echo "Task has finished (${TASKTIME} seconds)."; sleep 0
Cluster task min. len. : 60
Cluster task delay : 0
== output directory/title info Output dir. : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI Title (prefix) : FOXA2GAI Reading parameters from section (default) in file(/home/ysun/TF_chipseq_pipeline/default.env)... Reading parameters from section (hg38) in file(/home/ysun/bds_pipeline_genome_data/aquas_chipseq_species.conf)...
== species settings Species : hg38 Species file : /home/ysun/bds_pipeline_genome_data/aquas_chipseq_species.conf
Species name (WashU browser) : hg38 Ref. genome seq. fasta : /home/ysun/bds_pipeline_genome_data/hg38/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta Chr. sizes file : /home/ysun/bds_pipeline_genome_data/hg38/hg38.chrom.sizes Black list bed : /home/ysun/bds_pipeline_genome_data/hg38/hg38.blacklist.bed.gz Ref. genome seq. dir. :
== ENCODE accession settings ENCODE experiment accession : ENCODE award RFA : ENCODE assay category : ENCODE assay title : ENCODE award : ENCODE lab : ENCODE assembly genome : ENCODE alias prefix : KLAB_PIPELINE
== report settings URL root for output directory : Genome coord. for browser tracks :
== align bwa settings Param. for bwa : -q 5 -l 32 -k 2 BWA index : /home/ysun/bds_pipeline_genome_data/hg38/bwa_index/GRCh38_no_alt_analysis_set_GCA_000001405.15.fasta Walltime (bwa) : 47h Max. memory (bwa) : 12G
== align multimapping settings
alignments reported for multimapping : 0
== postalign bam settings MAPQ reads rm thresh. : 30 Rm. tag reads with str. : No dupe removal in filtering raw bam : false Walltime (bam filter) : 23h Max. memory (bam filter) : 12G Dup marker : picard Use sambamba markdup (instead of picard) : false
== postalign bed/tagalign settings Max. memory for UNIX shuf : 12G
== postalign cross-corr. analysis settings Max. memory for UNIX shuf : 12G User-defined cross-corr. peak strandshift : -1 Extra parameters for cross-corr. analysis :
== callpeak spp settings Threshold for # peak : 300000 Walltime (spp) : 47h Max. memory (spp) : 12G Stack size for run_spp.R : Use-defined cross-corr. peak strandshift; if -1, use frag. len. :-1 Extra parameters for run_spp.R :
== callpeak gem settings Threshold for # peak in GEM : 300000 Min. length of k-mers in GEM : 6 Max. length of k-mers in GEM : 13 Q-value threshold for GEM : 0.0 Read distribution txt for GEM : /home/ysun/TF_chipseq_pipeline/etc/Read_Distribution_default.txt Extra parameters for GEM : Walltime (GEM) : 47h Max. memory (GEM) : 15G
== callpeak PeakSeq settings Target FDR for PeakSeq :0.05 Number of simulations for PeakSeq :10 Enrichment mapped frag. len. for PeakSeq :-1 Minimum interpeak distance for PeakSeq :-1 Mappability map file for PeakSeq : Maximum Q-value for PeakSeq :0.1 Background model for PeakSeq :Simulated Extra parameters for PeakSeq : Walltime (PeakSeq) : 47h Max. memory (PeakSeq) : 12G
== callpeak macs2 settings Genome size (hs,mm) : hs Walltime (macs2) : 23h Max. memory (macs2) : 15G P-value cutoff (macs2 callpeak) : 0.01 --keep-dup (macs2 callpeak) : all --extsize (macs2 callpeak); if -1 then use frag. len. : -1 --shift (macs2 callpeak) : 0 Extra parameters for macs2 callpeak :
== callpeak naiver overlap settings Bedtools intersect -nonamecheck : false
== IDR settings Append IDR threshold to IDR out_dir : false
== chipseq pipeline settings
replicates : 2
Type of ChIP-Seq pipeline : TF Final stage for ChIP-Seq : Signal tracks for pooled rep. only : false Aligner to map raw reads : bwa
reads to subsample for cross-corr. analysis : 15000000
reads to subsample exp. replicates (0: no subsampling): 0
reads to subsample controls (0: no subsampling) : 0
Generate anonymized filt. bam : false Peak caller for IDR analysis : spp Control rep. depth ratio : 1.2 Scoring column for IDR : signal.value IDR threshold : 0.05 Force to use pooled ctl : false Peak calling for true reps only : false No peak calling for self pseudo reps : false Disable cross-correlation analysis : false Disable g. peak filt. thru. n. peak : false Disable genome browser tracks : false
== checking chipseq parameters ...
== checking input files ...
Rep1 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074921.fastq.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074922.fastq.gz Rep2 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074924.fastq.gz /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074925.fastq.gz Control Rep1 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074927.fastq.gz Control Rep2 fastq (SE) : /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/SRR074929.fastq.gz Distributing 4 to ... {rep2=1, rep1=1, ctl1=1, ctl2=1}
== Done align()
== Done pool_tags()
Fewer reads in control 1 than experiment replicate 1. Using pooled controls for replicate 1.
Fewer reads in control 2 than experiment replicate 2. Using pooled controls for replicate 2. Distributing 4 to ... [1, 1, 1, 1]
== Done call_peaks()
== Done naive_overlap() Task failed: Program & line : '/home/ysun/TF_chipseq_pipeline/modules/callpeak_idr.bds', line 74 Task Name : 'idr2 rep2-pr' Task ID : 'chipseq.bds.20170920_084412_704/task.callpeak_idr.idr2_rep2_pr.line_74.id_38' Task PID : '30964' Task hint : 'idr --samples /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz /home/ysun/published/GEO/GSE25836_Soccio _MolEndocr/FOXA2GAI/peak/spp/pseudoreps/rep2/pr2/SRR074924' Task resources : 'cpus: -1 mem: -1.0 B wall-timeout: 8640000' State : 'ERROR' Dependency state : 'ERROR' Retries available : '1' Input files : '[/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/pseudo_reps/rep2/pr1/SRR074924_SRR074925.nodup.pr1.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/F OXA2GAI/peak/spp/pseudo_reps/rep2/pr2/SRR074924_SRR074925.nodup.pr2.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/rep2/SRR074924_SRR074925.nodup.tagAlign_x_SRR074927.nodup_SRR07492 9.nodup.tagAlign.regionPeak.gz]' Output files : '[/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.narrowPeak.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2- pr.unthresholded-peaks.txt.png, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.log.txt, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthres holded-peaks.txt.gz, /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.12-col.bed.gz]' Script file : '/home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/chipseq.bds.20170920_084412_704/task.callpeak_idr.idr2_rep2_pr.line_74.id_38.sh' Exit status : '1' Program :
line/utils:${PATH}:/bin:/usr/bin:/usr/local/bin:${HOME}/.bds; set -o pipefail; STARTTIME=$(date +%s)
XA2GAI/peak/spp/pseudo_reps/rep2/pr2/SRR074924_SRR074925.nodup.pr2.tagAlign_x_SRR074927.nodup_SRR074929.nodup.tagAlign.regionPeak.gz --peak-list /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/rep2/SRR074924_SRR074925.nodup.tagAlign_x_SRR074927.nodu p_SRR074929.nodup.tagAlign.regionPeak.gz --input-file-type narrowPeak \ --output-file /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.unthresholded-peaks.txt --rank signal.value --soft-idr-threshold 0.05 \ --plot --use-best-multisummit-IDR --log-output-file /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.log.txt
t \ | sort | uniq | sort -k7n,7n | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.13-col.bed.gz
lEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.narrowPeak.gz
occio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.12-col.bed.gz
chr[\dXY]+[ \t]' | awk 'BEGIN{OFS="\t"} {if ($5>1000) $5=1000; print $0}' | gzip -nc > /home/ysun/published/GEO/GSE25836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.13-col.bed.gz
io_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.narrowPeak.gz
836_Soccio_MolEndocr/FOXA2GAI/peak/spp/idr/pseudo_reps/rep2/FOXA2GAI_rep2-pr.IDR0.05.filt.12-col.bed.gz
t.13-col.bed.gz
Fatal error: /home/ysun/TF_chipseq_pipeline/chipseq.bds, line 1529, pos 2. Task/s failed. chipseq.bds, line 79 : main() chipseq.bds, line 82 : void main() { // chipseq pipeline starts here chipseq.bds, line 102 : do_idr() // IDR chipseq.bds, line 1465 : void do_idr() { chipseq.bds, line 1529 : wait
Creating checkpoint file: Config or command line option disabled checkpoint file creation, nothing done.