shawnzhangyx / PePr

a peak-calling and differential analysis tool for replicated ChIP-Seq data
GNU General Public License v3.0
37 stars 9 forks source link

ATAC-seq/DNase-seq Analysis #21

Open robertamezquita opened 8 years ago

robertamezquita commented 8 years ago

Is there a way to run PePr without specifying input files, as is the case for analyzing ATAC-seq/DNase-seq data?

shawnzhangyx commented 8 years ago

The only way to run PePr without input files is to run in the differential binding mode, where one ChIP file is used as a control for the other ChIP file. In peak-calling mode, there has to be at least one input sample. You might be able to find some public input data for your samples if it's from a cell line.

Actually this is a very good point. I might be able to tweak the current model a little bit to derive a model for peak-calling without any input files. But before I implement that, unfortunately there will be no easy work-around. You could try looking for other software that could do it as well.

robertamezquita commented 8 years ago

Yeah, this definitely would be a great feature to have for such data that doesn't typically have input samples, but I suppose a ChIP-seq input file where there's essentially little to no signal could do the trick? Thanks for your input!

shawnzhangyx commented 8 years ago

Yeah, I think any ChIP-seq input file for the same organism will work (best if it is from the same cell line or tissue).

c-guzman commented 7 years ago

I just wanted to leave a comment here to encourage that this sort of feature be looked into. Would be super awesome to use this for DNase-seq and ATAC-seq datasets. I agree that regular input would work, but I don't really know if it's 'standard'.

Love the software. Thanks.

shawnzhangyx commented 7 years ago

Definitely. I have started analyzing atac-seq data lately. Will for sure try to adapt it to atacseq and DNase once I understand the characteristics of the data better.

On Tue, Apr 11, 2017 at 3:07 PM Carlos Guzman notifications@github.com wrote:

I just wanted to leave a comment here to encourage that this sort of feature be looked into. Would be super awesome to use this for DNase-seq and ATAC-seq datasets. I agree that regular input would work, but I don't really know if it's 'standard'.

Love the software. Thanks.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/shawnzhangyx/PePr/issues/21#issuecomment-293369432, or mute the thread https://github.com/notifications/unsubscribe-auth/AFOG6FHOBGrPoTwrQIJ8A43qC_mLt2jbks5ru8-EgaJpZM4JVX_9 .

-- Sent from Gmail Mobile

firatuyulur commented 7 years ago

My post is not directly related to this issue, but there is an important overlap. I am working with 10 atac-seq bams. 2 replicates for 5 conditions. I ran peakcalling using another tool with parameters that fit atac-seq. When I put them on IGV, I realized that the bam average coverages were different within replicates and across samples. I found your tool, thankfully, and I think I can rule out the coverage inconsistency with "--normalize" parameter. Do you agree so?

shawnzhangyx commented 7 years ago

@firatuyulur Yeah you're right. You can do two-condition comparison without "input files". For normalization, the following option will work: --normalization=inter-group.

firatuyulur commented 7 years ago

PePr -c modified_del_sorted_dh1f_1.broad_treat_pileup.bed,modified_del_sorted_dh1f_2.broad_treat_pileup.bed --chip2 modified_del_sorted_iPSC_1.broad_treat_pileup.bed,modified_del_sorted_iPSC_2.broad_treat_pileup.bed -f bed --normalization=inter-group --diff --output-directory PePr_Output is my command for ATAC-Seq inter group normalization. It raises an error when comes to estimating shift sizes.

INFO root 06/12/2017 10:42:00 AM | getting chromosome info INFO root 06/12/2017 10:42:14 AM | Found chr15:101979970 | chr22:50808225 | chr18:80262135 | chr17:83247293 | chrX:156030480 | chr6:170742741 | chrY:56886715 | chr1:248946112 | chr5:181475363 | chr8:145075748 | chr2:242183304 | chr9:138331219 | chr3:198233526 | chr4:190204237 | chr11:135071089 | chr19:58606580 | chr21:46699480 | chr16:90224624 | chr13:114353911 | chr10:133787262 | chr12:133264414 | chr20:64329978 | chr7:159331657 | chr14:106883535 INFO root 06/12/2017 10:42:14 AM | Read length: 63990|52942|56364|60913 INFO root 06/12/2017 10:42:14 AM | begin estimating window size INFO root 06/12/2017 10:44:28 AM | Estimated window size is 860 INFO root 06/12/2017 10:44:28 AM | estimating the shift size for modified_del_sorted_dh1f_1.broad_treat_pileup.bed Traceback (most recent call last): File "/home/firat/anaconda3/bin/PePr", line 11, in sys.exit(argless_main()) File "/home/firat/anaconda3/lib/python3.5/site-packages/PePr/PePr.py", line 27, in argless_main main(sys.argv) File "/home/firat/anaconda3/lib/python3.5/site-packages/PePr/PePr.py", line 36, in main initialize.preprocess(parameter) File "/home/firat/anaconda3/lib/python3.5/site-packages/PePr/pre_processing/initialize.py", line 29, in preprocess shiftSize.estimate_shiftsizes(parameter) File "/home/firat/anaconda3/lib/python3.5/site-packages/PePr/pre_processing/shiftSize.py", line 20, in estimate_shiftsizes shift_size, bin_array = estimate_shiftsize(chip, parameter)

File "/home/firat/anaconda3/lib/python3.5/site-packages/PePr/pre_processing/shiftSize.py", line 64, in estimate_shiftsize @ chr_f,chr_r = forward[chr],reverse[chr] KeyError: 'chr1'
I will be looking into it but wanted to post this here too

shawnzhangyx commented 7 years ago

@firatuyulur What is the format of your bed filles? Could you paste the first few lines of it?

firatuyulur commented 7 years ago

Hi @shawnzhangyx, I believe putting an example bed format could be a good idea for future. I have tried a couple of bed formats but the error remains.

chr1 29212 29497 iPSC/ipsc_1.broad_peak_1 40 +

chr1 199548 200210 iPSC/ipsc_1.broad_peak_2 24 +

chr1 206611 207291 iPSC/ipsc_1.broad_peak_3 13 +

chr1 267861 268061 iPSC/ipsc_1.broad_peak_4 23 +

chr1 381148 381531 iPSC/ipsc_1.broad_peak_5 16 +

chr1 610285 610894 iPSC/ipsc_1.broad_peak_6 12 +

and I think I have tried one without "chr" and got the error again. I also tried one with

chr1 10336 11416 dh1f/dh1f_1.broad_peak_1 22 + 10336 11416 255,0,0

chr1 28729 29752 dh1f/dh1f_1.broad_peak_2 58 + 28729 29752 255,0,0

chr1 91278 91542 dh1f/dh1f_1.broad_peak_3 42 + 91278 91542 255,0,0

chr1 143749 144115 dh1f/dh1f_1.broad_peak_4 57 + 143749 144115 255,0,0

chr1 178994 179474 dh1f/dh1f_1.broad_peak_5 22 + 178994 179474 255,0,0

This crowded bed gave too many values to unpack error.

Right now my aim is to give 4 atac-seq beds (2 replicates for 2 samples each) and get the differential binding results. I did not really modify any parameter;

PePr -c dh1f_1.bed,dh1f_2.bed --chip2 ipsc_1.bed,ipsc_2.bed -f bed --diff

error is the same but I will repost it anyways

INFO root 06/15/2017 01:47:29 PM | estimating the shift size for dh1f_1.bed Traceback (most recent call last): File "/home/firat/anaconda2/bin/PePr", line 11, in sys.exit(argless_main()) File "/home/firat/anaconda2/lib/python2.7/site-packages/PePr/PePr.py", line 27, in argless_main main(sys.argv) File "/home/firat/anaconda2/lib/python2.7/site-packages/PePr/PePr.py", line 36, in main initialize.preprocess(parameter) File "/home/firat/anaconda2/lib/python2.7/site-packages/PePr/pre_processing/initialize.py", line 29, in preprocess shiftSize.estimate_shiftsizes(parameter) File "/home/firat/anaconda2/lib/python2.7/site-packages/PePr/pre_processing/shiftSize.py", line 20, in estimate_shiftsizes shift_size, bin_array = estimate_shiftsize(chip, parameter) File "/home/firat/anaconda2/lib/python2.7/site-packages/PePr/pre_processing/shiftSize.py", line 64, in estimate_shiftsize chr_f,chr_r = forward[chr],reverse[chr] KeyError: 'chr1'

shawnzhangyx commented 7 years ago

Hi @firatuyulur Thanks for sending me the file formats. I think I found the problem. PePr does not read pileup BED files, it only process BED file (6+) which has raw reads. And in order to estimate the shift size, it needs to have strand information (which is often in column 6). Does the dh1f_1.bed file contains the raw reads? Does it have reads from both "+" and "-" strand? If the bed files you have are processed after merging paired-end bam files, I would suggest going with the -s 0 option to skip the shift size estimation.

ATpoint commented 5 years ago

Is there any progress on this issue, calling replicate samples without input controls?

shawnzhangyx commented 5 years ago

sorry I have been super busy wrapping up two other projects so there is no progress at the moment, and I probably won't be able to get to it for the next 6 months either.

Meanwhile, I would recommend running MACS2 to call peaks for ATACseq/DNase-seq.