jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License
185 stars 27 forks source link

Unexpected replicate and compare outputs #19

Closed yhouvras closed 5 years ago

yhouvras commented 5 years ago

I see unexpected peaks when I submit replicates and when I compare experimental to control samples. This is Atac-seq data. Maybe I am doing something wrong, but it would be great to have some input, as the Genrich output is puzzling.

I ran 3 control biological replicates (H33_1, H33_2, H33_3) and 3 experimental biological replicates (K9M_1, K9M_2, K9M_3). Here is the process I followed:

1) Peak calls for each indvidual sample were performed by running:

Genrich -t sample -o sample.narrowPeak -j -e chrM -y -d 150 -v

I visualized the peaks and the .bam in IGV and they look great! Peaks are exactly where you would expect them for each file.

2) I ran Genrich asking it to compare the replicates in experiment to control. I ran it both ways, so:

Genrich -t K9M_1_sorted.bam,K9M_2_sorted.bam,K9M_3_sorted.bam -c H33_1_sorted.bam,H33_2_sorted.bam,H33_3_sorted.bam -o compare.narrowPeak -j -e chrM -y -d 150 -v

Genrich -c K9M_1_sorted.bam,K9M_2_sorted.bam,K9M_3_sorted.bam -t H33_1_sorted.bam,H33_2_sorted.bam,H33_3_sorted.bam -o revcompare.narrowPeak -j -e chrM -y -d 150 -v

The output here is puzzling and doesn't look consistent right. In some places it does call peaks that are differential. In many others it does not -- see screenshot1

screenshot1

So to explore this further I ran:

3) Used each condition alone and asked Genrich to call peaks from biological replicates:

Genrich -t K9M_1_sorted.bam,K9M_2_sorted.bam,K9M_3_sorted.bam -o k9monlycompare.narrowPeak -j -e chrM -y -d 150 -v

Genrich -t H33_1_sorted.bam,H33_2_sorted.bam,H33_3_sorted.bam -o h33onlycompare.narrowPeak -j -e chrM -y -d 150 -v

The output here is confusing, as Genrich finds peaks from the replicates that are missing from individual samples -- see screenshot #2

screenshot2

Maybe my commands are not right? I did try altering p-value thresholds, removing pcr duplicates, but didn't see these parameters changing.

jsh58 commented 5 years ago

Thanks for the detailed questions!

Based on the Genrich commands you gave, it seems that we may be getting confused on terminology, specifically experimental and control. When I refer to a control sample, I mean one that is not exposed to the assay (e.g. ATAC or ChIP), but otherwise matches an experimental sample and is sequenced. For example, a control sample with ChIP-seq would be the input DNA (not treated with an antibody). Control samples are not typically done with ATAC-seq.

From your commands, you sometimes use the control samples as the "experimental" input to Genrich. Unless this is a terminology issue, I do not understand the purpose of that at all.

yhouvras commented 5 years ago

Thanks for your feedback.

My use of control and experimental (above) is intended to identify differential peaks between 2 conditions. Can Genrich do that for Atac-seq data? Or, are you suggesting that a sample vs. sample comparison can not be performed with Genrich?

jsh58 commented 5 years ago

Genrich does not perform differential peak calling. A differential peak caller (e.g. DiffBind) can be applied to the peaks called by Genrich. In your case, I would recommend using the peaks from your 3rd example above (analyzing multiple replicates together).

The output here is confusing, as Genrich finds peaks from the replicates that are missing from individual samples

This is addressed (in general) in the README and in #14. With your samples, a log file (-f) should help to clarify why peaks are called.