jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License
182 stars 27 forks source link

Peak Calling with a Control or without #74

Closed lilian18100 closed 1 year ago

lilian18100 commented 3 years ago

Hello @jsh58,

Thank you very much for developing Genrich.

I am currently using it to perform peak-calling on Atac-seq Data. I use as input bam files generated by STAR, reporting uniquely mapped read only.

I have 2 treatments (treated or non-treated with active molecule), with 3 biological replicates at 2 time points.

I plan on peak calling them separately to let Difbind make the differential analysis.

I have a fittable control for my Atac-seq Data coming from a previous ChiPseq experiment (sonicated DNA from the same biological material). However, it was sequenced single end (75bp). While my Atac seq data were paired-end (75bp). Moreover, the bam file from single end Chip-seq control has 10 million reads whereas the paired-end ones around 50 million.

So I was wondering if you could help me with several questions I could not solve:

genrich -t treated1.bam -c sonicated_single-end_input.bam -o treated1.narrowPeak -a 50 -r -j

genrich -t treated1.bam,treated2.bam -o treated1-2.narrowPeak -a 50 -r -j

-Do you think that I can be confident about the peak-calling output for unique sample as treatment without control, when the aim is to compare several samples afterward? like :

genrich -t treated1.bam -o treated1-2.narrowPeak -a 50 -r -j

Thank you very much for your time & help !

jsh58 commented 3 years ago

Thanks for the questions. This is similar to #19, so you should refer to that issue, as well as my answers below.

Can I use the Chip-seq as a control for peak calling with Genrich, knowing that treatment is paired and Chip-seq control single end?

I do not think this is a good idea. The interpretation of alignments in ATAC-seq mode is the same for both treatment and control samples, and very different from the regular analysis mode (e.g. for ChIP-seq).

Or it is better not to use the control, but perform the peak calling with Genrich with several replicates together?

Yes, especially since this takes advantage of Genrich's ability to analyze multiple replicates collectively.

Do you think that performing peak calling without sonicated input control makes any sense at all?

Yes this makes sense. As I've written before, control samples are not typically run with ATAC-seq.

The purpose of a control when running Genrich is to reduce erroneous peaks that may appears due to (for example) genomic regions that sequence very easily. In your case, you have an experimental control ("non-treated with active molecule"), and such erroneous peaks would show up in both these samples and the "treated with active molecule" samples.