jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License

Genrich with Cut & Tag samples? #92

Status: Closed (r-mashoodh closed 1 year ago)

r-mashoodh commented 2 years ago

Hello,

I was wondering if it's possible to use Genrich with Cut & Tag samples? The only difference should be the low background ... and I generally wouldn't remove duplicates because of the shared cut sites.

I made an attempt, treating it like ChIP-seq (-v -q 0.05 -y -f), but I get many warnings like the following, for both sample and control:

Warning! Read A00627:255:HV7H3DSX2:4:1411:26603:34788, alignment at (NW_017095559.1, 111258-111378) skipped due to underflow

This run gave me ~1200 peaks.

I then tried ATAC mode, since the two assays share Tn5 cut site properties (-j -v -q 0.05 -y -f), but got a lot of these:

Warning! Read A00627:255:HV7H3DSX2:4:1278:11162:8797, alignment at (NW_017096586.1, 0-66) skipped due to overflow

This run also gave me ~1200 peaks.

This is what I would do in macs2:

macs2 callpeak -t file.bam -c IgG_merged.bam -n filename --bdg --gsize 195308655 --keep-dup all -q 0.1

which calls ~10k peaks.

I have multiple groups/replicates so a Genrich approach is appealing.

Do you have any suggestions?

Thank you.

jsh58 commented 2 years ago

Thanks for the question. The underflow/overflow warnings are due to the fact that Genrich has a limit to how many fragments it can count at a specific site. Because Genrich uses 16-bit ints, the maximum is 32767. Even with this limitation, I cannot imagine that peaks are not being called in those genomic regions for which warnings are being generated. Of course, there would be no warnings if you used the option to remove PCR duplicates.
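To make the limit concrete, here is a minimal sketch of a per-site counter capped at the signed 16-bit maximum. It is not Genrich's actual code: the names `INT16_MAX`, `add_fragment`, and `counts` are made up for illustration, and the only detail taken from the explanation above is the 32767 ceiling. It shows why a site piled up with duplicate fragments starts producing "skipped" warnings once the ceiling is reached.

```python
# Illustration only: hypothetical names, not Genrich's implementation.
INT16_MAX = 2**15 - 1  # 32767, the ceiling quoted above


def add_fragment(counts, site):
    """Increment the counter for `site`; return False (skip) if it would overflow."""
    current = counts.get(site, 0)
    if current >= INT16_MAX:
        return False  # this is the point where a warning would be emitted
    counts[site] = current + 1
    return True


counts = {}
site = ("NW_017096586.1", 0)  # contig and position taken from the warning above

# Pile 40,000 identical fragments onto one site, as unfiltered duplicates might.
skipped = sum(not add_fragment(counts, site) for _ in range(40_000))
print(counts[site], skipped)  # prints: 32767 7233
```

With duplicates removed, identical fragments collapse before they are counted, so a single site is far less likely to approach the ceiling.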

r-mashoodh commented 2 years ago

Thanks @jsh58 for your reply. When I try removing duplicates I get even fewer peaks ... around 200.

Cut and Tag tends to be low in background, which makes peak calling hard. So is the answer to the original question that Genrich can't really work for Cut and Tag?

jsh58 commented 2 years ago

I do not know why you are jumping to that conclusion. I believe I have addressed the concern about the underflow/overflow warnings. So is the concern about the number of peaks? Genrich offers many options for adjusting peak-calling parameters (see, for example, #4, #11, #33, #73, #75, #80, #83).

tdfair commented 1 year ago

Not sure which antibodies you're using, but Genrich works well for CUT&Tag with abundant histone PTMs H3K4me1, H3K4me3, H3K27ac, or H3K27me3 (~12k to ~150k pA-Tn5 regions, depending on AB). I use the settings -j -r -e chrM -q 0.05.
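For anyone wanting to script the settings above across several samples, here is a rough sketch that assembles the command line in Python. The helper `genrich_command` and the file names are hypothetical. The `-t` (input alignments) and `-o` (output peak file) flags are not mentioned in this thread and are included on the assumption that they match the Genrich README; the remaining flags are exactly the ones quoted above.

```python
import shlex


def genrich_command(treatment_bam, output_peaks, exclude_chroms=("chrM",), max_q=0.05):
    """Assemble a Genrich argument list (hypothetical helper, not part of Genrich)."""
    return [
        "Genrich",
        "-t", treatment_bam,             # assumed: input alignment file
        "-o", output_peaks,              # assumed: output peak file
        "-j",                            # ATAC-seq mode, as in the settings above
        "-r",                            # remove PCR duplicates
        "-e", ",".join(exclude_chroms),  # chromosomes to exclude
        "-q", str(max_q),                # q-value threshold
    ]


# Placeholder file names; substitute your own.
cmd = genrich_command("sample.namesorted.bam", "sample.narrowPeak")
print(shlex.join(cmd))
```

The list form can be passed directly to `subprocess.run(cmd, check=True)` once the paths point at real files.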