jsh58 / Genrich

Detecting sites of genomic enrichment
MIT License

Removing PCR duplicates gives more peaks? #101

Closed akhst7 closed 1 year ago

akhst7 commented 1 year ago

Hi, I ran Genrich with and without the -r option as follows:

#!/opt/homebrew/bin/zsh

Genrich -t /Volumes/MySlateDrive/snatac_snRNAseq_pbmc3k/atac_Solo.out/sortbyname.atac_SoloAligned.bam \
  -E /Volumes/MySlateDrive/blacklist/black.list.hg38.bed \
  -o /Volumes/MySlateDrive/peaks.pbmc3k/generich/peak.generich.bed \
  -j \
  -v \
  -r

It turns out that the -r option gives more peaks than running without it:

wc -l peak.generich.bed
   56128 peak.generich.bed
wc -l peak.generich.nodup.bed
   57234 peak.generich.nodup.bed

Why does removing PCR duplicates increase the number of peaks? I thought it would be the other way around.

jsh58 commented 1 year ago

There is not a simple relationship between removing PCR duplicates and the number of peaks called. In general, it is advisable to remove PCR duplicates, unless there is a concrete reason to believe that reads are being falsely identified as duplicates.

For additional discussion of the relationship between numbers of reads and peaks, see #11 and #33.
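Since the peak count alone may not tell the whole story, one hedged way to compare the two runs is to look at how much of the genome each peak file covers, not just how many lines it has. The sketch below is not part of Genrich; `summarize_peaks` is a hypothetical helper that assumes a tab-separated BED-like file whose second and third columns are the start and end coordinates. It prints the file name, the number of peaks, the total bases covered, and the mean peak width.

```shell
# Hypothetical helper (not a Genrich feature): summarize a BED-like peak file.
# Assumes tab-separated columns with start in column 2 and end in column 3.
summarize_peaks() {
  awk -v name="$1" 'BEGIN { FS = OFS = "\t" }
    { n++; bp += $3 - $2 }                         # count peaks, sum widths
    END { print name, n, bp, (n ? bp / n : 0) }' "$1"
}
```

Running `summarize_peaks peak.generich.bed` and `summarize_peaks peak.generich.nodup.bed` side by side would show whether the run with more peaks also covers more bases, or whether it mostly has narrower peaks (for example, if a region called as one interval in one run is called as several in the other).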

akhst7 commented 1 year ago

OK, I will dig a bit deeper.