Duplication in E.coli genome

yezhengSTAT / CUTTag_tutorial

Tutorial Website

https://yezhengstat.github.io/CUTTag_tutorial/


Duplication in E.coli genome #10

Open bio-lilin opened 8 months ago

bio-lilin commented 8 months ago

Hello, thank you for creating this helpful tutorial. I have a question about duplication in the E. coli genome. When I aligned reads to the E. coli spike-in genome, almost all mapped reads were marked as duplicates. When I calculate the scale factor to normalize my data, should I use the number of E. coli fragments remaining after removing duplicate reads, or the total number of E. coli fragments? Thank you, and I look forward to your reply.

yezhengSTAT commented 8 months ago

We usually do not recommend removing duplicates, either for the spike-in or for the main genome of interest. However, if the spike-in quality is a concern (you may want to check other quality control metrics), you can instead use a sequencing-depth normalization strategy such as CPM (counts per million).

Thanks, Ye
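For readers comparing the two options in this reply, here is a minimal Python sketch of how each scale factor could be computed. It assumes the spike-in scale factor has the form of an arbitrary constant divided by the number of fragments mapped to E. coli, with duplicates kept as recommended above. The constant 10,000, the function names, and the example counts are illustrative assumptions and do not come from this thread.

```python
def spike_in_scale_factor(spike_in_fragments: int, constant: float = 10_000) -> float:
    """Scale factor from the spike-in: constant / E. coli fragment count.

    `spike_in_fragments` is assumed to be the count of ALL fragments mapped
    to the E. coli genome (duplicates not removed). `constant` is an
    arbitrary multiplier (assumed value) that only keeps the numbers readable.
    """
    if spike_in_fragments <= 0:
        raise ValueError("spike_in_fragments must be positive")
    return constant / spike_in_fragments


def cpm_scale_factor(mapped_fragments: int) -> float:
    """Sequencing-depth (CPM) scale factor: 1e6 / fragments mapped to the main genome."""
    if mapped_fragments <= 0:
        raise ValueError("mapped_fragments must be positive")
    return 1_000_000 / mapped_fragments


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    ecoli_fragments = 5_000
    main_genome_fragments = 2_000_000

    print("spike-in scale factor:", spike_in_scale_factor(ecoli_fragments))
    print("CPM scale factor:", cpm_scale_factor(main_genome_fragments))
```

Either factor would then be multiplied into the per-bin or per-base coverage of the sample. The CPM variant does not use the spike-in counts at all, which is why it is the fallback when spike-in quality is in doubt.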

bio-lilin commented 8 months ago

Thanks for your reply!