mhammell-laboratory / TEtranscripts

A package for including transposable elements in differential enrichment analysis of sequencing datasets.
http://hammelllab.labsites.cshl.edu/software/#TEtranscripts
GNU General Public License v3.0

How to deal with PCR duplication #129

Closed duzc-Repos closed 1 year ago

duzc-Repos commented 1 year ago

Hi, I have a question about PCR duplicates. The library was constructed following the Smart-seq2 protocol, without UMIs. Is it reasonable to remove PCR duplicates with Picard before TE quantification? Should I do this step at all? Could you give me some advice on this? Also, is there any other software for removing PCR duplicates before TE quantification?

olivertam commented 1 year ago

Hi,

Thank you for your interest in the software.

You can certainly remove PCR duplicates with Picard if you want. We typically don't remove PCR duplicates, as we can't be certain whether they are actual duplicates or reads from highly expressed transcripts. For example, when we removed duplicates in one library, we saw a large drop in the counts for GAPDH and ACTB:

Feature    RemoveDup    Original
ACTB       2097         31467
GAPDH      1797         43744

In contrast, TE counts did not drop nearly as much:

Feature                 RemoveDup    Original
AluY:Alu:SINE           70405        76495
L1HS:L1:LINE            3569         3800
SVA_F:SVA:Retroposon    555          591

Thus, we're not confident (in the absence of UMIs) that removing "PCR duplicates" works as well here as in other applications. Please let me know if you have any questions.
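
To put the two tables on the same scale, here is a minimal sketch that computes the fraction of counts retained after duplicate removal for each feature. The numbers are copied from the tables above. The names `counts` and `retained_fraction` are made up for this illustration and are not part of TEtranscripts or Picard.

```python
# Counts copied from the two tables in this thread:
# feature -> (count after duplicate removal, original count).
counts = {
    "ACTB": (2097, 31467),
    "GAPDH": (1797, 43744),
    "AluY:Alu:SINE": (70405, 76495),
    "L1HS:L1:LINE": (3569, 3800),
    "SVA_F:SVA:Retroposon": (555, 591),
}


def retained_fraction(dedup, original):
    """Fraction of the original count that survives duplicate removal."""
    return dedup / original


# Hypothetical helper dict for illustration: feature -> retained fraction.
retained = {name: retained_fraction(d, o) for name, (d, o) in counts.items()}

for name, frac in retained.items():
    print(f"{name:<24}{frac:6.1%}")
```

In this one library, the two housekeeping genes keep roughly 4-7% of their counts, while the three TE subfamilies keep roughly 92-94%. This is only a restatement of the numbers above and says nothing about other libraries or protocols.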

Thanks.

duzc-Repos commented 1 year ago

Thanks for your advice. It completely cleared up my confusion. Thank you!