y9c / pseudoU-BIDseq

🧪 New pipeline for detecting pseudouridine modification on RNA (BID-seq, etc)
https://bidseq.chuan.science/
GNU General Public License v3.0

Duplicated reads level is high #11

Closed Jarvis559 closed 3 months ago

Jarvis559 commented 6 months ago

Hi, @y9c! I have a question. I ran the BID-pipe on my data, and it reports a duplication level of 40%. I also used 'seqkit rmdup' by sequence to estimate the duplication level, and that gives only 20%. How does the BID-pipe calculate the duplication level, and how does that differ from seqkit rmdup? Looking forward to your reply. Thanks a lot!

y9c commented 6 months ago

Hi @Jarvis559. Deduplicating before mapping is known to overestimate library complexity, which is why seqkit rmdup tends to report a lower duplication level. However, the difference is too large in your case. Could you share more details or upload some example data for debugging?