Low mapping ratio and lost reads after realigngap and filter

y9c / pseudoU-BIDseq

🧪 New pipeline for detecting pseudouridine modification on RNA (BID-seq, etc.)

https://bidseq.chuan.science/

GNU General Public License v3.0


Low mapping ratio and lost reads after realigngap and filter #3

Closed: erhei01 closed this issue 1 year ago

erhei01 commented 1 year ago

Hi y9c! I am trying your pseudoU-BIDseq pipeline. I found your workflow very efficient and your code excellent. Nice work! I have a question. When I ran the pipeline on the mRNA samples from "Quantitative sequencing using BID-seq uncovers abundant pseudouridines in mammalian mRNA at base resolution", the genome mapping ratio was very low (~30% uniquely mapped and ~6% multi-mapped). I also lost many reads after realignGap followed by samtools calmd and samtools view -e '[NM]<=5 && [NM]/(qlen-sclen)<=0.1'; on average, about 40% of the reads were lost. Is this expected, and what can I do? Thank you!
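The samtools view expression quoted above keeps an alignment only if both conditions hold. Below is a minimal Python sketch of the same arithmetic, assuming samtools' meanings of qlen (query bases, soft clips included) and sclen (soft-clipped bases), so that qlen - sclen is the number of aligned query bases. The function name is hypothetical and is not part of the pipeline.

```python
def passes_mismatch_filter(nm: int, qlen: int, sclen: int,
                           max_nm: int = 5, max_rate: float = 0.1) -> bool:
    """Mirror of the expression [NM]<=5 && [NM]/(qlen-sclen)<=0.1.

    nm    -- edit distance from the NM tag
    qlen  -- query length, assumed to include soft-clipped bases
    sclen -- number of soft-clipped query bases
    """
    aligned = qlen - sclen  # query bases that actually align
    if aligned <= 0:
        # Assumption: a read with no aligned bases is rejected outright.
        return False
    # Both the absolute and the relative mismatch limits must be satisfied.
    return nm <= max_nm and nm / aligned <= max_rate


# A 50-nt read with 6 soft-clipped bases has 44 aligned bases.
print(passes_mismatch_filter(4, 50, 6))  # 4/44 is about 0.09
print(passes_mismatch_filter(5, 50, 6))  # 5/44 is about 0.11
```

In this example NM=5 fails even though it satisfies [NM]<=5, because the relative limit is stricter for short aligned lengths. A few extra mismatches, for instance from bases that should have been trimmed, can therefore push short reads past the 10% cutoff.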

y9c commented 1 year ago

Thank you @erhei01. A low mapping ratio might be caused by improper adapter trimming. Could you tell me which dataset you tested and which adapter is specified in your settings?

erhei01 commented 1 year ago

Thanks for your reply. I ran your Snakefile with the default settings on SRR15082607 to SRR15082610 of GSE179798, and took the deduplication and mapping statistics from the workspace/report_reads directory.

y9c commented 1 year ago

The default settings of this repo do not fit the data you mentioned. The method implemented here is quite different from the one in the previous paper. If you want to reproduce those results, I would recommend following the method described in the paper.

This method can be adapted to the previous data by adding barcode: NNNNNXXX-XXXNNNNNATCACG in the yaml file, but the results are not guaranteed to be exactly the same. With the default settings, the inline barcode is not trimmed, which leads to a low mapping ratio.
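To illustrate why untrimmed inline bases hurt mapping, here is a rough sketch of one plausible reading of the barcode string. It is not confirmed by this thread or the pipeline docs: the part before '-' is assumed to sit at the read's 5' end and the part after '-' at its 3' end, 'N' and 'X' are assumed to be wildcard positions, and A/C/G/T must match exactly. The helper names are hypothetical and do not reflect how the pipeline actually parses this setting.

```python
def split_barcode(pattern: str) -> tuple[str, str]:
    """Split e.g. 'NNNNNXXX-XXXNNNNNATCACG' at '-'.

    ASSUMPTION: left part = 5' inline segment, right part = 3' inline segment.
    """
    five, sep, three = pattern.partition("-")
    if not sep:
        raise ValueError("expected '-' between the 5' and 3' segments")
    return five, three


def segment_matches(seq: str, segment: str) -> bool:
    """ASSUMPTION: 'N' and 'X' are wildcards; A/C/G/T must match exactly."""
    return len(seq) == len(segment) and all(
        p in "NX" or p == s for p, s in zip(segment, seq)
    )


def trim_inline(read: str, pattern: str) -> str:
    """Hypothetical helper: drop the fixed-length inline segments from both ends."""
    five, three = split_barcode(pattern)
    if len(read) <= len(five) + len(three):
        return ""  # nothing of the insert would remain after trimming
    return read[len(five):len(read) - len(three)]


pattern = "NNNNNXXX-XXXNNNNNATCACG"
read = "ACGTAGGC" + "T" * 20 + "GGAACGTAATCACG"  # 8 nt + insert + 14 nt
print(segment_matches(read[-14:], split_barcode(pattern)[1]))  # True
print(trim_inline(read, pattern))  # the 20-nt insert only
```

Under this reading, leaving the segments in place adds 22 non-genomic bases to each read. The aligner then has to soft-clip them or count them as mismatches, which fits the low mapping ratio described above.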

If you have any questions or suggestions, feel free to open new issues.