masai1116 / SHARE-seq-alignment

Pipeline to demultiplex and align both ATAC and RNA data generated by SHARE-seq

For RNA-seq R2 reads, what if we didn't filter reads by polyT? #3

Open YichaoOU opened 3 years ago

YichaoOU commented 3 years ago

Hello @masai1116,

Thank you again for sharing the code! I have some questions regarding RNA-seq reads.

In our test SHARE-seq data, if we filter R2 by polyT (TTTTTT, allowing 1 mismatch), about 50% of reads are discarded. Is that normal?

What happens if we don't filter reads by polyT? For bulk RNA-seq and 10x scRNA-seq we don't filter on polyT, right?

One last question: in your STAR alignment code, you only used R1. My mapping rate using only R1 is just 50%, but with paired-end mapping I get 70%. Is that normal?

Thanks, Yichao

masai1116 commented 3 years ago

polyT filtering is usually not necessary. The percentage of reads with polyT varies case by case, though. I haven't quantified it yet; I should look into it.

When sequencing with a NextSeq 150-cycle kit, we only have 30 bp on Read 2, which includes a 10 bp UMI, 15 bp of polyT (TTTT...), and only 5 bp of mRNA sequence. 5 bp is too short to be aligned properly, so I only used R1 for alignment. With NovaSeq we can get a longer R2, which helps alignment, as you saw. To keep it simple, the script only aligns R1.