gxiaolab / L-GIREMI

L-GIREMI (Long-read Genome-independent Identification of RNA Editing by Mutual Information)
GNU General Public License v3.0

read orientation #1

Open · chilampoon opened this issue 1 year ago

chilampoon commented 1 year ago

Hi there, I came across your L-GIREMI paper and found it quite interesting. I am confused about read orientation: I don't quite understand why some reads would have the wrong orientation. Is it because some genomic regions have genes on both the + and - strands, so reads mapped to these regions might have two possible orientations? If I map the reads to genes first, does that mean I don't need to worry much about whether the orientation is correct?

Many thanks, Chilam
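The idea in the question, taking a read's strand from the genes it overlaps, can be sketched as follows. This is a minimal illustration, not L-GIREMI's code: the `Gene` record and the `strand_from_genes` function are hypothetical, and coordinates are assumed to be 0-based, half-open intervals on one chromosome. It shows that where genes on both strands overlap a read, the annotation alone cannot decide the read's strand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gene:
    """Hypothetical gene annotation record (0-based, half-open coordinates)."""
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def strand_from_genes(read_start: int, read_end: int, genes: List[Gene]) -> Optional[str]:
    """Return the strand shared by all genes overlapping [read_start, read_end).

    Returns None when no gene overlaps the read, or when overlapping genes lie
    on both strands (the ambiguous case raised in the question).
    """
    strands = {g.strand for g in genes if g.start < read_end and read_start < g.end}
    if len(strands) == 1:
        return strands.pop()
    return None


genes = [Gene("A", 100, 500, "+"), Gene("B", 400, 900, "-")]
print(strand_from_genes(150, 300, genes))  # only gene A overlaps: "+"
print(strand_from_genes(450, 480, genes))  # A and B overlap: None (ambiguous)
```

Even where the annotation is unambiguous, this only gives the strand of the annotated gene, which is not necessarily the strand of the molecule that was actually sequenced.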

wolfsonliu commented 1 year ago

Hi Chilam,

The orientation (strand) of a read affects how its mismatch types are determined, so a wrong strand can add noise. A wrong read strand may come from the basecalling step. Alternatively, it may not be a basecalling error at all: the strand may be correct for the molecule that was sequenced, but that molecule may not be one we want. In fact, most reads have the correct strand, and for those it is safe to skip the correction. However, a very small number of reads can produce mismatches that disagree with the other mismatches and complicate the downstream filtering and calculations. The read strand correction simply tries to reduce this noise from unwanted mismatch types.
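To see why the strand matters for mismatch types, here is a minimal sketch, assuming mismatches are recorded as reference and alternative bases on the genome's + strand. The function name `mismatch_type` is hypothetical and this is not L-GIREMI's implementation. A-to-I editing is read out as A>G on the transcript; for a transcript on the - strand, the same event appears as T>C on the genomic + strand, so an incorrect read strand turns it into a different mismatch type.

```python
# Watson-Crick complement, used to express a mismatch on the opposite strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mismatch_type(ref: str, alt: str, read_strand: str) -> str:
    """Return the mismatch type ("X>Y") relative to the read's transcript strand.

    ref/alt are assumed to be bases on the genome's + strand.
    """
    ref, alt = ref.upper(), alt.upper()
    if read_strand == "+":
        return f"{ref}>{alt}"
    if read_strand == "-":
        # Complement both bases to read the mismatch on the - strand.
        return f"{COMPLEMENT[ref]}>{COMPLEMENT[alt]}"
    raise ValueError(f"unknown strand: {read_strand!r}")


# The same genomic T>C mismatch, interpreted with the correct and a wrong strand:
print(mismatch_type("T", "C", "-"))  # A>G, consistent with A-to-I editing
print(mismatch_type("T", "C", "+"))  # T>C, an unexpected mismatch type
```

A handful of reads with the wrong strand therefore contribute mismatch types that conflict with the rest of the reads at a site, which is the kind of noise the correction targets.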

When you use the current version of L-GIREMI, you don't need to worry about it, since the correction is applied automatically. A future version will add an option to control whether the correction is applied (it will be applied by default). So you can just run the program.

Best,

Zhiheng