rr1859 / R.4Cker

MIT License

Bowtie 2 mapping w/ reduced genome #33

Open aliu90 opened 7 years ago

aliu90 commented 7 years ago

Hello,

If the reduced genome is composed only of unique RE-adjacent sequences, how is it possible that when I run Bowtie 2 with the command provided in your manual, I get reads that align >1 time? I would expect to get only reads that align 0 times or exactly 1 time.

I only want the uniquely mapped sequences. Am I misunderstanding something here? Is there a way to keep only sequences that map exactly once?

aliu90 commented 7 years ago

I'm not sure if this is the issue, but one thing I noticed in the script that creates the reduced genome is that it doesn't take reverse-complement identical sequences into consideration. For example, there are several paralogous genes on the X chromosome that are approximately 99% identical, and I would expect them to be excluded from the reduced genome for this reason. However, since they are on opposite strands, and thus their sequences are reverse complements of each other, they are still included in the reduced genome. Could you add a few lines to the script to fix this issue?
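A minimal sketch of a strand-aware uniqueness check, assuming the reduced-genome step has a dict of fragment IDs to RE-adjacent sequences (the names `strand_aware_unique`, `canonical`, and the dict layout are hypothetical, not part of the R.4Cker script). Each sequence is reduced to a canonical key, the smaller of itself and its reverse complement, so a fragment and its opposite-strand copy collide. Note this only catches exact matches; ~99% identical paralogs would need an alignment-based check.

```python
from collections import Counter
from typing import Dict, List

# Complement table for upper- and lowercase bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of seq and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def strand_aware_unique(fragments: Dict[str, str]) -> List[str]:
    """Hypothetical helper: IDs of fragments whose sequence occurs exactly once
    when both strands are considered."""
    counts = Counter(canonical(s) for s in fragments.values())
    return [fid for fid, s in fragments.items() if counts[canonical(s)] == 1]


if __name__ == "__main__":
    frags = {"x1": "AAACCC", "x2": "GGGTTT", "x3": "ACGTAC"}
    print(strand_aware_unique(frags))
```

Here `x1` and `x2` are reverse complements of each other, so both are dropped and only `x3` is kept.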

rr1859 commented 7 years ago

Since Bowtie 2 maps using seeds, you can get reads that map imperfectly to multiple fragments. You need to perform additional filtering to remove these. Do you have a large percentage of these reads? That's a good suggestion regarding chrX and Y. I can try to incorporate that, but I am a bit swamped right now, so it may take a few weeks. Sorry about that!
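One possible way to do the additional filtering, sketched under the assumption that Bowtie 2 writes SAM output: Bowtie 2 adds an `XS:i` tag only when it found more than one valid alignment for a read, so dropping unmapped reads (FLAG bit 0x4) and reads carrying `XS:i` keeps those reported as aligning exactly once. The function name and script usage are hypothetical, not part of R.4Cker. Because Bowtie 2's search is heuristic, a read without `XS:i` may still have an undetected second hit.

```python
import sys
from typing import Iterable, Iterator

UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def unique_alignments(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield SAM header lines plus mapped reads that have no XS:i tag."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & UNMAPPED_FLAG:
            continue  # read aligned 0 times
        # Optional tags begin at the 12th column (index 11).
        if any(tag.startswith("XS:i:") for tag in fields[11:]):
            continue  # a second valid alignment exists, so not unique
        yield line


if __name__ == "__main__":
    # Hypothetical usage: python filter_unique.py < aligned.sam > unique.sam
    sys.stdout.writelines(unique_alignments(sys.stdin))
```

A MAPQ cutoff (for example `samtools view -q` combined with `-F 4`) is a common alternative. The threshold that corresponds to "unique" depends on the aligner's MAPQ scheme and would need to be chosen for this data.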