nanoporetech / tombo

Tombo is a suite of tools primarily for the identification of modified nucleotides from raw nanopore sequencing data.

Doubts about the selection of reference sequences for direct RNA sequencing #185

Closed weir12 closed 5 years ago

weir12 commented 5 years ago

Hi, I understand that processing RNA data requires a transcriptome reference because Tombo lacks spliced mapping support. However, I used the genome for re-squiggle and still achieved a good re-squiggle rate. I don't quite understand why a transcriptome is necessary, although I believe there is a reason for it. Please help me solve this puzzle.

marcus1487 commented 5 years ago

If a majority of reads do not span a splice junction then a good rate of re-squiggle can be achieved. There are many reasons that this could occur including a large fraction of un-spliced (pre-mRNA) reads or a reference sample with few splice junctions.
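One rough way to check how many reads span a splice junction is to map them to the genome with a splice-aware aligner and count the alignments whose CIGAR string contains an `N` (reference skip) operation. The sketch below assumes you already have the CIGAR strings (for example, column 6 of a SAM file); the function names and example CIGARs are hypothetical and not part of Tombo.

```python
import re

# CIGAR operations as defined in the SAM specification.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

def spans_junction(cigar):
    """Return True if the alignment skips reference bases (N), i.e. crosses an intron."""
    return any(op == "N" for _, op in CIGAR_OP.findall(cigar))

def spliced_fraction(cigars):
    """Fraction of mapped reads whose alignment contains at least one N operation."""
    mapped = [c for c in cigars if c != "*"]  # "*" means no alignment is available
    if not mapped:
        return 0.0
    return sum(spans_junction(c) for c in mapped) / len(mapped)

# Hypothetical CIGAR strings, for illustration only.
example = ["500M", "200M1500N300M", "50S450M", "*"]
print(spliced_fraction(example))
```

A low fraction would be consistent with a good re-squiggle rate against the genome, though it says nothing about the quality of the reads that do cross a junction.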

In any case, if Tombo attempts to process a mapping over a splice junction, then Tombo will assign signal to all bases within the intron. This will likely result in a very poor signal-to-sequence mapping, and may throw the assignment off well into the next exon (or for the rest of the read).

It is possible that many mappings are only to one side of the splice junction (while the remainder of the read is trimmed and discarded). This would result in a large amount of wasted data. Additionally, if the mapping starts too far into a read, the re-squiggle step is less likely to complete successfully.
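To get a feel for how much of each read is trimmed, one option is to look at the soft-clipped (`S`) portions of genome alignments. This is a minimal sketch under the assumption that clipped bases appear as soft clips in the CIGAR string; the helper names and example values are hypothetical, and this is not how Tombo itself measures anything.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
QUERY_OPS = set("MIS=X")  # operations that consume read bases (SAM specification)

def clip_stats(cigar):
    """Return (read_length, leading_clip, trailing_clip) from a CIGAR string."""
    # Hard clips (H) are dropped: those bases are not present in the SAM record.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    read_len = sum(n for n, op in ops if op in QUERY_OPS)
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return read_len, leading, trailing

def clipped_fraction(cigar):
    """Fraction of read bases that were soft-clipped, i.e. left unmapped."""
    read_len, leading, trailing = clip_stats(cigar)
    return (leading + trailing) / read_len if read_len else 0.0

# Hypothetical alignments: a read mapped to only one exon loses the rest.
# Note: CIGAR order follows the forward reference strand, so for reverse-strand
# mappings the "leading" clip corresponds to the end of the original read.
print(clipped_fraction("100S400M"))
print(clipped_fraction("300M700S"))
```

Reads with a large clipped fraction, or a long clip at the start, are the ones most likely to be wasted or to fail re-squiggle when a genome reference is used.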

These are the main reasons that it is recommended to use a transcriptome reference for RNA reads.