Doubts about the selection of reference sequences for direct RNA sequencing

If the majority of reads do not span a splice junction, a good re-squiggle success rate can still be achieved. This could happen for several reasons, including a large fraction of unspliced (pre-mRNA) reads or a reference sample with few splice junctions.

In any case, if tombo attempts to process a mapping that spans a splice junction, it will assign signal to all bases within the intron. This will likely result in a very poor signal-to-sequence mapping, and may throw the assignment off well into the next exon (or for the rest of the read).
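To get a rough idea of how many genome mappings are affected, one could count alignments whose CIGAR string contains an `N` operation (the SAM operation for a skipped reference region, which spliced aligners use for introns). The sketch below is not part of tombo; the function names `intron_bases` and `spliced_fraction` and the example CIGAR strings are made up for illustration, and it assumes the CIGAR strings have already been extracted from a SAM/BAM file.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def intron_bases(cigar):
    """Total reference bases skipped by N operations in one CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op == "N")


def spliced_fraction(cigars):
    """Fraction of alignments that contain at least one N operation."""
    cigars = list(cigars)
    if not cigars:
        return 0.0
    spliced = sum(1 for c in cigars if intron_bases(c) > 0)
    return spliced / len(cigars)


# Hypothetical CIGAR strings, not taken from real data.
example = ["50M", "20M1000N30M", "10S40M", "15M200N10M300N25M"]
print(spliced_fraction(example))
print(intron_bases("15M200N10M300N25M"))
```

A low spliced fraction would be consistent with the first case described above; a high one suggests many reads would have signal assigned across intron bases.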

It is also possible that many mappings cover only one side of a splice junction (while the remainder of the read is trimmed and discarded). This would result in a large amount of wasted data. Additionally, if the mapping starts too far into a read, the re-squiggle step is less likely to complete successfully.
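One way to see how much of each read is being discarded is to look at the soft-clipped (`S`) lengths at either end of the CIGAR string. This is a minimal sketch under a few assumptions: `clip_summary` is a hypothetical helper, not a tombo function; hard clips (`H`) are ignored because those bases are absent from the stored read sequence; and for reverse-strand alignments the leading clip in the CIGAR corresponds to the end of the original read rather than its start.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that consume bases of the read (query) sequence.
READ_CONSUMING = set("MIS=X")


def clip_summary(cigar):
    """Return (leading_clip, trailing_clip, clipped_fraction) for one CIGAR."""
    # Drop hard clips so a soft clip next to one is still seen as terminal.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    read_len = sum(n for n, op in ops if op in READ_CONSUMING)
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    fraction = (lead + trail) / read_len if read_len else 0.0
    return lead, trail, fraction


# Hypothetical alignment: 30 bases clipped at one end, 10 at the other.
print(clip_summary("30S60M10S"))
```

Reads with a large clipped fraction are the ones where most of the data is being thrown away, and a large leading clip corresponds to a mapping that starts far into the read.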

These are the main reasons that it is recommended to use a transcriptome reference for RNA reads.

nanoporetech/tombo, issue #185