nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

lots of mutations found in direct RNA sequencing and dorado basecaller #707

Closed yul96 closed 6 months ago

yul96 commented 6 months ago

Issue Report

Please describe the issue:

We found many mutations in RNA reads basecalled with Dorado 0.5.3. The RNA is the product of in vitro transcription, and the reads are mapped back to the template to evaluate mutations. There are many non-random mutations in the reads. Is this expected for the current Dorado basecaller?

We ran six independent samples and these mutations are not expected.

Steps to reproduce the issue:

mRNA is produced by in vitro T7 transcription.

Direct RNA sequencing is used to generate the raw data according to the manual (SQK-RNA004).

Dorado 0.5.3 is used for basecalling, with the command below:

...../dorado-0.5.3-linux-x64/bin/dorado basecaller --estimate-poly-a --verbose ...../dorado-0.5.3-linux-x64/bin/rna004_130bps_sup@v3.0.1 pod5 > dorado.calls.sam

Run environment:

Logs

VBHerrenC commented 6 months ago

Is it possible that you are using some kind of modified base in your IVT? Some modified bases are often miscalled as C's.
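
One way to check this is to tally which substitutions dominate in the reads mapped back to the template. The following is only a minimal sketch, not part of Dorado or of any workflow described in this thread: the function name `substitution_counts`, the toy reference, and the toy alignments are all hypothetical, and the input is assumed to be the `POS`, `CIGAR`, and `SEQ` fields of forward-strand SAM records, with T standing in for U. If one substitution type (for example T to C) accounts for most mismatches, that would be consistent with a modified base being miscalled rather than with random basecalling error.

```python
import re
from collections import Counter

# One CIGAR element: a length followed by a SAM operation code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def substitution_counts(reference, alignments):
    """Tally (reference_base, read_base) mismatches across aligned reads.

    `alignments` is an iterable of (pos, cigar, seq) tuples, where `pos` is
    the 1-based leftmost reference position, as in a SAM record.
    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for pos, cigar, seq in alignments:
        ref_i = pos - 1  # 0-based index into the reference
        read_i = 0       # 0-based index into the read sequence
        for length, op in CIGAR_RE.findall(cigar):
            length = int(length)
            if op in "M=X":
                # Aligned columns: compare base by base.
                for k in range(length):
                    ref_base = reference[ref_i + k]
                    read_base = seq[read_i + k]
                    if ref_base != read_base:
                        counts[(ref_base, read_base)] += 1
                ref_i += length
                read_i += length
            elif op in "IS":
                # Insertions and soft clips consume the read only.
                read_i += length
            elif op in "DN":
                # Deletions and skipped regions consume the reference only.
                ref_i += length
            # H and P consume neither sequence.
    return counts


# Toy data (assumed, not taken from the issue).
reference = "ACGTTGCA"
alignments = [
    (1, "8M", "ACGCTGCA"),
    (1, "8M", "ACGCCGCA"),
    (3, "2S4M", "AAGCTG"),
    (1, "3M1D4M", "ACGTGAA"),
]

counts = substitution_counts(reference, alignments)
for (ref_base, read_base), n in counts.most_common():
    print(f"{ref_base}>{read_base}: {n}")
```

On real data the same tally could be broken down per reference position to see whether the mismatches cluster at the sites where the modified nucleotide was incorporated.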

yul96 commented 6 months ago

@VBHerrenC I think you are right. I spoke to the team and they used modified bases. Thanks.

HalfPhoton commented 6 months ago

Thanks for contributing to this issue @VBHerrenC.