nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Missing bases #83

Closed tnn111 closed 1 year ago

tnn111 commented 1 year ago

I have a P2 Solo. This past week I used it for the first time and I got what I thought were really good results as evidenced by the following excerpt from the final report:

image

132.21 Gbases, 10.47M reads and an N50 of 19.46 Kb looked really good. Then I basecalled it using dorado. The command was:

dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v4.0.0 pod5 > bases.sam

followed by:

samtools fastq bases.sam > bases.fastq

Then I took a look at the bases.fastq file and found only 81.8 Gb. The number of reads was still 10.47M.
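For anyone wanting to reproduce this kind of check, here is a minimal sketch of counting reads and bases in a FASTQ file. It assumes the four-lines-per-record layout with unwrapped sequences; the function name `fastq_totals` and the tiny in-memory example are illustrative, not part of dorado or samtools.

```python
import io

def fastq_totals(handle):
    """Return (read_count, base_count) for a 4-line-per-record FASTQ stream."""
    reads = 0
    bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the sequence
            reads += 1
            bases += len(line.rstrip("\r\n"))
    return reads, bases

# Tiny in-memory example with two reads (4 and 6 bases).
# For a real file: with open("bases.fastq") as fh: print(fastq_totals(fh))
example = io.StringIO("@r1\nACGT\n+\n!!!!\n@r2\nACGTAC\n+\n!!!!!!\n")
print(fastq_totals(example))  # (2, 10)
```

Comparing the two totals against the run report separates "reads went missing" from "reads got shorter"; here the read count matched and only the base count dropped.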

Can anyone tell me what happened to the ~50 Gb that's missing from this equation? Is it duplex? The kit says duplex enabled:

image

The host for the run was an Apple Studio with the following software:

image

The basecalling was done on a Linux system with an A100.

Thanks for any help.

Torben

tnn111 commented 1 year ago

I figured out the likely cause. The MinKNOW UI software is assuming 400 bps while the P2 Solo somehow runs at 260 bps. Not good. I think this ruined at least 4 runs :-(. No indications of error anywhere.

vellamike commented 1 year ago

Hi Torben,

I’m not sure I understand: did MinKNOW indicate 400 bps but run at 260 bps? In what sense are the runs ruined? Apologies if you’ve encountered a bug here. I’m trying to get some more detail so we can understand what’s gone wrong.

Thanks in advance, Mike


tnn111 commented 1 year ago

Hi Mike,

When you start a run using the MinKNOW UI, you have to decide between 260 bps and 400 bps with the former being more accurate and the latter giving you more throughput. We are sequencing metagenomes and throughput matters more than the last bit of accuracy. Anything that cuts our throughput is a “loss” to us.

I know we chose 400 bps. I also know that according to the UI, we got ~130 Gbases and close to 100 reads that were 2.5-3.0 Mbases long. That was exciting for a little while till the bubble burst :-)

But when I did the basecalling with dorado, there were only 83 Gbases or so. At first I thought something was wrong with dorado, that duplex reads were being collapsed, or a few other weird things. But then I looked at the number of reads, and that was unchanged between what the UI showed and what came out of basecalling. At that point I ran out of ideas, until I realized that 83/130 ≈ 260/400 and that everything made sense if I assumed that the P2 Solo was actually running at 260 bps while the MinKNOW UI assumed 400 bps. That would imply that either something in the P2 Solo is broken or there’s a bug in the Apple version of the MinKNOW UI. I have no way of telling which. If you look at the code, I think you will find that the MinKNOW UI uses the 260/400 values to estimate how many bases were sequenced.
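The ratio argument can be checked with a few lines of arithmetic. This is only a sketch: the helper name `relative_gap` is illustrative, and the inputs are the rounded figures quoted in this thread.

```python
def relative_gap(a, b):
    """Relative difference between a and b, measured against b."""
    return abs(a - b) / b

yield_ratio = 83 / 130    # basecalled Gbases / Gbases reported by the UI
speed_ratio = 260 / 400   # suspected actual speed / speed assumed by the UI

print(round(yield_ratio, 3), round(speed_ratio, 3))
print(round(relative_gap(yield_ratio, speed_ratio), 3))

# Yield one would expect if the UI estimate were simply scaled by the speed ratio
expected_gbases = 130 * 260 / 400
print(expected_gbases)  # 84.5
```

The two ratios agree to within about 2%, which is consistent with the hypothesis but does not prove it.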

I also carried out a metagenomic flye assembly followed by a cmsearch looking for the ribosomal operons. This is something I do a lot to see what’s actually there and there were major anomalies in the result indicating that the basecalling was extremely poor. Flye also reported high error rates and the assembly was really bad.

I just did another assembly on a portion of the SAM file that I got from using dorado with a 260 bps model, and the results look much more normal. It’s too early to be completely sure, but I’ve done a lot of these and it looks good to me.

I’m almost sure it’s my P2 Solo that has the problem. It’s likely a hardware problem and odds are that it’ll need to be replaced.

I apologize for “blaming” dorado; I didn’t mean it that way. I just didn’t understand what was going on. Now I think I do.

Thanks, Torben


kieranhejmadi commented 1 year ago

Good afternoon Torben,

Apologies for the delay and your poor first experience with the P2 Solo.

We're currently investigating translocation speed for the P2 Solo, but the median of ~250 bps for the run on your device looks to be far below what the majority of customers are experiencing. From telemetry, we can see that most customer runs fall within the accepted range when set to the 400 bps condition. Such a low translocation speed will result in a lower Q score with the 400 bps model. This issue is contained to the P2 Solo itself rather than MinKNOW or Dorado, and we are currently running a suite of sequencing tests to isolate and resolve this bug. The timeline for a fix or patch is unclear at this point.

I'll reach out to you via email for device logs if you're happy to provide them. We'll also be able to discuss replacing your device in addition to replacement kits and flow cells.
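For context, translocation speed here is measured in bases per second. One rough way to estimate a median speed, assuming per-read base counts and strand durations are available, is sketched below. The function name and the input pairs are made up for illustration, and this is not necessarily how the telemetry figure is derived.

```python
from statistics import median

def median_translocation_speed(reads):
    """reads: iterable of (bases, seconds_in_pore) pairs; returns median bases/second."""
    speeds = [bases / seconds for bases, seconds in reads if seconds > 0]
    if not speeds:
        raise ValueError("no reads with a positive duration")
    return median(speeds)

# Made-up reads for illustration only: (bases, seconds in pore)
reads = [(20000, 80.0), (5000, 19.0), (12000, 50.0)]
print(round(median_translocation_speed(reads)))  # 250
```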

I'll send you an email later today.

Thanks, Kieran

tnn111 commented 1 year ago

Hi Kieran,

It’s good that I’m not imagining things :-)

You can reach me at @. @.>.

Thanks, Torben


vellamike commented 1 year ago

Closing this issue as it is not a Dorado issue and will be handled separately by Oxford Nanopore support.