nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

c10::Error on simplex basecalling #658

Closed cocvincent closed 5 months ago

cocvincent commented 6 months ago

Hello,

I am currently experiencing a problem with simplex basecalling.

I'm trying to basecall with the dna_r10.4.1_e8.2_400bps_sup@v4.3.0 model.

[image: fonction_erreur_29-02-24]

After 23 hours of processing, I get a c10::Error.

[image: Code_erreur_29-02-24]

I also have a "bam" file in my output folder, but I'm not sure it's complete.
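One quick way to tell whether a BAM file was cut off mid-write is to look for the BGZF end-of-file marker, a fixed 28-byte empty block defined in the SAM/BAM specification. The sketch below is a minimal check under that assumption; the function name `has_bgzf_eof` is hypothetical and not part of Dorado. A missing marker means the file is truncated, while a present marker only shows the file was closed cleanly, not that every read was basecalled.

```python
from pathlib import Path

# The 28-byte empty BGZF block that terminates a complete BAM file
# (as given in the SAM/BAM format specification).
BGZF_EOF = bytes.fromhex(
    "1f 8b 08 04 00 00 00 00 00 ff 06 00 42 43 02 00"
    " 1b 00 03 00 00 00 00 00 00 00 00 00"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    p = Path(path)
    if p.stat().st_size < len(BGZF_EOF):
        return False
    with p.open("rb") as fh:
        # Seek to the last 28 bytes and compare them with the marker.
        fh.seek(-len(BGZF_EOF), 2)
        return fh.read() == BGZF_EOF
```

For example, `has_bgzf_eof("calls.bam")` (a placeholder file name) returning `False` would indicate that the output was interrupted before it was finalized.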

I'm using 100 GB of memory with an A100/V100 GPU on a computing cluster.

I hope you can help me, thanks.

vellamike commented 6 months ago

The error suggests that you are running out of host (CPU) memory - are you certain that you are running on a device with 100G of memory available (this should be enough)? Also, does this work for you without multiplexing?

cocvincent commented 6 months ago

Hello, thank you for your reply.

I'm sure I'm using 100 GB of RAM. I could see that this limit had been exceeded: [image: qstat-fx-29-02-24]

I also tried with 160 GB of RAM (the maximum I can allocate) and hit the same problem. [image: qstat-fx-01-03-24] [image: Code_erreur_01-03-24]

I'm currently testing without multiplexing and waiting for the results.

Is there a way to reduce the amount of RAM Dorado uses? Guppy does not run into this kind of problem.

Thanks.

cocvincent commented 6 months ago

Hello, I had the same problem without multiplexing. [image: qstat-fx-04-03-24] [image: Code_erreur_04-03-24]

cocvincent commented 6 months ago

Hello,

Surprisingly, basecalling worked once I split the dataset (the pod5 list) into 2 parts to reduce memory usage.
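For anyone wanting to try the same workaround, here is a minimal sketch of one way to split a directory of pod5 files into batches of similar total size. It is not necessarily how the split above was done: the helper names `split_by_size` and `link_batches` and the `batch_<i>` directory layout are assumptions made for illustration, and the batches are built from symlinks so no data is copied.

```python
from pathlib import Path


def split_by_size(files, n_parts=2):
    """Greedily assign (path, size) pairs to n_parts batches of similar total size."""
    batches = [[] for _ in range(n_parts)]
    totals = [0] * n_parts
    # Largest files first, each going to the currently lightest batch.
    for path, size in sorted(files, key=lambda item: item[1], reverse=True):
        lightest = totals.index(min(totals))
        batches[lightest].append(path)
        totals[lightest] += size
    return batches


def link_batches(pod5_dir, out_dir, n_parts=2):
    """Create out_dir/batch_<i>/ directories holding symlinks to the .pod5 files."""
    files = [(p, p.stat().st_size) for p in sorted(Path(pod5_dir).glob("*.pod5"))]
    batch_dirs = []
    for i, batch in enumerate(split_by_size(files, n_parts)):
        batch_dir = Path(out_dir) / f"batch_{i}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for src in batch:
            link = batch_dir / src.name
            if not link.exists():
                link.symlink_to(src.resolve())
        batch_dirs.append(batch_dir)
    return batch_dirs
```

Each batch directory can then be basecalled as a separate run, for example `dorado basecaller <model> batch_0/ > batch_0.bam`, so that only part of the dataset is processed per job.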

Thank you for your help.

HalfPhoton commented 5 months ago

Thanks for the update @cocvincent. I'll close this ticket as resolved, and we'll continue to work on improving the stability of Dorado.