
nanoporetech / dorado

Oxford Nanopore's Basecaller

https://nanoporetech.com/


m1 neural engine #58

Open itslittman opened 1 year ago

itslittman commented 1 year ago

I can see Dorado using my M1 Max GPU, but does it also use the Apple Neural Engine (ANE)? I'm curious because I'm not sure there's any way to monitor usage of that feature while the program is running. I just finished basecalling a sample with the SUP RNA model at a rate of "Samples/s: 1.396180e+05". Is this normal, or can I increase the speed somehow?

iiSeymour commented 1 year ago

Hey @itslittman

No, dorado currently uses only the GPU on M1, not the Neural Engine. asitop is pretty good for performance monitoring on M1. The sup models need a lot of memory, so the best thing to do would be to close any other applications that are using significant memory.

HTH

Chris.

itslittman commented 1 year ago

Hi Chris, thanks for the link! I’ll check that out. Might dorado use the ANE in the future, and would there even be a benefit to that?

Edit: Also, what is the % accuracy now with this model? I see a noticeable improvement when comparing HAC vs SUP next to each other in IGV.

Noah

AngusNano commented 1 year ago

Hi,

Just a follow-up to this question. I was wondering if there are any parameters that can be optimised to improve the performance of dorado on the M1 chip? I have an M1 Max with 64GB of RAM and get ~3.7e+04 Samples/s with the dna_r10.4.1_e8.2_400bps_sup@v4.0.0 model (macOS 13.0.1, dorado 0.1.1 using pod5 files). I noticed that while GPU usage is at 100%, only 13GB of RAM is allocated to the GPU and there is >30GB of RAM free in the system. None of the performance CPU cores are active during basecalling, only the efficiency cores, which run at about 40%.

Cheers,

Angus

itslittman commented 1 year ago

@AngusNano Are any of your CPU cores really running at 40%? I have basically the same setup as you do, but dorado appears to run exclusively on my GPU, with tiny spikes in CPU usage at sparse but regular intervals. Increasing the chunk size appears to increase memory usage, but I didn’t notice a big gain in performance, although maybe I didn’t play around with it enough.

Noah

AngusNano commented 1 year ago

I only see activity on the efficiency cores (both operating at ~40%). The performance cores aren't active. I see the same as you: dorado seems to run exclusively on the GPU.

I've compared the performance with dorado running on an RTX4090 (Ubuntu 20), and it gives ~7.2e+06 Samples/s. Guppy on the same machine gives ~6.9e+06 Samples/s, so there doesn't seem to be much of an advantage to dorado.

On a GTX1080Ti I get ~4.3e+05 Samples/s for guppy. This is all using the dna_r10.4.1_e8.2_400bps_sup model.

Cheers,

Angus
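
For readers who want to put the figures quoted in this thread side by side, here is a minimal sketch that expresses each reported throughput relative to the M1 Max result. The numbers are the ones posted above for the dna_r10.4.1_e8.2_400bps_sup model. The dictionary labels and the `relative_throughput` helper are hypothetical names introduced only for this illustration, and the comparison assumes the runs are otherwise comparable (the thread does not state software versions for every machine).

```python
# Throughput figures (samples/s) as reported in the thread for the
# dna_r10.4.1_e8.2_400bps_sup model. The labels are illustrative only.
reported = {
    "dorado / M1 Max": 3.7e4,
    "guppy / GTX 1080 Ti": 4.3e5,
    "guppy / RTX 4090": 6.9e6,
    "dorado / RTX 4090": 7.2e6,
}


def relative_throughput(rates, baseline):
    """Return each rate divided by the rate of the chosen baseline entry."""
    base = rates[baseline]
    return {name: rate / base for name, rate in rates.items()}


# Express every result as a multiple of the M1 Max throughput.
ratios = relative_throughput(reported, "dorado / M1 Max")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")

# Direct dorado-versus-guppy comparison on the same RTX 4090 machine.
dorado_vs_guppy = reported["dorado / RTX 4090"] / reported["guppy / RTX 4090"]
print(f"dorado vs guppy on RTX 4090: {dorado_vs_guppy:.2f}x")
```

On these numbers, dorado on the RTX 4090 is only about 4% faster than Guppy on the same card, while the RTX 4090 result is roughly 195 times the M1 Max throughput.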