nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Pipeline "Killed" #358

Closed Aimepicornell closed 1 year ago

Aimepicornell commented 1 year ago

Hello, I installed dorado by dragging its bin and lib folders into the bin and lib folders of my VM (Ubuntu). When I run the following command, this happens:

    dorado basecaller -x "cpu" rna002_70bps_hac@v3/ fast5_skip/ > calls.bam
    [2023-09-04 15:06:43.276] [info] > Creating basecall pipeline
    Killed


fast5_skip contains my fast5 files and calls.bam is the destination file that I want it to create.

Do you have an idea how I can resolve this problem? Do you need any additional information?

Thank you in advance

Aimé

sklages commented 1 year ago

What is your host system? Why are you using a VM for basecalling? If you are running a Windows or macOS system, just use it :-)

Aimepicornell commented 1 year ago

Hey, thanks for your reply! We use an online, cloud-based Linux computer so we can work on our projects from anywhere :) Unfortunately, basecalling on our Windows lab laptop just takes ages because of its low specs! I'll send some more information tomorrow.

sklages commented 1 year ago

Ah, okay... so maybe EPI2ME would be of interest to you.

tijyojwad commented 1 year ago

A "Killed" message usually indicates that the host OS killed the process for excessive memory usage (on Linux, typically via the out-of-memory killer). Basecalling on the cpu device can do that. I would also suggest:

  1. increasing your VM memory specs
  2. lowering batch size (using the -b option)

But CPU basecalling will be orders of magnitude slower compared to using a GPU. If you're using a cloud VM, getting a machine with a GPU (or multiple GPUs) would be best.
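On Linux, the bare "Killed" line is printed by the shell when the child process receives SIGKILL, which is the signal the kernel's out-of-memory (OOM) killer sends; the shell then reports exit status 137 (128 + 9). The sketch below uses a hypothetical helper, `run_and_check` (not part of dorado), that wraps any command and flags that case. It cannot prove the OOM killer was responsible; that still has to be confirmed from the kernel log, which on recent Ubuntu usually requires sudo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dorado): run a command and flag termination
# by SIGKILL, which the shell reports as exit status 137 (128 + 9).
run_and_check() {
  "$@"
  local status=$?
  if [ "$status" -eq 137 ]; then
    # SIGKILL is what the Linux OOM killer sends, but it can come from elsewhere,
    # so point at the kernel log instead of asserting the cause.
    # Written to stderr so it never ends up inside a redirected BAM file.
    echo "terminated by SIGKILL (exit 137); check: sudo dmesg | grep -i -E 'out of memory|oom'" >&2
  fi
  return "$status"
}
```

Usage, mirroring the command from this thread: `run_and_check dorado basecaller -x "cpu" -b 64 rna002_70bps_hac@v3/ fast5_skip/ > calls.bam`. The message goes to stderr because dorado writes the BAM records to stdout, which is redirected into calls.bam here.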

Aimepicornell commented 1 year ago

Hello, I will look into whether I can switch to a multi-GPU VM. Unfortunately, changing the command to

    dorado basecaller -x "cpu" -b 64 rna002_70bps_hac@v3/ fast5_skip/ > calls.bam

doesn't seem to have an effect. It just gets killed again! As I understand it, -b 64 is the smallest allowed value?

Here is the lscpu output for our VM (it also has 64 GB of RAM):

    Architecture:            x86_64
    CPU op-mode(s):          32-bit, 64-bit
    Address sizes:           48 bits physical, 48 bits virtual
    Byte Order:              Little Endian
    CPU(s):                  28
    On-line CPU(s) list:     0-27
    Vendor ID:               AuthenticAMD
    Model name:              AMD EPYC Processor (with IBPB)
    CPU family:              23
    Model:                   1
    Thread(s) per core:      1
    Core(s) per socket:      1
    Socket(s):               28
    Stepping:                2
    BogoMIPS:                3992.49
    Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm rep_good nopl cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 arat
    Virtualization features:
      Hypervisor vendor:     KVM
      Virtualization type:   full
    Caches (sum of all):
      L1d:                   896 KiB (28 instances)
      L1i:                   1.8 MiB (28 instances)
      L2:                    14 MiB (28 instances)
      L3:                    224 MiB (28 instances)
    NUMA:
      NUMA node(s):          1
      NUMA node0 CPU(s):     0-27
    Vulnerabilities:
      Itlb multihit:         Not affected
      L1tf:                  Not affected
      Mds:                   Not affected
      Meltdown:              Not affected
      Mmio stale data:       Not affected
      Retbleed:              Mitigation; untrained return thunk; SMT disabled
      Spec store bypass:     Vulnerable
      Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
      Spectre v2:            Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
      Srbds:                 Not affected
      Tsx async abort:       Not affected

Thank you very much to everybody in advance!
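lscpu reports CPU details only, not memory. A minimal sketch for checking how much RAM is actually free before starting a CPU run, assuming a Linux host whose /proc/meminfo exposes a MemAvailable field (present in kernels 3.14 and later); the function name `mem_report_gib` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print total and currently available memory in GiB,
# read from /proc/meminfo (which reports these values in KiB).
mem_report_gib() {
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { printf "total=%.1f GiB available=%.1f GiB\n", total / 1048576, avail / 1048576 }
  ' /proc/meminfo
}
```

`free -g` shows the same information as a table. Watching the "available" figure while dorado runs (for example in a second terminal) shows whether memory, rather than CPU, is the limit.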

tijyojwad commented 1 year ago

You're correct: 64 is the smallest batch size allowed.

64 GB of RAM is not small, but it may not be enough. You could try running with a fast model instead of hac, but that would give you lower basecalling accuracy.
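Switching from hac to fast only changes the model argument. A hypothetical helper that derives the fast variant with bash pattern substitution, assuming dorado's `<chemistry>_<tier>@<version>` naming (as in rna002_70bps_hac@v3) carries over and that the fast model has been downloaded beforehand, e.g. with `dorado download --model rna002_70bps_fast@v3`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a hac model name to its fast counterpart by
# replacing the "_hac@" tier marker; any other name is returned unchanged.
fast_variant() {
  local model=${1%/}            # drop a trailing slash from a directory path
  echo "${model/_hac@/_fast@}"
}
```

Usage with the command from this thread: `dorado basecaller -x "cpu" -b 64 "$(fast_variant rna002_70bps_hac@v3/)" fast5_skip/ > calls.bam`.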

If you're able to request your own VM, then I would suggest a system with 1 or 2 A100 (40 or 80GB) GPUs, 32 or 48 CPU cores, and at least 256GB of RAM.
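Once a GPU VM is available, the only change to the command is the device passed to `-x`. A minimal sketch of a hypothetical `pick_device` helper that prefers all CUDA GPUs when the NVIDIA driver lists at least one (via `nvidia-smi -L`, whose lines start with "GPU ") and otherwise falls back to the CPU; it assumes dorado's `cuda:all` device string.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a dorado device string. Use every CUDA GPU when
# nvidia-smi is installed and lists at least one GPU; otherwise use the CPU.
pick_device() {
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L 2>/dev/null | grep -q '^GPU '; then
    echo "cuda:all"
  else
    echo "cpu"
  fi
}
```

Usage: `dorado basecaller -x "$(pick_device)" rna002_70bps_hac@v3/ fast5_skip/ > calls.bam`. Checking for listed GPUs, not just the binary, avoids selecting CUDA on a machine where the driver is installed but no device is attached.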

Aimepicornell commented 1 year ago

Hey, I just tried the fast basecalling model, which seems to work. Just like you said, I can see CPU and RAM usage rapidly approach 100% before the process gets killed. Even though fast basecalling is not optimal for me, at least it is something I can start working with. I will see if I can add an A100 to our VM and increase performance enough to allow high-accuracy basecalling.

Thank you for your input and all your work!

Greetings Aimé