stevebaeyen commented 1 month ago

Issue Report

Please describe the issue:

I ran the dorado 0.7.0 basecaller on R10.4.1 simplex data with v5 models and tried to correct the reads using herro, but the run aborted due to insufficient memory.

Steps to reproduce the issue:

dorado correct GBBC_502_supv5.fq > GBBC502_corr.fasta
[2024-05-25 09:51:53.419] [info] Running: "correct" "GBBC_502_supv5.fq"
[2024-05-25 09:51:53.420] [info] - downloading herro-v1 with httplib
terminate called after throwing an instance of 'std::runtime_error'
what(): Insufficient memory to run inference on cuda:0
Aborted (core dumped)

Run environment:

Dorado version: 0.7.0+71cc744+cu11080
Dorado command: dorado correct GBBC_502_supv5.fq > GBBC502_corr.fasta
Operating system: Ubuntu 20.04.6 LTS
Hardware (CPUs, Memory, GPUs): 8 CPUs (Intel i7), 16 GB RAM, NVIDIA GeForce RTX 2080 GPU with 8 GB memory
Source data type (e.g., pod5 or fast5 - please note we always recommend converting to pod5 for optimal basecalling performance): pod5 -> fq after basecalling
Source data location (on device or networked drive - NFS, etc.): on device
Details about data (flow cell, kit, read lengths, number of reads, total dataset size in MB/GB/TB): 1.5 Gb .fq
Dataset to reproduce, if applicable (small subset of data to share as a pod5 to reproduce the issue):
Logs
Please provide output trace of dorado (run dorado with -v, or -vv on a small subset)

tijyojwad commented 1 month ago

Hi @stevebaeyen - running dorado correct is CPU/host memory and GPU memory intensive. As suggested here we recommend running on a beefier system to get reasonable performance.

You can run it on a smaller system by playing around with -b (batch size) and -i (mapping index size) on the command line, e.g. you can try -i 800M -b 2 and see if that works.
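
For reference, the suggestion above could be wrapped up roughly like this. It is only a sketch: the -i and -b values are the ones mentioned in this thread, the file names are taken from the original report, and the function name correct_reads is made up for illustration. Other systems may need different values.

```shell
# Reduced-memory settings suggested in this thread; tune for your hardware.
INDEX_SIZE="800M"   # -i: mapping index size
BATCH_SIZE=2        # -b: batch size

# correct_reads is a hypothetical helper, not part of dorado.
# $1: input FASTQ, $2: output FASTA
correct_reads() {
    dorado correct -i "$INDEX_SIZE" -b "$BATCH_SIZE" "$1" > "$2"
}

# Example with the file names from the report (uncomment to run):
# correct_reads GBBC_502_supv5.fq GBBC502_corr.fasta
```

If the run still aborts with the same error, lowering the values further may be worth trying, though whether that is enough on a given GPU is not something this thread confirms.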

stevebaeyen commented 1 month ago

Thanks @tijyojwad ! That is working!

nanoporetech / dorado

Dorado correct terminates due to insufficient memory #843
