nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads
https://nanoporetech.com/

`bonito basecaller` running out of (CPU) memory with POD5 - How to optimize memory consumption? #325

Closed sklages closed 1 year ago

sklages commented 1 year ago

Hi,

I'm using a small PromethION dataset (approx. 177 GB) to set up bonito.

The basic command looks like this:

bonito \
  basecaller \
  dna_r9.4.1_e8_hac@v3.3 \
  /path/to/fast5_dir \
  --modified-bases 5mC \
  --reference $HS_HG19_mmi \
  --alignment-threads 24 \
  | samtools sort \
  -m 2G \
  --threads 8 \
  -O BAM \
  -o ${SAMPLE_ID}.srt.bam \
  --write-index \
  --reference $HS_HG19_FSA -

Server

Dell PowerEdge R7525
32x AMD EPYC 7F32
384GB RAM

GPU: NVIDIA A100-PCIE-40GB

NVIDIA-SMI 510.60.02
Driver Version: 510.60.02
CUDA Version: 11.6

fast5 input

With fast5 input the run finishes, using approx. 192 GB RAM.

pod5 input

With pod5 input the run uses more than 300 GB RAM. It gets killed on most of our A100 servers, as these have only 384 GB RAM.

Since this is a small dataset, I wonder how I can successfully run a "normal" or large dataset.

Which parameters need to be modified/optimized? Are there any rules of thumb for tuning memory-related parameters, for both CPU and GPU memory?

Reducing `--alignment-threads` to e.g. 8 does not help.

Is there a detailed description of the bonito command line parameters or some kind of "best practices"? This would probably also be helpful for #324 ...
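One workaround sketch (an assumption on my side, not a documented recommendation): basecall the pod5 files individually instead of passing the whole directory, so each bonito invocation only has to index one file's reads. All paths, the output directory, and the merge step are placeholders.

```shell
#!/bin/sh
# Sketch: bound peak RAM by basecalling one pod5 file per bonito run
# instead of the whole directory. Paths and model name are placeholders.
mkdir -p out
for f in /path/to/pod5_dir/*.pod5; do
  bonito basecaller dna_r9.4.1_e8_hac@v3.3 "$f" \
    --modified-bases 5mC \
    > "out/$(basename "$f" .pod5).sam"   # output format may vary by bonito version
done
# Merge the per-file outputs afterwards, e.g. with samtools merge (sketch).
```

Whether this actually lowers the peak depends on where the memory goes (read index vs. decoded chunks), so it would need measuring.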

sklages commented 1 year ago

With this dataset and fast5 input, RAM consumption is approx. 224 GB; without the samtools pipe it is about 20 GB less.

We also use partitioned NVIDIA A100s on servers with 400 GB RAM. So with three 10 GB partitions on a single A100 in a dedicated server, I run into trouble: not on the GPU side, but as soon as the second job starts on such a machine, we run out of CPU memory.

@iiSeymour any hints that would reduce RAM consumption? Recommendations?

sklages commented 1 year ago

Omitting alignment mode reduces CPU memory usage significantly.
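Building on that observation, one way to keep the lower memory footprint and still get an aligned, sorted BAM is to split the pipeline: basecall without `--reference`, then align separately with minimap2, whose memory can be bounded via `-K`. A sketch only, assuming the bonito output carries the MM/ML modification tags and that paths/sample names (placeholders) match your setup:

```shell
#!/bin/sh
# Sketch: two-stage pipeline to avoid bonito's in-process alignment.
# Stage 1: basecall without a reference (lower bonito RAM).
bonito basecaller dna_r9.4.1_e8_hac@v3.3 /path/to/pod5_dir \
  --modified-bases 5mC \
  > unaligned.sam   # output format may differ by bonito version

# Stage 2: align separately. samtools fastq -T keeps the MM/ML mod tags in
# the read comments, minimap2 -y copies them back into the output SAM, and
# -K 500M bounds minimap2's per-batch memory.
samtools fastq -T MM,ML unaligned.sam \
  | minimap2 -ax map-ont -y -K 500M "$HS_HG19_mmi" - \
  | samtools sort -m 2G --threads 8 -O BAM \
      -o sample.srt.bam --write-index -
```

The trade-off is an intermediate file on disk, but the basecalling and alignment peaks no longer stack in one process.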