sketching large metagenomic files with GPU

Hi Weihong,

I am using hyper-gen-GPU in GPU mode to sketch a metagenome: 60 million reads, 18 GB of FASTA files in total. On an A100 (40 GB) with a scale value of 10 (that is, sampling many more k-mers), I get the following error:

[jzhao399@atl1-1-02-018-25-0 Hyper-Gen]$ hyper-gen-GPU sketch --device gpu -p ./folder/ -o HM5M108_Hyper-Gen-GPU_s10 -s 10
2024-07-27-01:52:22 [INFO] - Start GPU sketching...
thread 'main' panicked at src/sketch_cuda.rs:138:62:
called Result::unwrap() on an Err value: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
note: run with RUST_BACKTRACE=1 environment variable to display a backtrace
Aborted

It is very fast, though, with s >= 50 or so. Is the out-of-memory error at s = 10 because, with so many sampled k-mers, each k-mer becomes a new dimension? Is there a way to stream the input hyperdimensional vectors to the GPU, or do they have to be loaded into GPU memory entirely?
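
To make the streaming question concrete, here is a rough CPU-only sketch of what I have in mind. I have not read src/sketch_cuda.rs closely, so this is not how Hyper-Gen does it. It assumes the sketch is a fixed-dimension accumulator that sampled k-mer hashes are added into, so only one batch of hashes needs to be resident at a time. The names HV_DIM, mix64, accumulate_batch and sketch_streaming are all hypothetical, and mix64 is just a stand-in mixer, not the real k-mer hash.

```rust
/// Hypothetical hypervector dimension (not taken from Hyper-Gen).
const HV_DIM: usize = 4096;

/// Stand-in 64-bit mixer (splitmix64 finalizer); NOT the real k-mer hash.
fn mix64(mut x: u64) -> u64 {
    x = x.wrapping_add(0x9E37_79B9_7F4A_7C15);
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Add one batch of sampled k-mer hashes into the accumulator.
/// Each hash is expanded into HV_DIM pseudo-random +1/-1 values.
/// On a GPU this would roughly be one host-to-device copy plus one
/// kernel launch per batch (assumption, not the actual implementation).
fn accumulate_batch(acc: &mut [i32], batch: &[u64]) {
    for &h in batch {
        let mut state = h;
        // Generate 64 fresh bits for every 64 dimensions.
        for chunk in acc.chunks_mut(64) {
            state = mix64(state);
            for (bit, slot) in chunk.iter_mut().enumerate() {
                *slot += if (state >> bit) & 1 == 1 { 1 } else { -1 };
            }
        }
    }
}

/// Stream hashes through a reusable buffer of `batch_size` entries, so
/// memory use is bounded by the batch size rather than the input size.
fn sketch_streaming<I: Iterator<Item = u64>>(hashes: I, batch_size: usize) -> Vec<i32> {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut acc = vec![0i32; HV_DIM];
    let mut batch = Vec::with_capacity(batch_size);
    for h in hashes {
        batch.push(h);
        if batch.len() == batch_size {
            accumulate_batch(&mut acc, &batch);
            batch.clear();
        }
    }
    // Flush the final, possibly partial, batch.
    if !batch.is_empty() {
        accumulate_batch(&mut acc, &batch);
    }
    acc
}
```

Because the accumulation is just a sum, the result does not depend on the batch size, so the batch could be sized to whatever fits in GPU memory. Whether the real kernel can be split up this way is exactly what I am asking.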

Thanks,

Jianshu

wh-xu / Hyper-Gen, issue #8