nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

barcoding creates empty files. Dorado 0.7.1 #900

Closed vilnis01 closed 3 months ago

vilnis01 commented 3 months ago

Issue Report

On June 19 I started a basecalling and demux job on a computing cluster. I had run exactly the same task the day before using dorado 0.5.0 and wanted to compare the results out of curiosity. With no parameter changed other than the dorado version, 0.7.1 produced only empty files. After I changed the version back to 0.5.0, it worked as before. I have seen bug report #860 and think this may be a related issue.

Dorado version: 0.7.1+80da5f5

Commands:

    # Perform basecalling
    dorado basecaller hac "$file" --device cuda:0 --kit-name SQK-NBD114-96 > "$basecalling_output_dir/${base_name}.bam"

    # Perform demuxing
    dorado demux --kit-name SQK-NBD114-96 --output-dir "$demux_output_dir" "$basecalling_output_dir/${base_name}.bam"

Hardware (1 cpu and 1 gpu used): 2 x Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz (in total 32 cores), 4 x NVIDIA Tesla V100 GPU per node, 16 GB HBM2, 5120 CUDA cores, 192 GB DDR4 2666 MHz ECC, 240 GB SSD, Infiniband EDR 100 Gb/s.

Source data: pod5 files (the result is the same for large and small files); average read length is about 1 kb.

Excerpt from the log output:

[2024-06-19 19:41:09.501] [info] Running: "basecaller" "hac" "/mnt/beegfs2/home/vilnis01/margarita2/pod5//FAY95523_6c8a5c96_dc9a2eb7_0.pod5" "--device" "cuda:0" "--kit-name" "SQK-NBD114-96"
[2024-06-19 19:41:10.007] [info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[2024-06-19 19:41:10.013] [info]  - downloading dna_r10.4.1_e8.2_400bps_hac@v5.0.0 with httplib
[2024-06-19 19:41:12.020] [info] Normalised: chunksize 10000 -> 9996
[2024-06-19 19:41:12.020] [info] Normalised: overlap 500 -> 498
[2024-06-19 19:41:12.020] [info] > Creating basecall pipeline
/var/spool/torque/mom_priv/jobs/4147894.rudens.SC: line 24: 119853 Killed                  dorado basecaller hac "$file" --device cuda:0 --kit-name SQK-NBD114-96 > "$basecalling_output_dir/${base_name}.bam"
[2024-06-19 19:41:37.678] [info] Running: "demux" "--kit-name" "SQK-NBD114-96" "--output-dir" "/mnt/beegfs2/home/vilnis01/margarita2/demux_output/FAY95523_6c8a5c96_dc9a2eb7_0" "/mnt/beegfs2/home/vilnis01/margarita2/basecalling_output/FAY95523_6c8a5c96_dc9a2eb7_0/FAY95523_6c8a5c96_dc9a2eb7_0.bam"
[2024-06-19 19:41:38.798] [info] Running: "basecaller" "hac" "/mnt/beegfs2/home/vilnis01/margarita2/pod5//FAY95523_6c8a5c96_dc9a2eb7_10.pod5" "--device" "cuda:0" "--kit-name" "SQK-NBD114-96"
[2024-06-19 19:41:39.600] [info] Assuming cert location is /etc/ssl/certs/ca-bundle.crt
[2024-06-19 19:41:39.606] [info]  - downloading dna_r10.4.1_e8.2_400bps_hac@v5.0.0 with httplib
[2024-06-19 19:41:40.716] [info] Normalised: chunksize 10000 -> 9996
[2024-06-19 19:41:40.716] [info] Normalised: overlap 500 -> 498
[2024-06-19 19:41:40.716] [info] > Creating basecall pipeline

Best regards, Vilnis

vilnis01 commented 3 months ago

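Preliminary testing indicates that this is, in fact, a memory shortage rather than a dorado bug. It is resolved by specifying pmem=4gb in the general job parameters. This is consistent with the log above, where the basecaller process is reported as "Killed" shortly after creating the basecall pipeline.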
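
In the log above, demux still ran after the basecaller had been killed. A minimal sketch of a guard for the job script follows. The helper name `run_to_file` is hypothetical, and the placement of `pmem=4gb` as a `#PBS -l` directive is an assumption about what "general parameters" refers to on this Torque cluster; the right memory value will depend on the site and the data.

```shell
#!/usr/bin/env bash
# Assumption: the per-process memory limit is requested with a Torque directive.
#PBS -l pmem=4gb

# run_to_file OUTPUT CMD [ARGS...]   (hypothetical helper, not part of dorado)
# Runs CMD with stdout redirected to OUTPUT. Returns the command's exit status
# if it fails (137 = 128 + SIGKILL, the usual status after a memory-limit kill),
# or 1 if the command succeeded but left OUTPUT empty.
run_to_file() {
    local out="$1"
    shift
    local status=0
    "$@" > "$out" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "step failed with exit status $status: $*" >&2
        return "$status"
    fi
    if [ ! -s "$out" ]; then
        echo "step produced an empty file: $out" >&2
        return 1
    fi
    return 0
}

# Usage with the commands from the report (left commented out here):
#
# bam="$basecalling_output_dir/${base_name}.bam"
# if run_to_file "$bam" dorado basecaller hac "$file" --device cuda:0 --kit-name SQK-NBD114-96; then
#     dorado demux --kit-name SQK-NBD114-96 --output-dir "$demux_output_dir" "$bam"
# else
#     echo "skipping demux for $file" >&2
# fi
```

With a check like this, a killed basecaller shows up as a skipped sample with an exit status in the job log instead of a set of empty demux files.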