nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

[] std::bad_alloc and methylation model #1068

Closed: Taylorain closed this issue 1 month ago

Taylorain commented 1 month ago

[] std::bad_alloc

I'm trying to basecall my pod5 files on our remote Linux server, but I keep getting the same std::bad_alloc error. The run starts but stops shortly afterwards, and all I get is an empty output BAM file.

My command:

dorado basecaller \
  -x cuda:0 \
  --estimate-poly-a \
  --verbose \
  --emit-moves \
  --min-qscore 7 \
  --reference /home/reference/genomic.fna_revise \
  -k 15 \
  --modified-bases sup,pseU,m6A_DRACH,m6A,m5C,inosine_m6A \
  /bio/data_BC202406118-BN240605QD02S41N1/pod5/LM_0_1+2+3/pod5_pass/ > /bio/sup_drs/2.1_bam/0_A.filtered.bam

log

Sun Oct  6 05:11:02 PM CST 2024
[2024-10-06 17:11:02.988] [info] Running: "basecaller" "-x" "cuda:0" "--estimate-poly-a" "--verbose" "--emit-moves" "--min-qscore" "7" "--reference" "--reference /home/reference/genomic.fna_revise" "-k" "15" "--modified-bases" "sup,pseU,m6A_DRACH,m6A,m5C,inosine_m6A" "/bio/data_BC202406118-BN240605QD02S41N1/pod5/LM_0_1+2+3/pod5_pass/"
[*** LOG ERROR #0001 ***] [2024-10-06 17:11:03] [] std::bad_alloc
finished-1
Sun Oct  6 05:11:03 PM CST 2024

$ top
%Cpu(s): 18.6 us,  0.7 sy,  0.0 ni, 63.4 id, 17.3 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 515496.8 total,   3848.1 free, 300004.7 used, 211644.0 buff/cache
MiB Swap: 526336.0 total, 231547.3 free, 294788.7 used. 212131.0 avail Mem

$ free -h
               total        used        free      shared  buff/cache   available
Mem:           503Gi       293Gi       3.3Gi        15Mi       207Gi       207Gi
Swap:          513Gi       287Gi       226Gi

methylation models

I also want to analyze RNA modifications in my DRS data, and I'm considering the following parameter. Can these models be used at the same time?

--modified-bases sup,pseU,m6A_DRACH,m6A,m5C,inosine_m6A

My main focus is on m6A and m5C, but I'm unclear about the differences between the three A models: m6A_DRACH, m6A and inosine_m6A. If I have to choose one, would m6A give broader results that also cover those of m6A_DRACH?

malton-ont commented 1 month ago

Hi @Taylorain,

This looks like the same issue as https://github.com/nanoporetech/dorado/issues/1039. Your --modified-bases parameter must be the last parameter in the command.

Dorado can run multiple modified-base models simultaneously, but each one must target a different canonical base. m6A, m6A_DRACH and inosine_m6A all target A, so you need to pick one. If you are not interested in inosine and want coverage outside the DRACH motif, the all-context m6A model is the one you want.
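The rule above (at most one modified-base model per canonical base) can be checked before launching a run. The following is a minimal bash sketch and is not part of dorado: the helper names check_model_complex and canonical_base are hypothetical, and the mod-to-base table only covers the RNA models named in this thread (pseU on U, m5C on C, and m6A, m6A_DRACH and inosine_m6A on A).

```shell
#!/usr/bin/env bash
# Sketch: check that a model-complex string such as "sup,pseU,m6A,m5C"
# requests at most one modified-base model per canonical base.
# ASSUMPTION: the mod -> canonical-base table below is hand-written for the
# RNA models mentioned in this thread; it is not read from dorado.

# Hypothetical helper: print the canonical base a mod model targets.
canonical_base() {
  case "$1" in
    m6A|m6A_DRACH|inosine_m6A) echo "A" ;;
    m5C) echo "C" ;;
    pseU) echo "U" ;;
    *) echo "unknown" ;;
  esac
}

# Hypothetical helper. Returns 0 if every mod targets a distinct base,
# 1 on a conflict, 2 if a mod is missing from the table above.
check_model_complex() {
  local -a parts
  IFS=',' read -r -a parts <<< "$1"
  local seen=""   # space-separated list of bases already claimed
  local mod base
  # parts[0] is the simplex model (e.g. "sup"); the remaining parts are mods.
  for mod in "${parts[@]:1}"; do
    base=$(canonical_base "$mod")
    if [[ "$base" == "unknown" ]]; then
      echo "unrecognised mod: $mod" >&2
      return 2
    fi
    if [[ " $seen " == *" $base "* ]]; then
      echo "conflict: $mod also targets $base" >&2
      return 1
    fi
    seen="$seen $base"
  done
  return 0
}
```

With the original request, sup,pseU,m6A_DRACH,m6A,m5C,inosine_m6A, the check fails at m6A because m6A_DRACH has already claimed A, while sup,pseU,m6A,m5C passes.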

Also, your -k parameter needs to be passed to --mm2-opts.

dorado basecaller \
  -x cuda:0 \
  --estimate-poly-a \
  --verbose \
  --emit-moves \
  --min-qscore 7 \
  --reference /home/reference/genomic.fna_revise \
  --mm2-opts "-k 15" \
  /bio/data_BC202406118-BN240605QD02S41N1/pod5/LM_0_1+2+3/pod5_pass/ \
  --modified-bases sup,pseU,m6A,m5C \
> /bio/sup_drs/2.1_bam/0_A.filtered.bam

Taylorain commented 1 month ago

Hi, thanks for your reply. I tried the command above, but the same error persists:

[*** LOG ERROR #0001 ***] [2024-10-07 16:29:11] [] std::bad_alloc

However, the following command did successfully generate the BAM file:

dorado basecaller \
  -x cuda:0 \
  --estimate-poly-a \
  --verbose \
  --emit-moves \
  --mm2-opts "-k 15" \
  --min-qscore 7 \
  --reference /home/reference/genomic.fna_revise \
  --modified-bases-models /home/ZHH/software/dorado/model/rna004_130bps_sup@v5.1.0_m5C@v1,/home/ZHH/software/dorado/model/rna004_130bps_sup@v5.1.0_m6A_DRACH@v1,/home/ZHH/software/dorado/model/rna004_130bps_sup@v5.1.0_pseU@v1 \
  /home/ZHH/software/dorado/model/rna004_130bps_sup@v5.1.0 \
  /bio/data_BC202406118-BN240605QD02S41N1/pod5/LM_0_1+2+3/pod5_pass/ \
> /bio/ZHH/sup_drs/2.1_bam/0_A.filtered.bam
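The mod model paths in this command follow a visible pattern: the simplex model directory name plus _<mod>@v1. Assuming that pattern holds (it is inferred from the three paths above, not from dorado documentation), the --modified-bases-models value could be generated with a small bash helper; mod_model_paths is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: build a --modified-bases-models value from a simplex model
# directory and a list of mods.
# ASSUMPTION: mod model directories sit next to the simplex model and are
# named "<simplex model>_<mod>@v1", as in the paths used in this thread.
# Other models or versions may use a different suffix; check what is on disk.

# Hypothetical helper: mod_model_paths SIMPLEX_DIR MOD [MOD...]
mod_model_paths() {
  local simplex="$1"
  shift
  local -a paths=()
  local mod
  for mod in "$@"; do
    paths+=("${simplex}_${mod}@v1")
  done
  # Join with commas, the separator used for --modified-bases-models above.
  local IFS=','
  echo "${paths[*]}"
}

model=/home/ZHH/software/dorado/model/rna004_130bps_sup@v5.1.0
# Prints the same comma-separated list passed to --modified-bases-models above.
mod_model_paths "$model" m5C m6A_DRACH pseU
```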

Is this new command correct?

malton-ont commented 1 month ago

@Taylorain,

Apologies! That should have been:

dorado basecaller \
  -x cuda:0 \
  --estimate-poly-a \
  --verbose \
  --emit-moves \
  --min-qscore 7 \
  --reference /home/reference/genomic.fna_revise \
  --mm2-opts "-k 15" \
  sup,pseU,m6A_DRACH,m5C \
  /bio/data_BC202406118-BN240605QD02S41N1/pod5/LM_0_1+2+3/pod5_pass/ \
> /bio/sup_drs/2.1_bam/0_A.filtered.bam

That is, the model complex should be given as a positional argument before the data path, not as part of the --modified-bases parameter.
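One way to keep this ordering straight in a script is to build the arguments as a bash array, with options first and the two positional arguments (model complex, then data path) last. This is a hedged sketch: build_basecaller_args and DRY_RUN are hypothetical names, the options are copied from the commands in this thread, and by default the script only prints the command instead of running dorado.

```shell
#!/usr/bin/env bash
# Sketch: assemble basecaller arguments with options first, then the model
# complex, then the data path. Prints the command unless DRY_RUN=0.
# ASSUMPTION: build_basecaller_args and DRY_RUN are illustrative names;
# only dorado options that appear in this thread are used.

build_basecaller_args() {
  local complex="$1" data="$2"
  args=(basecaller
    -x cuda:0
    --estimate-poly-a
    --verbose
    --emit-moves
    --min-qscore 7
    --reference /home/reference/genomic.fna_revise
    --mm2-opts "-k 15"   # one element, so minimap2 options stay together
    "$complex"           # positional: model complex, before the data path
    "$data")             # positional: pod5 directory, last
}

build_basecaller_args "sup,pseU,m6A_DRACH,m5C" \
  /bio/data_BC202406118-BN240605QD02S41N1/pod5/LM_0_1+2+3/pod5_pass/

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  # Dry run: show the exact argument list with shell quoting.
  printf '%q ' dorado "${args[@]}"; echo
else
  dorado "${args[@]}" > /bio/sup_drs/2.1_bam/0_A.filtered.bam
fi
```

Keeping "-k 15" as a single array element mirrors the quoting in --mm2-opts "-k 15" and avoids it being split into two arguments.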

Yes, your version should work as well.

Taylorain commented 1 month ago

Hi, I tried it again, but the same problem occurred.

[*** LOG ERROR #0001 ***] [2024-10-07 16:55:11] [] std::bad_alloc

So I decided to use the second script that was able to run successfully. Thank you for your help!

malton-ont commented 1 month ago

Hi @Taylorain,

Sorry, I made a typo. I've updated my comment to the correct version for future users. I've also identified the root cause of this issue and we'll push a fix for the next release.

Taylorain commented 1 month ago

I understand now, thank you!