nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

cpu basecaller --modified-bases m6A #973

Closed JhinJhinJhin closed 4 days ago

JhinJhinJhin commented 1 month ago

Issue Report

[error] Too few arguments

Please describe the issue:

[2024-08-05 21:02:10.576] [info] Running: "basecaller" "-x" "cpu" "--modified-bases" "m6A" "--mm2-preset" "splice" "-k14" "--no-trim" "/lustre/home/jianghao/workspace/01.RNA-seq/DM_bulk_ONT/output/dorado/rna004_130bps_sup@v5.0.0/" "/lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/"

I really can't find what went wrong. The model cannot be downloaded after submitting the task via PBS, so I can only specify the path to an existing model directory. Can someone show me example code or tell me how to solve it? I really appreciate it.


Steps to reproduce the issue:

dorado basecaller \
    -x cpu \
    --modified-bases m6A \
    --mm2-preset splice -k14 \
    --no-trim \
    /lustre/home/jianghao/workspace/01.RNA-seq/DM_bulk_ONT/output/dorado/rna004_130bps_sup@v5.0.0/ \
    /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/ > LDL-1.bam


Run environment:

Logs

[2024-08-05 21:02:10.576] [info] Running: "basecaller" "-x" "cpu" "--modified-bases" "m6A" "--mm2-preset" "splice" "-k14" "--no-trim" "/lustre/home/jianghao/workspace/01.RNA-seq/DM_bulk_ONT/output/dorado/rna004_130bps_sup@v5.0.0/" "/lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/"
[2024-08-05 21:02:10.585] [error] Too few arguments
Usage: dorado [-h] [--device VAR] [--read-ids VAR] [--resume-from VAR] [--max-reads VAR] [--min-qscore VAR] [--batchsize VAR] [--chunksize VAR] [--overlap VAR] [--recursive] [--modified-bases VAR...] [--modified-bases-models VAR] [--modified-bases-threshold VAR] [--emit-fastq] [--emit-sam] [--emit-moves] [--reference VAR] [--kit-name VAR] [--barcode-both-ends] [--no-trim] [--trim VAR] [--sample-sheet VAR] [--barcode-arrangement VAR] [--barcode-sequences VAR] [--primer-sequences VAR] [--estimate-poly-a] [--poly-a-config VAR] [-k VAR] [-w VAR] [-I VAR] [--secondary VAR] [-N VAR] [-Y] [--bandwidth VAR] [--junc-bed VAR] [--mm2-preset VAR] model data

Positional arguments:
  model    model selection {fast,hac,sup}@v{version} for automatic model selection including modbases, or path to existing model directory
  data     the data directory or file (POD5/FAST5 format).

Optional arguments:
  -h, --help    shows help message and exits

HalfPhoton commented 1 month ago

You might be missing a space: -k 14 instead of -k14.

JhinJhinJhin commented 1 month ago

dorado basecaller -x cpu --modified-bases m6A --mm2-preset splice -k14 --no-trim /lustre/home/jianghao/workspace/01.RNA-seq/DM_bulk_ONT/output/dorado/rna004_130bps_sup@v5.0.0/ /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/ > LDL-1.bam

Thank you! But there's still a problem:

[2024-08-06 09:21:10.563] [info] Running: "basecaller" "-x" "cpu" "--modified-bases" "m6A" "--mm2-preset" "splice" "-k" "14" "--no-trim" "/lustre/home/jianghao/workspace/01.RNA-seq/DM_bulk_ONT/output/dorado/rna004_130bps_sup@v5.0.0/" "/lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/"

terminate called after throwing an instance of 'std::runtime_error'
  what():  Cannot find modification model for 'm6A' reason: unknown simplex model 
m6a.sh: line 18: 115099 Aborted  (core dumped) 

dorado basecaller -x cpu --modified-bases m6A --mm2-preset splice -k 14 --no-trim /lustre/home/jianghao/workspace/01.RNA-seq/DM_bulk_ONT/output/dorado/rna004_130bps_sup@v5.0.0/ /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/ > LDL-1.bam

I want to do basecalling for direct RNA with m6A base modifications and poly(A) tail length.

HalfPhoton commented 1 month ago

Can you add verbose logging with -vv please?

JhinJhinJhin commented 1 month ago

Can you add verbose logging with -vv please?

Yes, I added -vv, but a new error occurred: Error code 12 (Cannot allocate memory).

Would setting a higher RAM limit help? How can I fix this and spend less time running it?

[2024-08-07 09:02:23.427] [info] Running: "basecaller" "-x" "cpu" "--mm2-preset" "splice" "-k" "14" "--no-trim" "-vv" "/lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/rna004_130bps_sup@v3.0.1/" "/lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/"
[2024-08-07 09:02:23.471] [trace] Model option: '/lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/rna004_130bps_sup@v3.0.1/' unknown - assuming path
[2024-08-07 09:02:23.490] [info] > Creating basecall pipeline
[2024-08-07 09:02:23.491] [debug] CRFModelConfig { qscale:0.900000 qbias:-0.100000 stride:5 bias:1 clamp:0 out_features:-1 state_len:5 outsize:4096 blank_score:2.000000 scale:5.000000 num_features:1 sample_rate:4000 mean_qscore_start_pos:60 SignalNormalisationParams { strategy:quantile QuantileScalingParams { quantile_a:0.200000 quantile_b:0.800000 shift_multiplier:0.480000 scale_multiplier:0.590000}} BasecallerParams { chunk_size:10000 overlap:500 batch_size:128} convs: { 0: ConvParams { insize:1 size:4 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:4 size:16 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:16 size:768 winlen:19 stride:5 activation:swish}} model_type: lstm { bias:1 outsize:4096 blank_score:2.000000 scale:5.000000}}
[2024-08-07 09:02:23.799] [info]  - BAM format does not support `U`, so RNA output files will include `T` instead of `U` for all file types.
[2024-08-07 09:02:23.802] [debug] - CPU calling: set num_cpu_runners to 18
[2024-08-07 09:02:29.933] [debug] BasecallerNode chunk size 10000
[2024-08-07 09:02:29.946] [debug] Load reads from file /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/pod5-pass_1.pod5
[2024-08-07 09:02:30.036] [trace] Processing read dda253a6-2051-45cd-a90b-d237b5fa6c20
[2024-08-07 09:02:30.036] [trace] Running PORE_ADAPTER
[2024-08-07 09:02:30.036] [trace] RSN: PORE_ADAPTER strategy 0 splits in read dda253a6-2051-45cd-a90b-d237b5fa6c20
[2024-08-07 09:02:30.036] [trace] Read dda253a6-2051-45cd-a90b-d237b5fa6c20 split into 1 subreads: dda253a6-2051-45cd-a90b-d237b5fa6c20 (0); 
[2024-08-07 09:02:30.036] [trace] READ duration: 114 microseconds (ID: dda253a6-2051-45cd-a90b-d237b5fa6c20)
[2024-08-07 09:02:30.036] [trace] Processing read dfcdaeac-901b-4eb7-b3e1-d65b7148154c
[2024-08-07 09:02:30.036] [trace] Running PORE_ADAPTER
[2024-08-07 09:02:30.036] [trace] RSN: PORE_ADAPTER strategy 0 splits in read dfcdaeac-901b-4eb7-b3e1-d65b7148154c
[2024-08-07 09:02:30.036] [trace] Read dfcdaeac-901b-4eb7-b3e1-d65b7148154c split into 1 subreads: dfcdaeac-901b-4eb7-b3e1-d65b7148154c (0); 
[2024-08-07 09:02:30.036] [trace] READ duration: 36 microseconds (ID: dfcdaeac-901b-4eb7-b3e1-d65b7148154c)
[2024-08-07 09:02:30.036] [trace] Processing read 964466b3-48e8-4097-a370-5da629f841f4
[2024-08-07 09:02:30.036] [trace] Running PORE_ADAPTER
[2024-08-07 09:02:30.037] [trace] RSN: PORE_ADAPTER strategy 0 splits in read 964466b3-48e8-4097-a370-5da629f841f4
[2024-08-07 09:02:30.037] [trace] Read 964466b3-48e8-4097-a370-5da629f841f4 split into 1 subreads: 964466b3-48e8-4097-a370-5da629f841f4 (0); 
[2024-08-07 09:02:30.037] [trace] READ duration: 798 microseconds (ID: 964466b3-48e8-4097-a370-5da629f841f4)
[2024-08-07 09:02:30.037] [trace] Processing read 7361086e-2c59-4042-80e8-c7a31c3c630e
[2024-08-07 09:02:30.037] [trace] Running PORE_ADAPTER
[2024-08-07 09:02:30.037] [trace] RSN: PORE_ADAPTER strategy 0 splits in read 7361086e-2c59-4042-80e8-c7a31c3c630e
[2024-08-07 09:02:30.037] [trace] Read 7361086e-2c59-4042-80e8-c7a31c3c630e split into 1 subreads: 7361086e-2c59-4042-80e8-c7a31c3c630e (0); 
[2024-08-07 09:02:30.037] [trace] READ duration: 38 microseconds (ID: 7361086e-2c59-4042-80e8-c7a31c3c630e)
[2024-08-07 09:02:30.041] [trace] Processing read 891756fc-e402-4447-9374-ebe4399e499b
[2024-08-07 09:02:30.041] [trace] Running PORE_ADAPTER
[2024-08-07 09:02:30.041] [trace] RSN: PORE_ADAPTER strategy 0 splits in read 891756fc-e402-4447-9374-ebe4399e499b
[2024-08-07 09:02:30.041] [trace] Read 891756fc-e402-4447-9374-ebe4399e499b split into 1 subreads: 891756fc-e402-4447-9374-ebe4399e499b (0); 
[2024-08-07 09:02:30.041] [trace] READ duration: 210 microseconds (ID: 891756fc-e402-4447-9374-ebe4399e499b)
[2024-08-07 09:02:30.041] [trace] Processing read b086a3a0-66b6-4f65-adb8-9fdbb2de98c7
[2024-08-07 09:02:30.041] [trace] window 1000-1250 min 0 max 690 diff 690
[2024-08-07 09:02:30.041] [trace] window 1000-1250 min 0 max 652 diff 652
[2024-08-07 09:02:30.041] [trace] window 1000-1250 min 0 max 688 diff 688
[2024-08-07 09:02:30.041] [trace] window 1000-1250 min 0 max 639 diff 639
[2024-08-07 09:02:30.041] [trace] Running PORE_ADAPTER
[2024-08-07 09:02:30.042] [trace] RSN: PORE_ADAPTER strategy 0 splits in read b086a3a0-66b6-4f65-adb8-9fdbb2de98c7
[2024-08-07 09:02:30.041] [trace] window 1050-1300 min 0 max 652 diff 652
...
Read 83ab7f53-cfe9-45be-90f6-c52792bb68ca split into 1 subreads: 83ab7f53-cfe9-45be-90f6-c52792bb68ca (0); 
[2024-08-07 23:07:06.748] [trace] READ duration: 68345 microseconds (ID: 83ab7f53-cfe9-45be-90f6-c52792bb68ca)
terminate called after throwing an instance of 'c10::Error'
  what():  [enforce fail at alloc_cpu.cpp:75] err == 0. DefaultCPUAllocator: can't allocate memory: you tried to allocate 15144402944 bytes. Error code 12 (Cannot allocate memory)
frame #0: c10::ThrowEnforceNotMet(char const*, int, char const*, std::string const&, void const*) + 0x55 (0x2abdf9ffc855 in /lustre/home/jianghao/miniforge3/envs/ONT/bin/../lib/libdorado_torch_lib.so)
frame #1: c10::alloc_cpu(unsigned long) + 0x311 (0x2abdf9feddc1 in /lustre/home/jianghao/miniforge3/envs/ONT/bin/../lib/libdorado_torch_lib.so)
frame #2: <unknown function> + 0xaa0fe23 (0x2abdf9fd3e23 in /lustre/home/jianghao/miniforge3/envs/ONT/bin/../lib/libdorado_torch_lib.so)
frame #3: <unknown function> + 0x4530bc1 (0x2abdf3af4bc1 in /lustre/home/jianghao/miniforge3/envs/ONT/bin/../lib/libdorado_torch_lib.so)
frame #4: at::detail::empty_generic(c10::ArrayRef<long>, c10::Allocator*, c10::DispatchKeySet, c10::ScalarType, c10::optional<c10::MemoryFormat>) + 0x14 (0x2abdf3aee604 in /lustre/home/jianghao/miniforge3/envs/ONT/bin/../lib/libdorado_torch_lib.so)
frame #5: at::detail::empty_cpu(c10::ArrayRef<long>, c10::ScalarType, bool, c10::optional<c10::MemoryFormat>) + 0x40 (0x2abdf3aee650 in /lustre/home/jianghao/miniforge3/envs/ONT/bin/../lib/libdorado_torch_lib.so)
...
m6a.sh: line 19:  8839 Aborted                 (core dumped) dorado basecaller -x cpu --mm2-preset splice -k 14 --no-trim -vv /lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/rna004_130bps_sup@v3.0.1/ /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/ > LDL-1.bam

1 source ~/.bashrc
  2 mamba activate ONT
  3 
  4 
  5 
  6 
  7 
  8 
  9 
 10 
 11 
 12 
 13 dorado basecaller \
 14     -x cpu \
 15         --mm2-preset splice -k 14 \
 16         --no-trim \
 17         -vv \
 18         /lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/rna004_130bps_sup@v3.0.1/ \
 19         /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/ > LDL-1.bam
HalfPhoton commented 1 month ago

Yes - It looks like you're running out of memory.


Also, this isn't the same command you were having issues with, as it doesn't contain the --modified-bases argument at all, or even the same simplex model.

I suspect the issue is that your cluster environment doesn't have access to the web and dorado cannot download the modified bases model for you.

Please download all the models you need first with dorado download --model <model_name> and then use dorado basecaller ... --modified-bases-models <model_path>

Note the additional -models in the argument
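As a minimal sketch of that workflow (the model names are the ones that appear later in this thread, and the input/output paths are placeholders — substitute your own):

# On a machine with web access, download the simplex model and the matching m6A model:
dorado download --model rna004_130bps_sup@v3.0.1
dorado download --model rna004_130bps_sup@v3.0.1_m6A_DRACH@v1

# Then basecall, pointing at both downloaded model directories explicitly.
# --estimate-poly-a (listed in the usage text above) should also cover the poly(A) tail length request.
dorado basecaller \
    -x cpu \
    --modified-bases-models rna004_130bps_sup@v3.0.1_m6A_DRACH@v1 \
    --estimate-poly-a \
    --mm2-preset splice -k 14 \
    --no-trim \
    rna004_130bps_sup@v3.0.1 \
    /path/to/pod5_dir/ > output.bam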

JhinJhinJhin commented 1 month ago

Yes - It looks like you're running out of memory.

Also this isn't the same command you were having issues with as it doesn't contain the --modified-bases argument at all or even the same simplex model.

I suspect the issue is that your cluster environment doesn't have access to the web and dorado cannot download the modified bases model for you.

Please download all the models you need first with dorado download --model <model_name> and then use dorado basecaller ... --modified-bases-models <model_path>

Note the additional -models in the argument

The following command line works. How can I set up more threads/CPUs to reduce the runtime? I ran it for 20 minutes and got a 4.0M BAM file. And if I just want to get the m6A information for the reads, what command should I use?

dorado basecaller --modified-bases-models /lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/output/dorado/rna004_130bps_sup@v3.0.1_m6A_DRACH@v1 /lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/output/dorado/rna004_130bps_sup@v3.0.1 /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/ -x cpu --mm2-preset splice -k 14 --no-trim -vv > LDL-1.bam

[2024-08-09 11:16:14.386] [info] Running: "basecaller" "--modified-bases-models" "/lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/output/dorado/rna004_130bps_sup@v3.0.1_m6A_DRACH@v1" ...
[2024-08-09 11:16:14.403] [trace] Model option: '/lustre/home/jianghao/workspace/01.RNA-seq/02_DM_6samples/output/dorado/rna004_130bps_sup@v3.0.1' unknown - assuming path
[2024-08-09 11:16:14.406] [info] > Creating basecall pipeline
[2024-08-09 11:16:14.406] [debug] CRFModelConfig { qscale:0.900000 qbias:-0.100000 stride:5 bias:1 clamp:0 out_features:-1 state_len:5 outsize:4096 blank_score:2.000000 scale:5.000000 num_featur...
[2024-08-09 11:16:14.805] [info] - BAM format does not support U, so RNA output files will include T instead of U for all file types.
[2024-08-09 11:16:15.907] [debug] - CPU calling: set num_cpu_runners to 18
[2024-08-09 11:16:21.814] [debug] BasecallerNode chunk size 10000
[2024-08-09 11:16:21.869] [debug] Load reads from file /lustre/home/jianghao/database/05_RNA/01_DM_ONT/pod5/LDL-1/pod5_pass_1/pod5-pass_1.pod5
[2024-08-09 11:16:21.959] [trace] Processing read dda253a6-2051-45cd-a90b-d237b5fa6c20
[2024-08-09 11:16:21.959] [trace] Running PORE_ADAPTER

HalfPhoton commented 1 month ago

@JhinJhinJhin, there are no options to control the number of CPUs used by dorado - it will use as many as there are available.

Please use a GPU to get significantly better performance.

The m6A modification information is written to the MM / ML tags in the BAM file for all reads in the dataset. There are a number of tools to process this information, and plenty of help available online.
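For instance, a quick way to confirm the tags are present (assuming samtools is installed; the grep pattern and file names here are only illustrative):

# Print the MM/ML modification tags for the first few reads in the output BAM
samtools view LDL-1.bam | head -n 5 | grep -o 'M[ML]:[^[:space:]]*'

# modkit (ONT's separate tool) can then summarise per-site calls, but it needs a
# reference-aligned, sorted and indexed BAM, e.g.:
# modkit pileup aligned.sorted.bam m6A_pileup.bed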