nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

Progress scrolls instead of updates in place, runs slowly #1016

Open RByers opened 4 weeks ago

RByers commented 4 weeks ago

Issue Report

Please describe the issue:

About half the time I use Dorado, the progress updates scroll madly down the terminal with lots of blank lines interspersed. The other half, the progress bar updates in place as expected. Dorado also seems to run unusually slowly when it's in this mode, and the behaviour may be correlated with crashes I'm seeing (exit with ERRORLEVEL 0xC0000409).
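For reference, here is a minimal sketch (not dorado's actual implementation) of how a console progress bar normally redraws in place: the program returns the cursor to the start of the line and overwrites it. On Windows consoles, in-place redraws and ANSI cursor sequences only render correctly when virtual terminal processing is enabled, which may be why some shells scroll instead of updating. The enable_vt_mode helper below is purely illustrative and is not claimed to be what dorado does.

// Sketch only: in-place progress redraw plus the Win32 call that enables
// virtual terminal processing. Not dorado's code.
#include <chrono>
#include <cstdio>
#include <thread>

#ifdef _WIN32
#include <windows.h>
// Assumption: ENABLE_VIRTUAL_TERMINAL_PROCESSING is the relevant console
// setting; dorado's progress library may handle the console differently.
static void enable_vt_mode() {
    HANDLE out = GetStdHandle(STD_OUTPUT_HANDLE);
    DWORD mode = 0;
    if (out != INVALID_HANDLE_VALUE && GetConsoleMode(out, &mode)) {
        SetConsoleMode(out, mode | ENABLE_VIRTUAL_TERMINAL_PROCESSING);
    }
}
#endif

int main() {
#ifdef _WIN32
    enable_vt_mode();
#endif
    for (int pct = 0; pct <= 100; pct += 10) {
        // '\r' returns the cursor to column 0 so the next write overwrites
        // the previous progress line; if the console does not honour this,
        // each update appears on a new line and the output scrolls.
        std::printf("\rBasecalling: %3d%%", pct);
        std::fflush(stdout);
        std::this_thread::sleep_for(std::chrono::milliseconds(200));
    }
    std::printf("\n");
    return 0;
}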

Here's a screenshot example, captured by freezing the scrolling (by selecting some text). This example happens to show a "resuming from file" status, but the behaviour is the same with the regular basecalling status.

[screenshot: progress output scrolling down the terminal with blank lines interspersed]

Steps to reproduce the issue:

Unknown. It's unclear why it sometimes works and sometimes doesn't.

Run environment:

Logs

"c:\data\Q8\SQ8\20240830_1109_MN41817_FAZ83080_1e1c31e4\pod5\" "--kit-name" "SQK-NBD111-24" "--trim" "adapters" "--resume-from" "Q8a-hac-aborted.bam"
[2024-09-04 22:29:39.411] [info]  - downloading dna_r10.4.1_e8.2_400bps_hac@v5.0.0 with httplib
[2024-09-04 22:29:40.224] [info] Normalised: chunksize 10000 -> 9996
[2024-09-04 22:29:40.225] [info] Normalised: overlap 500 -> 498
[2024-09-04 22:29:40.226] [info] > Creating basecall pipeline
[2024-09-04 22:29:40.226] [debug] CRFModelConfig { qscale:1.050000 qbias:-0.600000 stride:6 bias:0 clamp:1 out_features:-1 state_len:4 outsize:1024 blank_score:2.000000 scale:1.000000 num_features:1 sample_rate:5000 mean_qscore_start_pos:60 SignalNormalisationParams { strategy:pa StandardisationScalingParams { standardise:1 mean:93.692398 stdev:23.506744}} BasecallerParams { chunk_size:9996 overlap:498 batch_size:0} convs: { 0: ConvParams { insize:1 size:16 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:16 size:16 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:16 size:384 winlen:19 stride:6 activation:tanh}} model_type: lstm { bias:0 outsize:1024 blank_score:2.000000 scale:1.000000}}
[2024-09-04 22:29:40.637] [debug] cuda:0 memory available: 5.33GB
[2024-09-04 22:29:40.637] [debug] cuda:0 memory limit 4.33GB
[2024-09-04 22:29:40.638] [debug] cuda:0 maximum safe estimated batch size at chunk size 9996 is 640
[2024-09-04 22:29:40.638] [debug] cuda:0 maximum safe estimated batch size at chunk size 4998 is 1280
[2024-09-04 22:29:40.638] [debug] Auto batchsize cuda:0: testing up to 1280 in steps of 64
[2024-09-04 22:29:40.722] [debug] Auto batchsize cuda:0: 64, time per chunk 0.484224 ms
[2024-09-04 22:29:40.791] [debug] Auto batchsize cuda:0: 128, time per chunk 0.232319 ms
[2024-09-04 22:29:40.891] [debug] Auto batchsize cuda:0: 192, time per chunk 0.250123 ms
[2024-09-04 22:29:41.011] [debug] Auto batchsize cuda:0: 256, time per chunk 0.221412 ms
[2024-09-04 22:29:41.171] [debug] Auto batchsize cuda:0: 320, time per chunk 0.235394 ms
[2024-09-04 22:29:41.339] [debug] Auto batchsize cuda:0: 384, time per chunk 0.204975 ms
[2024-09-04 22:29:41.541] [debug] Auto batchsize cuda:0: 448, time per chunk 0.214215 ms
[2024-09-04 22:29:41.761] [debug] Auto batchsize cuda:0: 512, time per chunk 0.203446 ms
[2024-09-04 22:29:42.024] [debug] Auto batchsize cuda:0: 576, time per chunk 0.217572 ms
[2024-09-04 22:29:42.300] [debug] Auto batchsize cuda:0: 640, time per chunk 0.205259 ms
[2024-09-04 22:29:42.599] [debug] Auto batchsize cuda:0: 704, time per chunk 0.196926 ms
[2024-09-04 22:29:42.910] [debug] Auto batchsize cuda:0: 768, time per chunk 0.188587 ms
[2024-09-04 22:29:43.260] [debug] Auto batchsize cuda:0: 832, time per chunk 0.198240 ms
[2024-09-04 22:29:43.626] [debug] Auto batchsize cuda:0: 896, time per chunk 0.190146 ms
[2024-09-04 22:29:44.025] [debug] Auto batchsize cuda:0: 960, time per chunk 0.193588 ms
[2024-09-04 22:29:44.440] [debug] Auto batchsize cuda:0: 1024, time per chunk 0.187385 ms
[2024-09-04 22:29:44.896] [debug] Auto batchsize cuda:0: 1088, time per chunk 0.193474 ms
[2024-09-04 22:29:45.364] [debug] Auto batchsize cuda:0: 1152, time per chunk 0.186017 ms
[2024-09-04 22:29:45.791] [debug] Auto batchsize cuda:0: 1216, time per chunk 0.161354 ms
[2024-09-04 22:29:46.223] [debug] Auto batchsize cuda:0: 1280, time per chunk 0.155547 ms
[2024-09-04 22:29:46.232] [debug] Largest batch size for cuda:0: 1280, time per chunk 0.155547 ms
[2024-09-04 22:29:46.233] [debug] Final batch size for cuda:0[0]: 512
[2024-09-04 22:29:46.233] [debug] Final batch size for cuda:0[1]: 1280
[2024-09-04 22:29:46.233] [info] cuda:0 using chunk size 9996, batch size 512
[2024-09-04 22:29:46.233] [debug] cuda:0 Model memory 2.40GB
[2024-09-04 22:29:46.233] [debug] cuda:0 Decode memory 0.99GB
[2024-09-04 22:29:46.856] [info] cuda:0 using chunk size 4998, batch size 1280
[2024-09-04 22:29:46.856] [debug] cuda:0 Model memory 3.00GB
[2024-09-04 22:29:46.856] [debug] cuda:0 Decode memory 1.24GB
[2024-09-04 22:29:48.802] [debug] BasecallerNode chunk size 9996
[2024-09-04 22:29:48.802] [debug] BasecallerNode chunk size 4998
[2024-09-04 22:29:48.813] [info] > Inspecting resume file...
[2024-09-04 22:29:48.814] [info] Resuming from file Q8a-hac-aborted.bam...
[2024-09-04 22:29:55.139] [info] > 422094 original read ids found in resume file.
[2024-09-04 22:29:55.383] [debug] Load reads from file c:\data\Q8\SQ8\20240830_1109_MN41817_FAZ83080_1e1c31e4\pod5\FAZ83080_1e1c31e4_e3695847_0.pod5
[2024-09-04 22:29:56.802] [debug] > Kits to evaluate: 1
Basecalling
malton-ont commented 4 weeks ago

Recurrence of https://github.com/nanoporetech/dorado/issues/400

RByers commented 4 weeks ago

Ah, thank you. I searched the existing issues but I guess I didn't find the right terms. Note that I'm just running from the cmd.exe shell. Should I be using something else?

malton-ont commented 3 weeks ago

We haven't been able to conclusively identify the root cause of this issue, though it appears to be related to system settings on Windows 11. You could try using Git Bash, which seems to be the most reliable. PowerShell is also an option, though you will need to be aware of https://github.com/nanoporetech/dorado/issues/604 and possibly use the Out-File -Encoding ASCII workaround.
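For example, a sketch of that PowerShell workaround: the model and kit name are taken from the log above, the pod5 path and output filename are placeholders, and --emit-sam is assumed here so the piped output is text SAM rather than binary BAM (which a text pipeline would corrupt).

dorado basecaller dna_r10.4.1_e8.2_400bps_hac@v5.0.0 C:\data\pod5 --kit-name SQK-NBD111-24 --emit-sam | Out-File -Encoding ASCII calls.sam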

fidibidi commented 2 days ago

FYI, running via the Anaconda Prompt (miniconda3) does not produce this error!