nanoporetech / dorado

Oxford Nanopore's Basecaller
https://nanoporetech.com/

libdorado_torch_lib.so error when running supv5.0.0 models #917

Open ywang285 opened 5 days ago

ywang285 commented 5 days ago

Issue Report

Please describe the issue:

Unable to run dorado basecaller with sup@v5.0.0 models. sup@v4.3.0 works fine.

Steps to reproduce the issue:

Please list any steps to reproduce the issue.

Run environment:

Logs

[2024-06-30 02:27:21.701] [info] Running: "basecaller" "--verbose" "/oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup@v5.0.0/" "/oak/stanford/groups/pmischel/Yanbo/Data/ONT/20230731_DML_LaminB1_GBM39EC/GBM39EC_Tube_1/20230801_0155_2E_PAO23357_f1e09b45/pod5_test/barcode02" "--modified-bases-models" "/oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup@v5.0.0_6mA@v1/,/oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/models/dna_r10.4.1_e8.2_400bps_sup@v5.0.0_5mCG_5hmCG@v1/" "--modified-bases-threshold" "0.9"
[2024-06-30 02:27:21.763] [info] > Creating basecall pipeline
[2024-06-30 02:27:21.763] [debug] CRFModelConfig { qscale:1.050000 qbias:1.300000 stride:6 bias:1 clamp:0 out_features:4096 state_len:5 outsize:4096 blank_score:0.000000 scale:1.000000 num_features:1 sample_rate:5000 mean_qscore_start_pos:60 SignalNormalisationParams { strategy:pa StandardisationScalingParams { standardise:1 mean:93.692398 stdev:23.506744}} BasecallerParams { chunk_size:12288 overlap:600 batch_size:0} convs: { 0: ConvParams { insize:1 size:64 winlen:5 stride:1 activation:swish} 1: ConvParams { insize:64 size:64 winlen:5 stride:1 activation:swish} 2: ConvParams { insize:64 size:128 winlen:9 stride:3 activation:swish} 3: ConvParams { insize:128 size:128 winlen:9 stride:2 activation:swish} 4: ConvParams { insize:128 size:512 winlen:5 stride:2 activation:swish}} model_type: tx { crf_encoder: CRFEncoderParams { insize:512 n_base:4 state_len:5 scale:5.000000 blank_score:2.000000 expand_blanks:1 permute:1} transformer: TxEncoderParams { d_model:512 nhead:8 depth:18 dim_feedforward:2048 deepnorm_alpha:2.449490}}}
[2024-06-30 02:27:22.453] [error] fseek returned -1
Exception raised from FileAdapter at /pytorch/pyold/caffe2/serialize/file_adapter.cc:40 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7fe75e1189b7 in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7fe75769d115 in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #2: caffe2::serialize::FileAdapter::FileAdapter(std::string const&) + 0x300 (0x7fe75a143120 in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #3: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x5a (0x7fe75a1412aa in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #4: torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&, bool, bool) + 0x2c0 (0x7fe75b28e090 in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #5: torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::string const&, c10::optional<c10::Device>, bool) + 0x7f (0x7fe75b28e40f in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #6: torch::jit::load(std::string const&, c10::optional<c10::Device>, bool) + 0xac (0x7fe75b28e4ec in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #7: torch::serialize::InputArchive::load_from(std::string const&, c10::optional<c10::Device>) + 0x26 (0x7fe75b810596 in /oak/stanford/groups/pmischel/Yanbo/software/dorado-0.7.2-linux-x64/bin/../lib/libdorado_torch_lib.so)
frame #8: dorado() [0xa9b14a]
frame #9: dorado() [0xa9907d]
frame #10: dorado() [0xa74c21]
frame #11: dorado() [0xa67d9f]
frame #12: dorado() [0xa685ac]
frame #13: dorado() [0x96c839]
frame #14: dorado() [0x873cf3]
frame #15: dorado() [0x84edaf]
frame #16: dorado() [0x8553ff]
frame #17: dorado() [0x4ce29d]
frame #18: __libc_start_main + 0xf5 (0x7fe751937555 in /lib64/libc.so.6)
frame #19: dorado() [0x7f92a7]

HalfPhoton commented 4 days ago

Hi @ywang285, can you please try deleting and re-downloading your models?

Kind regards, Rich
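
Before deleting anything, it may help to check whether the model directories contain files that are obviously unusable. The trace above fails inside FileAdapter with "fseek returned -1" while a model file is being loaded, which would be consistent with an incomplete or damaged download, though that is an assumption and not something the log confirms. The following is a minimal sketch in plain Python, not part of dorado: the function name find_suspect_files is made up for illustration, and the script only flags files that are empty, unreadable, or broken symlinks. It cannot tell whether a non-empty file is a valid model.

```python
import os
from pathlib import Path


def find_suspect_files(model_dir):
    """Return (path, reason) pairs for files that look unusable.

    Hypothetical helper, not a dorado API. It only performs generic
    filesystem checks and knows nothing about the model file format.
    """
    root = Path(model_dir)
    if not root.is_dir():
        return [(str(root), "not a directory")]

    suspects = []
    for path in sorted(root.rglob("*")):
        # A symlink whose target is missing cannot be opened at all.
        if path.is_symlink() and not path.exists():
            suspects.append((str(path), "broken symlink"))
            continue
        if not path.is_file():
            continue  # skip subdirectories
        try:
            # Open and seek to the end, roughly what a loader does to
            # learn the file size before reading it.
            with open(path, "rb") as fh:
                fh.seek(0, os.SEEK_END)
                size = fh.tell()
        except OSError as exc:
            suspects.append((str(path), f"unreadable: {exc}"))
            continue
        if size == 0:
            suspects.append((str(path), "empty file"))
    return suspects
```

Calling find_suspect_files on each model directory passed to the basecaller (the simplex model and both modified-base models in the command above) returns an empty list when nothing looks wrong at this level. If it does report files, or if the error persists with an empty list, deleting and re-downloading the models as suggested is still the next step.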