k2-fsa / k2

FSA/FST algorithms, differentiable, with PyTorch compatibility.
https://k2-fsa.github.io/k2
Apache License 2.0

Nbest online decoding bug #1276

Open binhtranmcs opened 3 months ago

binhtranmcs commented 3 months ago

I tried online decoding with online_decode.cu, modified slightly to get the n-best paths from the lattice, but it fails with an error. The failing assertion is in fsa_utils.cu, line 2754.

Offline decoding seems fine, so I suspect something is wrong with the online implementation. Also, it does not happen with every audio file I tested. Please help me with this.

Model: librispeech conformer ctc
Code: online_decode.txt
Log debug: error.log

Thanks!

csukuangfj commented 3 months ago

@pkufool Could you have a look?

trangtv57 commented 3 months ago

I have the same issue. Please check this error soon :(

cudothanh-Nhan commented 3 months ago

Same here :(

pkufool commented 3 months ago

OK, I will have a look soon.

trangtv57 commented 3 months ago

Sorry, but is there any update, @pkufool?

pkufool commented 2 months ago

@trangtv57 Sorry for the late response. I reproduced your issue; the error seems to happen when inverting the generated lattice. There is a quick fix at https://github.com/k2-fsa/k2/pull/1280 , which works fine with the conformer-ctc model given above. Please help by running more tests, thanks!

trangtv57 commented 2 months ago

Thanks @pkufool, I will check it and get back to you soon.

binhtranmcs commented 2 months ago

Thanks @pkufool, I just tested again with the model above and it seems fine. But when decoding with my own model, there is still an error: the assertion in fsa_utils.cu, line 2756. Please take a further look.

pkufool commented 2 months ago

> tks @pkufool, I just tested again with the model above and it seems fine. But when decoding with my own model, there is still error, the assertion in fsa_utils.cu line 2756. Please have a further look.

If it is line 2756 in fsa_utils.cu, I think it is the same issue. Could you confirm (for example, by testing more cases) that it works fine with our model? It would be easier for me to debug if I had a model that reproduces it. Thank you!

binhtranmcs commented 2 months ago

@pkufool, I tested again with the audio below (change the extension to .wav) and got a different error:

[F] /home/cpu13266/binhtt4/clone/k2/k2/csrc/array.h:385:T k2::Array1<T>::operator[](int32_t) const [with T = int; int32_t = int] Check failed: ret == cudaSuccess (700 vs. 0)  Error: an illegal memory access was encountered. 

You can try debugging with it, same model as above.

61-70970-0031.txt

danpovey commented 2 months ago

I merged Kangwei's fix, as it seems to be a straightforward fix of an undefined value. If there are further bugs, we can fix them separately.

pkufool commented 2 months ago

@binhtranmcs Here is another fix: https://github.com/k2-fsa/k2/pull/1282 . I think the previous fix, #1280, introduced the bug. The code now runs normally in all my test cases. Please run more tests, thanks!

binhtranmcs commented 2 months ago

@pkufool, I tested with the librispeech dataset and it ran smoothly. But there is still an error when testing with the audio below (change the extension to .wav):

[F] /home/cpu13266/binhtt4/clone/k2/k2/csrc/top_sort.cu:324:k2::FsaVec k2::TopSorter::TopSort(k2::Array1<int>*) Check failed: start_state_present[0] == 1 (0 vs. 1) Our current implementation requires that the start state in each Fsa must be present in the first batch

nbest-err.txt

pkufool commented 2 months ago

@binhtranmcs Can you paste your stack backtrace here? I can't reproduce your error.

binhtranmcs commented 2 months ago

@pkufool fyi

__GI_raise 0x00007fff9700400b
k2::internal::Logger::~Logger log.h:203
k2::TopSorter::TopSort top_sort.cu:324
k2::TopSort top_sort.cu:371
k2::TopSort fsa_algo.cu:141
k2::Nbest::Intersect nbest.cu:77
main online_decode.cu:336

The log:

[F] /home/cpu13266/binhtt4/clone/k2/k2/csrc/top_sort.cu:324:k2::FsaVec k2::TopSorter::TopSort(k2::Array1<int>*) Check failed: start_state_present[0] == 1 (0 vs. 1) Our current implementation requires that the start state in each Fsa must be present in the first batch

[ Stack-Trace: ]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2_log.so(k2::internal::GetStackTrace()+0x5f) [0x7ffff641f34a]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/bin/online_decode(k2::internal::Logger::~Logger()+0x48) [0x5555555a829e]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2context.so(k2::TopSorter::TopSort(k2::Array1<int>*)+0x3cc) [0x7ffff6a3fad4]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2context.so(k2::TopSort(k2::Ragged<k2::Arc>&, k2::Ragged<k2::Arc>*, k2::Array1<int>*)+0x3a7) [0x7ffff6a3778a]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2_torch.so(k2::TopSort(k2::FsaClass*)+0x58) [0x7ffff7896a62]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/lib/libk2_torch.so(k2::Nbest::Intersect(k2::FsaClass*)+0x41d) [0x7ffff78b6cef]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/bin/online_decode(+0x4fba2) [0x5555555a3ba2]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7fff96fe5083]
/home/cpu13266/binhtt4/clone/k2/cmake-build-debug/bin/online_decode(+0x4d79e) [0x5555555a179e]

terminate called after throwing an instance of 'std::runtime_error'
  what():  
    Some bad things happened. Please read the above error messages and stack
    trace. If you are using Python, the following command may be helpful:

      gdb --args python /path/to/your/code.py

    (You can use `gdb` to debug the code. Please consider compiling
    a debug version of k2.).

    If you are unable to fix it, please open an issue at:

      https://github.com/k2-fsa/k2/issues/new

Signal: SIGABRT (Aborted)

Also, here is the code I used: online_decode.cu.txt
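
For anyone reading the `start_state_present[0] == 1` failure in the log above: the message says the start state must appear in the first batch of the topological sort. The sketch below is only a toy illustration of that condition, not k2's implementation. It assumes a layered Kahn-style sort in which the first batch is the set of states with no incoming arcs, and it assumes state 0 is the start state. `ToyArc`, `TopSortBatches` and `StartStateInFirstBatch` are hypothetical names made up for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical toy arc: src -> dest. Labels and scores are omitted.
struct ToyArc {
  int32_t src;
  int32_t dest;
};

// Splits states 0..num_states-1 into batches, Kahn style:
// batch 0 holds every state with no incoming arc, batch 1 holds the
// states whose in-degree drops to zero once batch 0 is removed, etc.
// States on a cycle (or with a self-loop) never appear in any batch.
std::vector<std::vector<int32_t>> TopSortBatches(
    int32_t num_states, const std::vector<ToyArc> &arcs) {
  std::vector<int32_t> in_degree(num_states, 0);
  std::vector<std::vector<int32_t>> out(num_states);
  for (const ToyArc &a : arcs) {
    assert(a.src >= 0 && a.src < num_states);
    assert(a.dest >= 0 && a.dest < num_states);
    ++in_degree[a.dest];
    out[a.src].push_back(a.dest);
  }

  std::vector<int32_t> cur;
  for (int32_t s = 0; s < num_states; ++s)
    if (in_degree[s] == 0) cur.push_back(s);

  std::vector<std::vector<int32_t>> batches;
  while (!cur.empty()) {
    std::vector<int32_t> next;
    for (int32_t s : cur)
      for (int32_t d : out[s])
        if (--in_degree[d] == 0) next.push_back(d);
    batches.push_back(std::move(cur));
    cur = std::move(next);
  }
  return batches;
}

// True if state 0 (assumed to be the start state) is in the first batch.
bool StartStateInFirstBatch(
    const std::vector<std::vector<int32_t>> &batches) {
  if (batches.empty()) return false;
  for (int32_t s : batches[0])
    if (s == 0) return true;
  return false;
}
```

In this toy, any arc that enters state 0 (a self-loop, a cycle back to the start, or an arc from some other source state) keeps state 0 out of the first batch. That is the kind of lattice shape worth looking for when dumping the FSA passed to `TopSort`; whether it is the actual cause here is not established in this thread.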