wenet-e2e / wenet

Production First and Production Ready End-to-End Speech Recognition Toolkit
https://wenet-e2e.github.io/wenet/
Apache License 2.0

continuous_decoding mode seems to miss alternate segments. #808

Closed ngoel17 closed 6 months ago

ngoel17 commented 2 years ago

Describe the bug I decoded several files in continuous decoding mode and did a formal WER calculation. To my surprise, the pattern I see is that half the segments are generally good, while the other half are "deletion errors", in an alternating manner, pushing the total average error rate above 50%. It appears that wenet in streaming mode is missing every alternate segment produced by the endpointer.
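For concreteness, here is a minimal sketch (with made-up transcripts, not the actual test data) of how dropping every alternate segment by itself pushes WER past 50%:

```python
def wer(ref, hyp):
    """Word error rate via edit distance (substitutions + insertions + deletions)."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(r)][len(h)] / len(r)

segments = ["this is segment one", "and this is segment two"] * 2
ref = " ".join(segments)
hyp = " ".join(segments[::2])  # every alternate segment missing
print(f"WER = {wer(ref, hyp):.0%}")  # WER = 56%
```

With four reference segments and every other one missing from the hypothesis, the deletions alone account for a WER of roughly 56%, matching the pattern described above.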

To Reproduce Steps to reproduce the behavior:

  1. run websocket_server_main with TLG decode.
  2. Run websocket_client_main --continuous_decoding true
  3. Have manual transcripts of the audio file handy; hopefully you can figure out from the endpointer rules where it will kick in and cause a new segment.
  4. Observe that the transcripts of alternate segments are not returned to the websocket_client_main

Expected behavior Entire transcripts should be returned.

Screenshots Let me know if you fail to replicate; I will provide a real example along with an audio file that you can use with the gigaspeech model.


robin1001 commented 2 years ago

Did you try to debug it? @pengzhendong please follow up on this issue.

pengzhendong commented 2 years ago

Have you tried websocket_server_main without TLG?

ngoel17 commented 2 years ago

looking at the while loop in https://github.com/wenet-e2e/wenet/blob/b815a09b0cd023760454bc11c00d0a19327ab54d/runtime/core/websocket/websocket_server.cc#L126

It appears that it tries to decode everything. However, the pattern of contiguous deletions is very obvious, and if I manually segment (long recordings) and pass the audio to decoder_main, the accuracy is much better than passing the whole file to websocket_server_main via websocket_client_main.

ngoel17 commented 2 years ago

@pengzhendong why do you think it's important to compare with/without TLG? Do you think the issue is TLG-related?

pengzhendong commented 2 years ago

Because the decoder with TLG will skip some blank frames. https://github.com/wenet-e2e/wenet/blob/b815a09b0cd023760454bc11c00d0a19327ab54d/runtime/core/decoder/ctc_wfst_beam_search.cc#L78-L79
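The frame-skipping behavior being referenced can be sketched in Python as follows (a simplified rendition of the C++ check; `blank_skip_thresh` and `BLANK_ID` are illustrative names, not the exact option names):

```python
import math

BLANK_ID = 0
blank_skip_thresh = 0.98  # frames whose blank prob exceeds this are skipped

def frames_to_decode(log_probs):
    """Yield only the frames the WFST search actually processes.

    log_probs: list of per-frame log-probability lists over the CTC units.
    Mirrors the idea in ctc_wfst_beam_search.cc: if the blank unit's
    probability is above the threshold, the frame is treated as silence
    and never reaches the decoder.
    """
    for t, frame in enumerate(log_probs):
        if math.exp(frame[BLANK_ID]) > blank_skip_thresh:
            continue  # skipped: looks like a blank/silence frame
        yield t, frame

# Toy frames: blank prob 0.99 (skipped) vs 0.30 (decoded)
frames = [
    [math.log(0.99), math.log(0.01)],
    [math.log(0.30), math.log(0.70)],
]
kept = [t for t, _ in frames_to_decode(frames)]
print(kept)  # [1] — only the second frame survives
```

If the endpointer and the search disagree about which frames are blank, segments bounded by long blank runs are exactly where results could go missing.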

ngoel17 commented 2 years ago

I have this set to 0.98, but setting it to 0.99 does not change much either. If the score is that high, the frame is most likely blank, and in general the TLG accuracy is decent. After proper parameter tuning it goes from 14% WER (CTC) to 11.5% (TLG). But from the websocket server it is 65% WER.

ngoel17 commented 2 years ago

It appears that the decoder is very sensitive to the chunk size setting. I am not sure why, because I assume the training uses variable right contexts precisely to make chunk size an adjustable parameter during decoding. Anyway, setting it to -1 (from the initial setting of 16) seems to have improved things and addressed the deletions issue. Unfortunately, the server behavior is erratic now: it ran OOM. I think it is wenet/transformer/embedding.py that makes the system go OOM. I made the following change to prevent the code from raising an assertion error:

diff --git a/wenet/transformer/embedding.py b/wenet/transformer/embedding.py
index a47afd9..e39ef6d 100644
--- a/wenet/transformer/embedding.py
+++ b/wenet/transformer/embedding.py
@@ -24,7 +24,7 @@ class PositionalEncoding(torch.nn.Module):
     def __init__(self, d_model: int, dropout_rate: float,
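For context, here is a heavily simplified sketch, loosely modeled on the PositionalEncoding class in wenet/transformer/embedding.py, of why feeding a whole long utterance at once (chunk_size -1) can exceed the precomputed positional table and trip the assertion mentioned above (all sizes and names here are illustrative):

```python
import math

class PositionalEncoding:
    """Simplified sinusoidal positional-encoding table.

    The table is precomputed up to max_len positions, so feeding an
    entire long utterance in one pass (chunk_size == -1) can exceed it,
    producing an assertion error instead of a streaming-style offset walk.
    """

    def __init__(self, d_model, max_len=5000):
        self.max_len = max_len
        # Simplified: real implementations alternate sin/cos per dimension.
        self.pe = [[math.sin(pos / 10000 ** (2 * (i // 2) / d_model))
                    for i in range(d_model)] for pos in range(max_len)]

    def encode(self, num_frames, offset=0):
        assert offset + num_frames < self.max_len, "utterance longer than max_len"
        return self.pe[offset:offset + num_frames]

pe = PositionalEncoding(d_model=4, max_len=100)
print(len(pe.encode(50)))   # 50 — fits in the table
try:
    pe.encode(200)          # a long utterance trips the assert
except AssertionError as e:
    print("AssertionError:", e)
```

Relaxing the assert hides the symptom, but the table (and the attention caches behind it) still scale with the full utterance length, which is consistent with the OOM behavior reported next.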

ngoel17 commented 2 years ago

Hmmm, another update. Even if I undo the change in the previous comment, memory still continues to bloat. It's as if chunk size -1 is causing it to hog memory elsewhere. The problem is that if I run the code under Valgrind, it just points to some MKL sgemm invoked inside libtorch_cpu.so but does not tell which part of the wenet code is causing the issue. I saw that someone else also complained that chunk size -1 causes memory bloat, so maybe it's a real issue that has been around for a while. Has anybody ever used chunk size -1 at large scale? How should the issue be resolved? Also, should I change the title of the topic to make it more discoverable?

pengzhendong commented 2 years ago

  1. Could you please test websocket_server_main without TLG? Or try tuning the weight of the LM. I think that will make it easier to pin down this problem.
  2. Setting chunk_size to -1 means the model works in a non-streaming way. We typically use that to recognize short audio.
  3. Maybe you need to compile libtorch_cpu.so manually.
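Point 2 can be illustrated with a small sketch (the helper names are hypothetical, not wenet's API): with chunk_size > 0 the audio is fed in fixed-size pieces, while -1 feeds the entire utterance in one call, which is why memory scales with utterance length.

```python
def decode_session(samples, chunk_size, decode_chunk):
    """Feed audio to a per-chunk decode function.

    chunk_size > 0   -> streaming: fixed-size chunks, bounded work per call.
    chunk_size == -1 -> non-streaming: the whole utterance in one call,
    so per-call memory grows with utterance length.
    """
    if chunk_size == -1:
        return [decode_chunk(samples)]
    return [decode_chunk(samples[i:i + chunk_size])
            for i in range(0, len(samples), chunk_size)]

# Using len() as a stand-in decoder to show the chunking pattern:
print(decode_session(list(range(100)), 16, len))  # [16, 16, 16, 16, 16, 16, 4]
print(decode_session(list(range(100)), -1, len))  # [100]
```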
ishine commented 2 years ago

I think this is because the VAD info depends on the blank prob.

pengzhendong commented 2 years ago

Try to tune the blank prob.

https://github.com/wenet-e2e/wenet/blob/b815a09b0cd023760454bc11c00d0a19327ab54d/runtime/core/decoder/ctc_endpoint.h#L26
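The rule behind that threshold can be sketched as follows (a simplification of wenet's CtcEndpoint: a frame counts as blank when its blank probability exceeds the threshold, and an endpoint fires after enough consecutive blank frames; the constant names and values here are illustrative):

```python
blank_threshold = 0.8       # prob above which a frame counts as blank (cf. ctc_endpoint.h)
min_trailing_silence = 10   # consecutive blank frames that trigger an endpoint

def detect_endpoint(blank_probs):
    """Return the frame index where an endpoint fires, or None.

    Simplified: the real CtcEndpoint also tracks whether anything has
    been decoded yet and works in durations rather than frame counts.
    """
    trailing = 0
    for t, p in enumerate(blank_probs):
        trailing = trailing + 1 if p > blank_threshold else 0
        if trailing >= min_trailing_silence:
            return t
    return None

probs = [0.1] * 5 + [0.95] * 12   # speech, then long silence
print(detect_endpoint(probs))     # 14 — fires after 10 blank frames
```

Tuning the blank probability shifts where (and whether) these endpoints fire, which directly changes how the audio is segmented in continuous decoding.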

ngoel17 commented 2 years ago

Here are some results when decoding without TLG:

  1. When chunk_size is set to -1, it runs OOM.
  2. When chunk_size is set to 48, it throws an exception and crashes. The trace is pasted below; looks like ws.write() is not very happy.

__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007fffe453c864 in __GI_abort () at abort.c:79
#2  0x00007fffe4908911 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fffe491438c in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fffe49143f7 in std::terminate() () from /lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fffe49146a9 in __cxa_throw () from /lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x00005555556407f4 in boost::throw_exception (e=..., loc=...)
    at /mnt/dsk1/wenet/runtime/server/x86/fc_base/boost-src/boost/throw_exception.hpp:171
#7  0x0000555555640cb5 in boost::beast::websocket::stream<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::execution::any_executor<boost::asio::execution::context_as_t<boost::asio::execution_context&>, boost::asio::execution::detail::blocking::never_t<0>, boost::asio::execution::prefer_only<boost::asio::execution::detail::blocking::possibly_t<0> >, boost::asio::execution::prefer_only<boost::asio::execution::detail::outstanding_work::tracked_t<0> >, boost::asio::execution::prefer_only<boost::asio::execution::detail::outstanding_work::untracked_t<0> >, boost::asio::execution::prefer_only<boost::asio::execution::detail::relationship::fork_t<0> >, boost::asio::execution::prefer_only<boost::asio::execution::detail::relationship::continuation_t<0> > > >, true>::write (this=0x5555579ab100, buffers=...)
    at /mnt/dsk1/wenet/runtime/server/x86/fc_base/boost-src/boost/beast/websocket/impl/write.hpp:812
#8  0x000055555560d4e9 in wenet::ConnectionHandler::OnPartialResult (this=0x5555579ab0f8,
    result="[{\"sentence\":\"this is a test sentence to check if wenet does anything\"}]") at /home/ngoel-00171/wenet/runtime/server/x86/websocket/websocket_server.cc:69
#9  0x000055555560e559 in wenet::ConnectionHandler::DecodeThreadFunc (this=0x5555579ab0f8) at /home/ngoel-00171/wenet/runtime/server/x86/websocket/websocket_server.cc:151
#10 0x00005555556b797d in std::__invoke_impl<void, void (wenet::ConnectionHandler::*)(), wenet::ConnectionHandler*> (
    __f=@0x7fff18004430: (void (wenet::ConnectionHandler::*)(class wenet::ConnectionHandler * const)) 0x55555560e3b8 <wenet::ConnectionHandler::DecodeThreadFunc()>,
    __t=@0x7fff18004428: 0x5555579ab0f8) at /usr/include/c++/10/bits/invoke.h:73
#11 0x00005555556b74ab in std::__invoke<void (wenet::ConnectionHandler::*)(), wenet::ConnectionHandler*> (
    __fn=@0x7fff18004430: (void (wenet::ConnectionHandler::*)(class wenet::ConnectionHandler * const)) 0x55555560e3b8 <wenet::ConnectionHandler::DecodeThreadFunc()>)
    at /usr/include/c++/10/bits/invoke.h:95
#12 0x00005555556b65d1 in std::thread::_Invoker<std::tuple<void (wenet::ConnectionHandler::*)(), wenet::ConnectionHandler*> >::_M_invoke<0ul, 1ul> (this=0x7fff18004428)
    at /usr/include/c++/10/thread:264
#13 0x00005555556b5462 in std::thread::_Invoker<std::tuple<void (wenet::ConnectionHandler::*)(), wenet::ConnectionHandler*> >::operator() (this=0x7fff18004428)
    at /usr/include/c++/10/thread:271
#14 0x00005555556b33b8 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (wenet::ConnectionHandler::*)(), wenet::ConnectionHandler*> > >::_M_run (
    this=0x7fff18004420) at /usr/include/c++/10/thread:215
#15 0x00007fffe4940de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#16 0x00007fffe5085590 in start_thread (arg=0x7fff30d18640) at pthread_create.c:463
#17 0x00007fffe462f223 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

pengzhendong commented 2 years ago

A chunk_size <= 16 may be a better choice.

ishine commented 2 years ago

> I think this is because the VAD info depends on the blank prob.

In ResetContinuousDecoding we should keep the state of some recent frames, not just reset everything.
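The suggestion can be sketched like this (illustrative names only, not wenet's actual API): on an endpoint reset, keep a short tail of recent frames as context instead of clearing everything, so the blank/VAD statistics at the segment boundary are not lost.

```python
class ContinuousDecoderState:
    """Sketch of keeping recent context across a continuous-decoding reset."""

    def __init__(self, keep_frames=8):
        self.keep_frames = keep_frames
        self.frames = []

    def accept(self, frame):
        self.frames.append(frame)

    def reset_continuous_decoding(self):
        # Keep a short tail of recent frames rather than a full reset.
        self.frames = self.frames[-self.keep_frames:]

state = ContinuousDecoderState(keep_frames=3)
for f in range(10):
    state.accept(f)
state.reset_continuous_decoding()
print(state.frames)  # [7, 8, 9] — the last 3 frames survive the reset
```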

pengzhendong commented 2 years ago

@ishine You are right. Could you please submit a PR? Or have you tested it?

ishine commented 2 years ago

> @ishine You are right. Could you please submit a PR? Or have you tested it?

I used offline vad.

ngoel17 commented 2 years ago

A PR will be very helpful. I would volunteer to test.

xingchensong commented 1 year ago

The memory leak has been fixed by jemalloc; see https://github.com/wenet-e2e/wenet/issues/352

xingchensong commented 1 year ago

> I think this is because the VAD info depends on the blank prob.
>
> In ResetContinuousDecoding we should keep the state of some recent frames, not just reset.

I agree, PR is welcome

duj12 commented 1 year ago

Hi, I met the same problem when using the wenet_api Python binding. If I set continuous_decoding to true, the segment's result before the endpoint is cleared by `decoder->ResetContinuousDecoding();` at https://github.com/wenet-e2e/wenet/blob/main/runtime/core/api/wenet_api.cc#L143 . If I `break` after ResetContinuousDecoding, the result before the endpoint is kept. Maybe you can try it in https://github.com/wenet-e2e/wenet/blob/main/runtime/core/websocket/websocket_server.cc#L138

The modification is shown in the attached image.

But I'm not sure whether this `break` will result in audio samples or features being missed. In my opinion, ResetContinuousDecoding() does not clear the feature pipeline's cache, so the next Decode will use the remaining features. Am I right? @xingchensong @pengzhendong

github-actions[bot] commented 7 months ago

This issue has been automatically closed due to inactivity.