Did you try to debug? @pengzhendong please follow the issue.
Have you tried `websocket_server_main` without TLG?
Looking at the while loop in https://github.com/wenet-e2e/wenet/blob/b815a09b0cd023760454bc11c00d0a19327ab54d/runtime/core/websocket/websocket_server.cc#L126
it appears that it tries to decode everything. However, the pattern of contiguous deletions is very obvious, and if I manually segment (long recordings) and pass the audio to decoder_main, the accuracy is much better than passing the whole file to websocket_server_main using websocket_client_main.
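For reference, the loop being discussed looks roughly like this (a paraphrased sketch of `ConnectionHandler::DecodeThreadFunc()` from the linked file, not the exact code at that commit):

```cpp
// Paraphrased sketch of ConnectionHandler::DecodeThreadFunc() in
// runtime/core/websocket/websocket_server.cc (see the link above).
void ConnectionHandler::DecodeThreadFunc() {
  while (true) {
    DecodeState state = decoder_->Decode();       // consume buffered features
    if (state == DecodeState::kEndFeats) {        // input exhausted
      decoder_->Rescoring();
      OnFinalResult(SerializeResult(true));
      OnFinish();
      break;
    } else if (state == DecodeState::kEndpoint) { // endpoint fired mid-stream
      decoder_->Rescoring();
      OnFinalResult(SerializeResult(true));
      if (continuous_decoding_) {
        decoder_->ResetContinuousDecoding();      // reset, keep consuming audio
      } else {
        OnSpeechEnd();
        break;
      }
    } else if (decoder_->DecodedSomething()) {
      OnPartialResult(SerializeResult(false));    // partial hypothesis
    }
  }
}
```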
@pengzhendong why do you think it's important to compare with/without TLG? Do you think the issue is TLG-related?
Because the decoder with TLG will skip some blank frames. https://github.com/wenet-e2e/wenet/blob/b815a09b0cd023760454bc11c00d0a19327ab54d/runtime/core/decoder/ctc_wfst_beam_search.cc#L78-L79
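Roughly, the check at those lines looks like this (paraphrased; `blank_skip_thresh` is the tunable mentioned just below):

```cpp
// Paraphrased from the linked lines in ctc_wfst_beam_search.cc: a frame whose
// CTC blank posterior exceeds blank_skip_thresh is skipped and never reaches
// the WFST search. Index 0 is the blank label.
if (std::exp(logp[0]) > opts_.blank_skip_thresh) {
  continue;  // treat the frame as silence
}
```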
I have this set to 0.98, but setting it to 0.99 does not change much either. If the score is that high, it is most likely a blank frame, and in general TLG accuracy is decent: after proper parameter tuning the WER goes from 14% (CTC) to 11.5% (TLG). But from the websocket server it's 65% WER.
It appears that the decoder is very sensitive to the chunk-size setting. I am not sure why, because I am assuming that training uses variable right context so that the chunk size is an adjustable parameter during decoding. Anyway, setting it to -1 (from the initial setting of 16) seems to have improved things and addressed the deletions issue. Unfortunately, the server behavior is erratic now: it ran OOM. I think it is wenet/transformer/embedding.py that makes the system go OOM. I made the following change to prevent the code from raising an assert error:

```diff
diff --git a/wenet/transformer/embedding.py b/wenet/transformer/embedding.py
index a47afd9..e39ef6d 100644
--- a/wenet/transformer/embedding.py
+++ b/wenet/transformer/embedding.py
@@ -24,7 +24,7 @@ class PositionalEncoding(torch.nn.Module):
     def __init__(self, d_model: int, dropout_rate: float,
```
Hmmm, another update. Even if I undo the previous change in the comment above, memory still continues to bloat. It's as if chunk size -1 is causing it to hog memory elsewhere. The problem is that if we run the code under Valgrind, it just points to some MKL sgemm invoked inside libtorch_cpu.so, but does not tell which part of the wenet code is causing the issue. I saw that someone else also complained that chunk size -1 causes memory bloat, so maybe it's a real issue in the code that has been around for a while. Has anybody ever used chunk size -1 at large scale? How should this be resolved? Also, should I change the title of the topic to make it more discoverable?
Have you tried `websocket_server_main` without TLG? Or try to tune the weight of the LM. I think it's easier to figure out this problem.

Setting `chunk_size` to -1 means the model works in a non-streaming way. We often use that to recognize short audio.

Maybe you need to debug `libtorch_cpu.so` manually.

I think this is because the VAD info depends on the blank prob.
Here are some results when decoding without the TLG:
1) When chunk_size is set to -1, it runs OOM.
2) When chunk_size is set to 48, it throws an exception and crashes. The trace is pasted below. Looks like `ws.write()` is not very happy.
```
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
    at /mnt/dsk1/wenet/runtime/server/x86/fc_base/boost-src/boost/throw_exception.hpp:171
    at /mnt/dsk1/wenet/runtime/server/x86/fc_base/boost-src/boost/beast/websocket/impl/write.hpp:812
    result="[{\"sentence\":\"this is a test sentence to check if wenet does anything\"}]") at /home/ngoel-00171/wenet/runtime/server/x86/websocket/websocket_server.cc:69
    __f=@0x7fff18004430: (void (wenet::ConnectionHandler::*)(class wenet::ConnectionHandler * const)) 0x55555560e3b8 <wenet::ConnectionHandler::DecodeThreadFunc()>,
    __t=@0x7fff18004428: 0x5555579ab0f8) at /usr/include/c++/10/bits/invoke.h:73
    __fn=@0x7fff18004430: (void (wenet::ConnectionHandler::*)(class wenet::ConnectionHandler * const)) 0x55555560e3b8 <wenet::ConnectionHandler::DecodeThreadFunc()>)
    at /usr/include/c++/10/bits/invoke.h:95
    at /usr/include/c++/10/thread:264
    at /usr/include/c++/10/thread:271
    this=0x7fff18004420) at /usr/include/c++/10/thread:215
```
A chunk_size <= 16 may be a better choice.
I think this is because the VAD info depends on the blank prob. In `ResetContinuousDecoding` we should keep the state of some recent frames, not just reset everything.
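For context, the endpoint detection ("VAD") in runtime/core/decoder/ctc_endpoint.cc is driven by the CTC blank posterior; a paraphrased sketch, with member names approximate:

```cpp
// Sketch of blank-driven endpointing (cf. runtime/core/decoder/ctc_endpoint.cc;
// paraphrased, names approximate). A frame counts as silence when the blank
// posterior is high; an endpoint fires after enough trailing silence frames.
float blank_prob = std::exp(ctc_log_probs[t][blank_id]);
if (blank_prob > config_.blank_threshold) {
  ++num_frames_trailing_blank_;    // accumulate trailing silence
} else {
  num_frames_trailing_blank_ = 0;  // any speech frame resets the counter
}
```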
@ishine You are right. Could you please submit a PR? Or have you tested it?
I used offline VAD.
A PR will be very helpful. I would volunteer to test.
The memory leak has been fixed with jemalloc, see https://github.com/wenet-e2e/wenet/issues/352
> I think this is because the VAD info depends on the blank prob. In `ResetContinuousDecoding` we should keep the state of some recent frames, not just reset everything.
I agree, PR is welcome
Hi, I met the same problem when using the wenet_api Python binding. If I set continuous_decoding to True, the segment's result before the endpointing will be cleared by `decoder->ResetContinuousDecoding();` at https://github.com/wenet-e2e/wenet/blob/main/runtime/core/api/wenet_api.cc#L143. If I `break` after `ResetContinuousDecoding`, the result before the endpoint is kept. Maybe you can try it in https://github.com/wenet-e2e/wenet/blob/main/runtime/core/websocket/websocket_server.cc#L138
The modification is like this:
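(The original snippet was not preserved; the following is a reconstruction of the suggested `break`, assuming the `kEndpoint` branch of the decode loop linked above.)

```cpp
// Reconstructed sketch: add a break after the reset in the kEndpoint branch
// of ConnectionHandler::DecodeThreadFunc(), around the line linked above.
if (continuous_decoding_) {
  decoder_->ResetContinuousDecoding();
  break;  // proposed change: stop this decode loop so the result before the
          // endpoint is kept rather than cleared
}
```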
But I'm not sure whether this break will cause waveform samples or features to be missed. In my opinion, `ResetContinuousDecoding()` does not clear the feature pipeline's cache, so the next `Decode()` will use the remaining features. Am I right? @xingchensong @pengzhendong
This issue has been automatically closed due to inactivity.
Describe the bug I decoded several files in continuous decoding mode and did a formal WER calculation. To my surprise, the pattern I see is that half the segments are generally good, and the other half are deletion errors, in an alternating manner, making the total average error rate above 50%. It appears that wenet in streaming mode is missing every alternate segment from the endpointer.
Expected behavior Entire transcripts should be returned.
Screenshots Let me know if you fail to replicate. I will give you a real example along with an audio file that you can use with the gigaspeech model.