nshmyrev opened this issue 2 years ago
I implemented the following buffering of the incoming samples to deal with the issue:
https://github.com/alphacep/vosk-api/blob/master/src/batch_recognizer.cc#L216
If needed, we can move such buffering to DynamicBatcher instead.
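For illustration, here is a minimal sketch of that kind of buffering, not the actual vosk-api code (the class and method names are hypothetical): incoming samples are accumulated, only full chunks of the required size are forwarded, and the remainder is kept for the next call.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper: accumulates arbitrary-length sample writes and
// emits only complete, fixed-size chunks downstream.
class ChunkBuffer {
 public:
  explicit ChunkBuffer(std::size_t chunk_size) : chunk_size_(chunk_size) {}

  // Appends samples; invokes on_chunk once per complete chunk.
  void AcceptWaveform(
      const float* data, std::size_t len,
      const std::function<void(const std::vector<float>&)>& on_chunk) {
    buffer_.insert(buffer_.end(), data, data + len);
    while (buffer_.size() >= chunk_size_) {
      std::vector<float> chunk(buffer_.begin(),
                               buffer_.begin() + chunk_size_);
      on_chunk(chunk);
      buffer_.erase(buffer_.begin(), buffer_.begin() + chunk_size_);
    }
  }

  // Samples held back, waiting for the rest of a chunk.
  std::size_t Pending() const { return buffer_.size(); }

 private:
  std::size_t chunk_size_;
  std::vector<float> buffer_;
};
```

The leftover samples reported by Pending() would be flushed as a final short chunk at end of stream.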
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.
We should keep this issue open; it would be nice if someone could create a PR to add an assertion.
There are two pipelines. At the bottom level, processing is realtime, so this behavior is rather expected. The batch pipeline actually feeds the realtime pipeline with the data from files. @nshmyrev, which of the two are you talking about?
Offset timestamps are certainly a bug, but processing incomplete batches should have been a supported scenario. When starving, the dynamic batcher waits at most a configured number of milliseconds and then sends whatever data it has so far through the GPU. @hugovbraun?
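The policy described above ("wait at most so many milliseconds if starving, then send an incomplete batch") can be sketched as follows. This is an assumption-laden illustration, not the actual DynamicBatcher implementation; the class name and parameters are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Hypothetical batcher: prefers full batches, but flushes a partial
// batch after max_wait rather than stalling the pipeline.
template <typename Item>
class TimeoutBatcher {
 public:
  TimeoutBatcher(std::size_t max_batch, std::chrono::milliseconds max_wait)
      : max_batch_(max_batch), max_wait_(max_wait) {}

  void Push(Item item) {
    std::lock_guard<std::mutex> lk(mu_);
    queue_.push_back(std::move(item));
    cv_.notify_one();
  }

  // Returns up to max_batch_ items. If a full batch does not arrive
  // within max_wait_, returns whatever is queued (possibly incomplete).
  std::vector<Item> NextBatch() {
    std::unique_lock<std::mutex> lk(mu_);
    cv_.wait_for(lk, max_wait_,
                 [&] { return queue_.size() >= max_batch_; });
    std::size_t n = std::min(queue_.size(), max_batch_);
    std::vector<Item> batch(queue_.begin(), queue_.begin() + n);
    queue_.erase(queue_.begin(), queue_.begin() + n);
    return batch;
  }

 private:
  std::size_t max_batch_;
  std::chrono::milliseconds max_wait_;
  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<Item> queue_;
};
```

Under this policy, incomplete batches reaching the GPU are expected behavior by design, which is why the chunk-size requirement discussed below deserves an explicit assert rather than silent misbehavior.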
I tried it; your method improves recognition speed, but accuracy is greatly reduced.
It is not documented or enforced in the API, but the CUDA online pipeline actually requires each audio chunk to contain exactly GetNSampsPerChunk samples of audio data. Unlike the online2 decoder, the CUDA decoder does no buffering, so if the chunks are smaller, the nnet3 computation is run many more times, degrading speed. Timestamps returned from the decoder also get shifted.
It would be nice to document this and probably introduce an assert.
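A minimal sketch of such a guard, assuming the caller knows whether a chunk is the final one (the function name and the choice of throwing instead of a KALDI_ASSERT are mine, not from the source):

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical guard: reject chunks whose size differs from the
// pipeline's expected chunk size, except for the final chunk, which
// may legitimately be short.
void CheckChunkSize(std::size_t num_samples, std::size_t samps_per_chunk,
                    bool is_last_chunk) {
  if (num_samples != samps_per_chunk && !is_last_chunk) {
    throw std::invalid_argument(
        "audio chunk has " + std::to_string(num_samples) +
        " samples, but the pipeline expects exactly " +
        std::to_string(samps_per_chunk));
  }
}
```

In Kaldi itself this would more idiomatically be a KALDI_ASSERT at the point where the pipeline receives a chunk, with the requirement also stated in the header comment.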