kaldi-asr / kaldi

kaldi-asr/kaldi is the official location of the Kaldi project.
http://kaldi-asr.org

CUDA online decoder requires exactly frames_per_chunk samples #4684

Open nshmyrev opened 2 years ago

nshmyrev commented 2 years ago

It is not documented or enforced in the API, but the CUDA online pipeline actually requires each audio chunk to contain exactly GetNSampsPerChunk() samples. There is no buffering in the CUDA decoder as there is in the online2 decoder, so if the chunks are smaller, the nnet3 computation is run many more times, degrading speed. Timestamps returned from the decoder also get shifted.

It would be nice to document this and probably introduce an assert.
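
For illustration, a minimal sketch of the kind of assertion being proposed; the wrapper function and its parameters are assumptions, only GetNSampsPerChunk() and KALDI_ASSERT come from Kaldi itself:

```cpp
#include "base/kaldi-error.h"     // KALDI_ASSERT
#include "matrix/kaldi-vector.h"  // kaldi::VectorBase

// Hypothetical guard at the point where a chunk enters the CUDA online
// pipeline. The pipeline does no internal buffering, so every chunk must
// hold exactly GetNSampsPerChunk() samples; smaller chunks cause extra
// nnet3 computations and shifted timestamps.
void CheckChunkSize(kaldi::int32 n_samps_per_chunk,
                    const kaldi::VectorBase<kaldi::BaseFloat> &chunk) {
  KALDI_ASSERT(chunk.Dim() == n_samps_per_chunk);
}
```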

nshmyrev commented 2 years ago

I implemented the following buffering of the incoming samples to deal with the issue:

https://github.com/alphacep/vosk-api/blob/master/src/batch_recognizer.cc#L216

If needed, we can move such buffering to DynamicBatcher instead.
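
For readers who don't want to follow the link, a minimal self-contained sketch of that kind of buffering, assuming a callback that receives exactly-sized chunks; ChunkBuffer and all names here are illustrative, not the actual vosk-api code:

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Accumulates arbitrarily-sized incoming sample buffers and emits chunks
// of exactly chunk_size samples, which is what the CUDA online pipeline
// expects on every call.
class ChunkBuffer {
 public:
  ChunkBuffer(std::size_t chunk_size,
              std::function<void(const float *, std::size_t)> on_chunk)
      : chunk_size_(chunk_size), on_chunk_(std::move(on_chunk)) {}

  // Append incoming samples; flush every complete chunk to the callback.
  void Accept(const float *data, std::size_t n) {
    buffer_.insert(buffer_.end(), data, data + n);
    std::size_t offset = 0;
    while (buffer_.size() - offset >= chunk_size_) {
      on_chunk_(buffer_.data() + offset, chunk_size_);
      offset += chunk_size_;
    }
    // Keep only the leftover samples that don't yet form a full chunk.
    buffer_.erase(buffer_.begin(), buffer_.begin() + offset);
  }

 private:
  std::size_t chunk_size_;
  std::function<void(const float *, std::size_t)> on_chunk_;
  std::vector<float> buffer_;
};
```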

stale[bot] commented 2 years ago

This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.

danpovey commented 2 years ago

We should keep this issue open. It would be nice if someone could create a PR to add an assertion.

kkm000 commented 2 years ago

There are two pipelines. At the bottom level, processing is real-time, so this behavior is rather expected. The batch pipeline actually feeds the real-time pipeline with data from files. @nshmyrev, which of the two are you talking about?

Offset timestamps are certainly a bug, but processing incomplete batches should be a supported scenario. The dynamic batcher waits at most a set number of milliseconds when starved, and then sends whatever data it has so far through the GPU. @hugovbraun?
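
To make the described policy concrete, here is a schematic sketch of a size-or-timeout flush rule; none of these names come from the actual DynamicBatcher implementation:

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Schematic flush policy for a dynamic batcher: dispatch to the GPU as
// soon as a full batch is available, otherwise wait at most max_wait and
// then send whatever has accumulated (a partial batch).
template <typename Chunk>
class DynamicBatcherSketch {
  using Clock = std::chrono::steady_clock;

 public:
  DynamicBatcherSketch(std::size_t max_batch, std::chrono::milliseconds max_wait)
      : max_batch_(max_batch), max_wait_(max_wait) {}

  void Push(Chunk c) {
    if (pending_.empty()) oldest_ = Clock::now();  // start starvation timer
    pending_.push_back(std::move(c));
  }

  // Polled by the batching thread; returns true if a batch was dispatched.
  template <typename RunBatch>
  bool MaybeFlush(RunBatch &&run_batch) {
    if (pending_.empty()) return false;
    bool full = pending_.size() >= max_batch_;
    bool starved = Clock::now() - oldest_ >= max_wait_;
    if (!full && !starved) return false;  // keep waiting for more chunks
    run_batch(pending_);                  // may be a partial batch
    pending_.clear();
    return true;
  }

 private:
  std::size_t max_batch_;
  std::chrono::milliseconds max_wait_;
  Clock::time_point oldest_;
  std::vector<Chunk> pending_;
};
```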

ryanwy commented 2 years ago

I tried it. Your method improves recognition speed, but accuracy is greatly reduced.