Open greed2411 opened 3 years ago
Write an interface and implement GPU batch decoding for Kaldi ASR models in the kaldi-serve core C++ library.
The current partial version (the gpu-decoder branch) is buggy (stale issue here). You may use it as a starting point or write one from scratch; it's up to you. The main idea is to be able to pass a custom async callback to the batch decoding pipeline, and that callback receives the final result once the GPU compute task is complete.
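To make the callback idea concrete, here is a minimal sketch of what such an interface might look like. All names (`BatchDecoder`, `DecodeAsync`, `DecodeResult`, `ResultCallback`) are hypothetical and not part of kaldi-serve or Kaldi, and the single worker thread with a placeholder transcript only stands in for the real GPU batch decoding pipeline.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result type; a real one would likely carry alternatives, timings, etc.
struct DecodeResult {
    std::string key;         // caller-chosen utterance id
    std::string transcript;  // final hypothesis
};

// Invoked on the decoder's worker thread once the result is ready.
using ResultCallback = std::function<void(const DecodeResult&)>;

class BatchDecoder {
public:
    BatchDecoder() : worker_([this] { Run(); }) {}

    // Drains all pending tasks, then joins the worker.
    ~BatchDecoder() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Returns immediately; `callback` receives the final result later.
    void DecodeAsync(std::string key, std::vector<float> samples, ResultCallback callback) {
        assert(callback && "a callback is required");
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(Task{std::move(key), std::move(samples), std::move(callback)});
        }
        cv_.notify_one();
    }

private:
    struct Task {
        std::string key;
        std::vector<float> samples;
        ResultCallback callback;
    };

    void Run() {
        for (;;) {
            Task task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and queue drained
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            // Placeholder for the GPU compute step (assumption: no real decoding here).
            DecodeResult result{task.key,
                                "<" + std::to_string(task.samples.size()) + " samples>"};
            task.callback(result);
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Task> tasks_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the members above exist before it starts
};
```

Because the callback runs on the worker thread in this sketch, callers should keep it short and guard any shared state it touches.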
Relevant links:
Implement an online request queueing mechanism, similar to ServiceStreamer, that uses the GPU batch decoding interface (Task 1) to reduce latency in the kaldi-serve gRPC server application under high load.
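One possible shape for such a queue is sketched below: requests accumulate until either a maximum batch size or a maximum wait time is reached, then the whole batch goes to a single batch function. `RequestBatcher`, `Submit`, and `BatchFn` are hypothetical names, the batch function stands in for the GPU batch decoding interface, and the sketch assumes it does not throw and returns exactly one output per input.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

class RequestBatcher {
public:
    // Stand-in for the batch decoder: one transcript per input utterance.
    using BatchFn =
        std::function<std::vector<std::string>(const std::vector<std::vector<float>>&)>;

    RequestBatcher(BatchFn fn, std::size_t max_batch, std::chrono::milliseconds max_wait)
        : fn_(std::move(fn)), max_batch_(max_batch), max_wait_(max_wait) {
        assert(max_batch_ > 0);
        worker_ = std::thread([this] { Run(); });
    }

    // Drains pending requests, then joins the worker.
    ~RequestBatcher() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Enqueue one request; the future resolves when its batch has been processed.
    std::future<std::string> Submit(std::vector<float> audio) {
        Pending p;
        p.audio = std::move(audio);
        std::future<std::string> result = p.promise.get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back(std::move(p));
        }
        cv_.notify_one();
        return result;
    }

private:
    struct Pending {
        std::vector<float> audio;
        std::promise<std::string> promise;
    };

    void Run() {
        std::unique_lock<std::mutex> lock(mu_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stop requested and queue drained
            // Give the batch up to max_wait_ to fill before dispatching it.
            cv_.wait_for(lock, max_wait_,
                         [this] { return stop_ || pending_.size() >= max_batch_; });
            std::vector<Pending> batch;
            while (!pending_.empty() && batch.size() < max_batch_) {
                batch.push_back(std::move(pending_.front()));
                pending_.pop_front();
            }
            lock.unlock();  // do not hold the lock during the (slow) batch call
            std::vector<std::vector<float>> inputs;
            for (auto& p : batch) inputs.push_back(std::move(p.audio));
            std::vector<std::string> outputs = fn_(inputs);
            assert(outputs.size() == batch.size());
            for (std::size_t i = 0; i < batch.size(); ++i)
                batch[i].promise.set_value(std::move(outputs[i]));
            lock.lock();
        }
    }

    BatchFn fn_;
    std::size_t max_batch_;
    std::chrono::milliseconds max_wait_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Pending> pending_;
    bool stop_ = false;
    std::thread worker_;
};
```

In a gRPC handler, each request would call `Submit` and block on (or poll) the returned future. The two knobs trade latency for throughput: a larger `max_wait` yields fuller batches at the cost of added delay under light load.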
possibly along with a request queueing mechanism like ServiceStreamer for online