GPU batch decoding + Online request queueing mechanism

Task 1

Write an interface and implement GPU batch decoding for Kaldi ASR models in the kaldi-serve core C++ library.

The current partial implementation on the gpu-decoder branch is buggy (see the stale issue). You may use it as a starting point or write one from scratch; it's up to you. The main idea is to be able to pass a custom async callback to the batch decoding pipeline that receives the final result once the GPU compute task is complete.
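A minimal sketch of the callback-based shape such an interface could take is below. Every name in it (`BatchDecoderInterface`, `DecodeAsync`, `WaitForCompletion`, `DecodeResult`, `FakeBatchDecoder`) is hypothetical and not part of kaldi-serve or Kaldi. The fake decoder runs on the CPU and only reports how many samples it received; it stands in for the real CUDA pipeline so the asynchronous callback flow can be compiled and exercised on its own.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result type; a real decoder would return lattices / n-best alternatives.
struct DecodeResult {
    std::string key;
    std::string transcript;
};

// Invoked once the compute task for one utterance has finished.
using ResultCallback = std::function<void(const DecodeResult &)>;

// Hypothetical interface: callers enqueue audio and receive the result through a callback.
class BatchDecoderInterface {
public:
    virtual ~BatchDecoderInterface() = default;
    // Returns immediately; `callback` runs later on a decoder-owned thread.
    virtual void DecodeAsync(const std::string &key, std::vector<float> samples,
                             ResultCallback callback) = 0;
    // Blocks until every enqueued utterance has been decoded and its callback has run.
    virtual void WaitForCompletion() = 0;
};

// CPU stand-in that "decodes" one utterance at a time on a worker thread.
// A GPU implementation would group queued utterances into batches for the CUDA pipeline.
class FakeBatchDecoder : public BatchDecoderInterface {
public:
    FakeBatchDecoder() : worker_([this] { Run(); }) {}

    // Drains the remaining queue, then stops the worker thread.
    ~FakeBatchDecoder() override {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    void DecodeAsync(const std::string &key, std::vector<float> samples,
                     ResultCallback callback) override {
        assert(callback && "a result callback is required");
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(Task{key, std::move(samples), std::move(callback)});
            ++pending_;
        }
        cv_.notify_one();
    }

    void WaitForCompletion() override {
        std::unique_lock<std::mutex> lock(mu_);
        done_cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    struct Task {
        std::string key;
        std::vector<float> samples;
        ResultCallback callback;
    };

    void Run() {
        std::unique_lock<std::mutex> lock(mu_);
        while (true) {
            cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
            if (tasks_.empty()) return;  // shutdown requested and queue drained
            Task task = std::move(tasks_.front());
            tasks_.pop();
            lock.unlock();
            // Placeholder for the real GPU work: report how much audio was received.
            DecodeResult result{task.key,
                                "<" + std::to_string(task.samples.size()) + " samples>"};
            task.callback(result);  // user callback runs outside the lock
            lock.lock();
            if (--pending_ == 0) done_cv_.notify_all();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::condition_variable done_cv_;
    std::queue<Task> tasks_;
    std::size_t pending_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so the members above exist before Run() starts
};
```

Keeping the callback a plain `std::function` would let the gRPC layer pass a lambda that fills in and completes the response, without the decoder depending on gRPC types.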

Relevant links:

Batched Decoding binary
Batched Threaded CUDA Pipeline - Source

Task 2

Implement an online request queueing mechanism, similar to the one in ServiceStreamer, that uses the GPU batch decoding interface from Task 1 to reduce latency in the kaldi-serve gRPC server application under higher load.
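One common way to build such a queue is dynamic batching: hold incoming requests until either a maximum batch size is reached or the oldest request has waited for a bounded time, then hand the whole batch to the decoder. The sketch below assumes that policy. `RequestBatcher`, `Request`, `BatchHandler`, `max_batch_size` and `max_wait` are hypothetical names, and the sketch is not derived from ServiceStreamer's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical request shape; a real server would also carry the gRPC call context.
struct Request {
    std::string key;
    std::vector<float> samples;
};

// Receives one batch; in the server this would submit the batch to the GPU decoder.
using BatchHandler = std::function<void(std::vector<Request>)>;

// Flushes queued requests as one batch when `max_batch_size` requests are waiting
// or when the oldest queued request has waited `max_wait`, whichever comes first.
class RequestBatcher {
public:
    RequestBatcher(std::size_t max_batch_size, std::chrono::milliseconds max_wait,
                   BatchHandler handler)
        : max_batch_size_(max_batch_size), max_wait_(max_wait),
          handler_(std::move(handler)), worker_([this] { Run(); }) {
        assert(max_batch_size_ > 0);
    }

    // Drains every queued request before the worker thread exits.
    ~RequestBatcher() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    void Submit(Request req) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.push_back({Clock::now(), std::move(req)});
        }
        cv_.notify_one();
    }

private:
    using Clock = std::chrono::steady_clock;
    struct Entry {
        Clock::time_point arrived;
        Request request;
    };

    void Run() {
        std::unique_lock<std::mutex> lock(mu_);
        while (true) {
            // Sleep until at least one request is queued (or shutdown is requested).
            cv_.wait(lock, [this] { return stop_ || !pending_.empty(); });
            if (pending_.empty()) return;  // stop_ is set and nothing is left
            // Let the batch fill, bounded by the oldest request's deadline.
            const auto deadline = pending_.front().arrived + max_wait_;
            cv_.wait_until(lock, deadline,
                           [this] { return stop_ || pending_.size() >= max_batch_size_; });
            // Take at most max_batch_size_ requests; extras wait for the next round.
            const std::size_t n = std::min(pending_.size(), max_batch_size_);
            std::vector<Request> batch;
            batch.reserve(n);
            for (std::size_t i = 0; i < n; ++i) {
                batch.push_back(std::move(pending_.front().request));
                pending_.pop_front();
            }
            lock.unlock();
            handler_(std::move(batch));  // the slow decode runs outside the lock
            lock.lock();
        }
    }

    const std::size_t max_batch_size_;
    const std::chrono::milliseconds max_wait_;
    BatchHandler handler_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<Entry> pending_;
    bool stop_ = false;
    std::thread worker_;  // declared last so every member above exists before Run() starts
};
```

The two parameters trade latency for throughput: a larger `max_wait` yields fuller GPU batches under light load at the cost of up to `max_wait` extra queueing delay per request, while `max_batch_size` caps how much work a single GPU call receives under heavy load.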

skit-ai/kaldi-serve issue #30