skit-ai / kaldi-serve

Server framework for Kaldi ASR Toolkit
Apache License 2.0
97 stars 24 forks

GPU batch decoding + Online request queueing mechanism #30

Open greed2411 opened 2 years ago

greed2411 commented 2 years ago

Possibly along with a request queueing mechanism like ServiceStreamer for online requests.

pskrunner14 commented 2 years ago

Task 1

Write an interface and implement GPU batch decoding for Kaldi ASR models in the kaldi-serve core C++ library.

The current partial version (gpu-decoder branch) is buggy (stale issue here). You may use it as a starting point or write one from scratch; it's up to you. The main idea is to be able to pass a custom async callback to the batch decoding pipeline, which receives the final result once the GPU compute task is complete.

Relevant links:

  1. Batched Decoding binary
  2. Batched Threaded CUDA Pipeline - Source
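
To make the callback idea in Task 1 concrete, here is a rough sketch of what such an interface might look like. Every name in it (`DecodeResult`, `BatchDecoder`, `FakeBatchDecoder`, `DecodeAsync`, `WaitForAll`, `DemoDecode`) is hypothetical and not part of kaldi-serve or Kaldi. A single worker thread stands in for the GPU pipeline and only reports the number of samples it received; a real implementation would presumably forward the work to Kaldi's batched CUDA pipeline instead.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical result type; a real decoder would carry alternatives, scores, etc.
struct DecodeResult {
    std::string key;
    std::string transcript;
};

using ResultCallback = std::function<void(const DecodeResult &)>;

// Hypothetical interface: submit audio now, receive the result in a callback later.
class BatchDecoder {
  public:
    virtual ~BatchDecoder() = default;
    virtual void DecodeAsync(const std::string &key, std::vector<float> samples,
                             ResultCallback on_done) = 0;
    virtual void WaitForAll() = 0;
};

// Stand-in implementation: one worker thread plays the role of the GPU pipeline.
class FakeBatchDecoder : public BatchDecoder {
  public:
    FakeBatchDecoder() : worker_([this] { Run(); }) {}

    ~FakeBatchDecoder() override {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();
    }

    void DecodeAsync(const std::string &key, std::vector<float> samples,
                     ResultCallback on_done) override {
        {
            std::lock_guard<std::mutex> lock(mu_);
            tasks_.push(Task{key, std::move(samples), std::move(on_done)});
        }
        cv_.notify_all();
    }

    // Blocks until every submitted task has run its callback.
    void WaitForAll() override {
        std::unique_lock<std::mutex> lock(mu_);
        idle_cv_.wait(lock, [this] { return tasks_.empty() && !busy_; });
    }

  private:
    struct Task {
        std::string key;
        std::vector<float> samples;
        ResultCallback on_done;
    };

    void Run() {
        for (;;) {
            Task task;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stop requested and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
                busy_ = true;
            }
            // Placeholder for the GPU compute step: report the input size only.
            DecodeResult result{task.key,
                                "<" + std::to_string(task.samples.size()) + " samples>"};
            task.on_done(result);  // invoked outside the lock
            {
                std::lock_guard<std::mutex> lock(mu_);
                busy_ = false;
            }
            idle_cv_.notify_all();
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::condition_variable idle_cv_;
    std::queue<Task> tasks_;
    bool stop_ = false;
    bool busy_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Usage: submit two utterances and collect what the callbacks receive.
std::vector<std::string> DemoDecode() {
    std::vector<std::string> seen;
    std::mutex seen_mu;
    FakeBatchDecoder decoder;
    auto collect = [&](const DecodeResult &r) {
        std::lock_guard<std::mutex> lock(seen_mu);
        seen.push_back(r.key + ": " + r.transcript);
    };
    decoder.DecodeAsync("utt1", std::vector<float>(16000), collect);
    decoder.DecodeAsync("utt2", std::vector<float>(8000), collect);
    decoder.WaitForAll();
    return seen;
}
```

The caller never blocks on a single utterance; it only hands over a callback. That is what would let a server thread return to accepting requests while the GPU works through a batch.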

Task 2

Implement an online request queueing mechanism similar to that of ServiceStreamer that utilizes the GPU Batch Decoding interface (Task 1) to reduce latency in the kaldi-serve gRPC server application during higher loads.
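
One way the queueing in Task 2 could work is sketched below. It assumes a design in the spirit of ServiceStreamer: requests are collected until either a maximum batch size is reached or a short wait expires, then processed together. The names `RequestBatcher`, `BatchFn`, `Submit` and `DemoQueue` are hypothetical, and the batch function is a placeholder for a call into the GPU batch decoding interface from Task 1.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using Audio = std::vector<float>;
// Hypothetical batch function; assumed to return exactly one output per input.
using BatchFn = std::function<std::vector<std::string>(const std::vector<Audio> &)>;

class RequestBatcher {
  public:
    RequestBatcher(BatchFn fn, std::size_t max_batch, std::chrono::milliseconds max_wait)
        : fn_(std::move(fn)), max_batch_(max_batch), max_wait_(max_wait),
          worker_([this] { Run(); }) {}

    ~RequestBatcher() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();  // pending requests are flushed before the worker exits
    }

    // Enqueue one request; the future resolves once its batch has been processed.
    std::future<std::string> Submit(Audio audio) {
        Pending p{std::move(audio), std::promise<std::string>()};
        std::future<std::string> fut = p.promise.get_future();
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push(std::move(p));
        }
        cv_.notify_all();
        return fut;
    }

  private:
    struct Pending {
        Audio audio;
        std::promise<std::string> promise;
    };

    void Run() {
        for (;;) {
            std::vector<Pending> batch;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested and queue drained
                // Give other requests up to max_wait_ to arrive and fill the batch.
                cv_.wait_for(lock, max_wait_,
                             [this] { return stop_ || queue_.size() >= max_batch_; });
                while (!queue_.empty() && batch.size() < max_batch_) {
                    batch.push_back(std::move(queue_.front()));
                    queue_.pop();
                }
            }
            std::vector<Audio> inputs;
            for (Pending &p : batch) inputs.push_back(std::move(p.audio));
            std::vector<std::string> outputs = fn_(inputs);  // e.g. GPU batch decode
            for (std::size_t i = 0; i < batch.size(); ++i) {
                batch[i].promise.set_value(outputs[i]);
            }
        }
    }

    BatchFn fn_;
    std::size_t max_batch_;
    std::chrono::milliseconds max_wait_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Pending> queue_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the other members exist before it starts
};

// Usage: three requests submitted back to back, each resolved through its own future.
std::vector<std::string> DemoQueue() {
    RequestBatcher batcher(
        [](const std::vector<Audio> &batch) {
            std::vector<std::string> out;
            for (const Audio &a : batch) out.push_back(std::to_string(a.size()) + " samples");
            return out;
        },
        4, std::chrono::milliseconds(50));
    std::vector<std::future<std::string>> futures;
    futures.push_back(batcher.Submit(Audio(100)));
    futures.push_back(batcher.Submit(Audio(200)));
    futures.push_back(batcher.Submit(Audio(300)));
    std::vector<std::string> results;
    for (auto &f : futures) results.push_back(f.get());
    return results;
}
```

The `max_batch` and `max_wait` values here are arbitrary. A longer wait tends to produce fuller batches and better GPU utilization at the cost of added latency for a lone request. A production version would also need to handle exceptions thrown by the batch function, which this sketch does not.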