k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi
https://k2-fsa.github.io/sherpa
Apache License 2.0

[Ready] Whisper large triton support #471

Closed yuekaizhang closed 1 year ago

yuekaizhang commented 1 year ago

Support Whisper via ONNX FP16 using Triton.

Some perf results attached here:

Decoding on a single V100 GPU; audio is padded to 30 s, using the AISHELL-1 test set files.

| Model | Backend | Concurrency | RTF |
| --- | --- | --- | --- |
| Large-v2 | ONNX FP16 | 4 | 0.14 |

| Module | Time Distribution |
| --- | --- |
| feature_extractor | 0.8% |
| encoder | 9.6% |
| decoder | 67.4% |
| greedy search | 22.2% |

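
For readers unfamiliar with the metric, here is a minimal sketch of how padding to 30 s and a real-time factor (RTF) are typically computed. It is not the benchmark script from this PR: `pad_to_chunk` and `real_time_factor` are hypothetical helper names, the 16 kHz sample rate is the rate Whisper models expect, and the PR does not say whether RTF was measured against the original or the padded audio duration.

```python
SAMPLE_RATE = 16_000   # Whisper models expect 16 kHz audio
CHUNK_SECONDS = 30     # padding length mentioned in the results above


def pad_to_chunk(samples, sample_rate=SAMPLE_RATE, chunk_seconds=CHUNK_SECONDS):
    """Zero-pad (or truncate) a sequence of samples to exactly chunk_seconds.

    Hypothetical helper for illustration; not part of sherpa.
    """
    target = sample_rate * chunk_seconds
    clipped = list(samples[:target])
    return clipped + [0.0] * (target - len(clipped))


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = time spent decoding / duration of the audio decoded.

    Values below 1.0 mean decoding runs faster than real time.
    Which audio duration to use (original or padded) is an assumption
    the caller must make; the PR does not specify it.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    one_second = [0.1] * SAMPLE_RATE
    padded = pad_to_chunk(one_second)
    print(len(padded) / SAMPLE_RATE)       # duration in seconds after padding
    print(real_time_factor(14.0, 100.0))   # illustrative numbers, not measured
```

Under this definition, an RTF of 0.14 means roughly 0.14 s of compute per second of audio.
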
yuekaizhang commented 1 year ago

@csukuangfj Would you mind checking this PR when you are free? Many thanks!

yuekaizhang commented 1 year ago

Thanks! Left some minor comments.

Thanks, done!