speechmatics / ctranslate2_triton_backend

Triton backend for https://github.com/OpenNMT/CTranslate2
MIT License

Add support for language models #5

Open HennerM opened 1 year ago

HennerM commented 1 year ago

Previously reported in https://github.com/speechmatics/ctranslate2_triton_backend/issues/2#issuecomment-1546889761 by @aamir-s18

The backend currently only supports encoder-decoder models, whereas the underlying library also has support for decoder-only models: https://github.com/OpenNMT/CTranslate2/blob/master/src/models/language_model.cc

This should be fairly straightforward to add. Ideally the backend would auto-detect the model type; alternatively, the type could be specified in the model configuration.

aamir-s18 commented 1 year ago

It would be great to support all the model types CTranslate2 supports now (such as Whisper and encoder-only models).

I've been thinking about the best way to abstract the inference, since the different model classes expose different entry points (generate, translate, ...). We could create a metaclass that takes over initialization and inference, so we have a unified interface to talk to. Tbh, I need to find out how easy this is.
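To make the idea concrete, here is a rough sketch of what such a unified interface could look like. It is written in Python for brevity, whereas the backend itself is C++, and every name in it (`InferenceModel`, `TranslatorModel`, `GeneratorModel`, `create_model`) is hypothetical. The wrapped callables stand in for the real entry points (for example something built around `translate_batch` or `generate_batch`), so nothing here calls CTranslate2 directly.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Tokens = List[str]
BatchFn = Callable[[List[Tokens]], List[Tokens]]


class InferenceModel(ABC):
    """Hypothetical unified interface: a batch of token lists in, one out."""

    @abstractmethod
    def infer(self, batch: List[Tokens]) -> List[Tokens]:
        ...


class TranslatorModel(InferenceModel):
    """Wraps an encoder-decoder entry point (assumed: a translate-style call)."""

    def __init__(self, translate_fn: BatchFn) -> None:
        self._translate = translate_fn

    def infer(self, batch: List[Tokens]) -> List[Tokens]:
        return self._translate(batch)


class GeneratorModel(InferenceModel):
    """Wraps a decoder-only entry point (assumed: a generate-style call)."""

    def __init__(self, generate_fn: BatchFn) -> None:
        self._generate = generate_fn

    def infer(self, batch: List[Tokens]) -> List[Tokens]:
        return self._generate(batch)


# Maps a model type name to the wrapper class that knows how to call it.
_REGISTRY: Dict[str, Callable[[BatchFn], InferenceModel]] = {
    "translator": TranslatorModel,
    "generator": GeneratorModel,
}


def create_model(model_type: str, backend_fn: BatchFn) -> InferenceModel:
    """Builds the wrapper for `model_type`; rejects unknown types."""
    try:
        wrapper = _REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"unsupported model type: {model_type!r}") from None
    return wrapper(backend_fn)


if __name__ == "__main__":
    # Stub standing in for a real decoder-only model: appends one token.
    model = create_model("generator", lambda batch: [t + ["<eos>"] for t in batch])
    print(model.infer([["Hello", "world"]]))
```

With this shape, the request handling code only ever calls `infer`, and supporting Whisper or encoder-only models would mean adding another wrapper and registry entry. Whether those fit a tokens-in, tokens-out signature is an open question; they may need a wider interface.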

Let's specify the model type in the config.
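As a sketch of how that could work: Triton exposes the `parameters` section of a model configuration as a map from key to an object with a `string_value` field. The snippet below reads a `model_type` parameter from the JSON form of that config. The parameter name `model_type`, the set of accepted values, and the fallback to `translator` (to keep existing encoder-decoder configs working) are all assumptions, not something the backend defines today. It is written in Python for brevity; the backend would do the equivalent in C++.

```python
from typing import Any, Dict

# Assumed set of type names; only encoder-decoder is supported by the backend so far.
SUPPORTED_MODEL_TYPES = ("translator", "generator")
# Assumed default so that configs without the parameter keep their current behaviour.
DEFAULT_MODEL_TYPE = "translator"


def resolve_model_type(model_config: Dict[str, Any]) -> str:
    """Returns the model type named in the config's `parameters` map.

    Falls back to DEFAULT_MODEL_TYPE when the (hypothetical) `model_type`
    parameter is absent, and raises ValueError for unknown values.
    """
    parameters = model_config.get("parameters", {})
    entry = parameters.get("model_type")
    if entry is None:
        return DEFAULT_MODEL_TYPE

    value = entry.get("string_value", "").strip().lower()
    if value not in SUPPORTED_MODEL_TYPES:
        raise ValueError(
            f"model_type must be one of {SUPPORTED_MODEL_TYPES}, got {value!r}"
        )
    return value


if __name__ == "__main__":
    config = {
        "name": "my_language_model",
        "parameters": {"model_type": {"string_value": "generator"}},
    }
    print(resolve_model_type(config))
```

Failing at model load time on an unknown value seems preferable to silently falling back, since a mistyped type would otherwise only surface as confusing inference errors.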