Open AudranBert opened 1 week ago
Could you clarify the list of supported languages? For example, does it include "en," "fr," etc.? On the LinTO side, we consistently use BCP-47 codes for language representation. Parsers (env, API directives...) shall at least support BCP-47 codes as inputs.
That did not change in this PR. Several formats are supported: "fr" and "fr-FR". This holds for the whole LinTO speech toolkit.
Supported languages are listed here: https://github.com/linto-ai/linto-stt/blob/master/whisper/README.md#language
Also, if the user gives an unsupported language, an explicit error message lists the possible ones (in the "fr" format).
Why this question? Do you think something is missing in the code or the documentation?
I haven’t reviewed the code and relied on the doc:
The PR focuses on streaming (?), but what about Celery (task) and HTTP service modes? Are specification updates planned for these?
The PR was created to fix language selection in streaming, but I also added the possibility to send the language through the config, for both streaming and offline (HTTP and task) modes. That's why I linked this PR to issue #53.
The docs mention "two or three-letter codes" for languages but not BCP-47 tags—should this be clarified?
It should work with tags like "fr-FR": the tag is split on the "-" and only the first part (here "fr") is kept and used as the language.
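For illustration, the behaviour described above could be sketched roughly as follows. This is not the toolkit's actual code: `normalize_language` and `SUPPORTED_LANGUAGES` are hypothetical names, and the set below is only a small illustrative subset of the real list linked above.

```python
# Illustrative subset only; the real list is in the whisper README.
SUPPORTED_LANGUAGES = {"en", "fr", "de"}


def normalize_language(tag: str) -> str:
    """Reduce a tag such as "fr-FR" to its primary subtag ("fr").

    Hypothetical helper: the region part ("FR") is dropped, and an
    unsupported language raises an error listing the accepted codes.
    """
    # Keep only what precedes the first "-" and compare case-insensitively.
    primary = tag.strip().split("-")[0].lower()
    if primary not in SUPPORTED_LANGUAGES:
        accepted = ", ".join(sorted(SUPPORTED_LANGUAGES))
        raise ValueError(
            f"Unsupported language {tag!r}. Supported languages: {accepted}"
        )
    return primary
```

With this sketch, both "fr" and "fr-FR" resolve to "fr", while a tag such as "xx-YY" is rejected with a message listing the accepted codes.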
- The docs mention "two or three-letter codes" for languages but not BCP-47 tags—should this be clarified?
Yes, we should mention that they are supported, but that the second part ("FR" in "fr-FR") is ignored (the model's results are invariant to it).
- The PR focuses on streaming (?), but what about Celery (task) and HTTP service modes? Are specification updates planned for these?
Yes. The PR is not finished yet ("WIP" in the title)
- For Celery, should we open an issue in https://github.com/linto-ai/linto-transcription to handle the target language correctly?
Yes. There will be an issue for that feature request. Worst case, I will create it when I commit related changes (mentioning the issue in the commit message, as we discussed doing as much as possible). Our plan is to split the work: Audran here on the core STT, me on the evolution of the transcription service API.
Add language selection for streaming with Whisper. By default it takes the language found in the env settings, but you can pass a language in the config when starting streaming.
It also adds the possibility to pass a language in the config for offline decoding, as requested in #53. This allows the same model instance to be used for multiple languages instead of launching another Docker container.
The PR also improves the tests by adding tests for languages and removing some useless ones to reduce testing duration.
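As a rough sketch of the selection order described above (language from the request config first, env settings otherwise), assuming a hypothetical `language` config key and a hypothetical `LANGUAGE` environment variable; the names actually used by the services may differ:

```python
import os


def resolve_language(config=None, env=None):
    """Pick the language for one request (hypothetical helper).

    The per-request config wins; otherwise the value from the
    environment is used. Returns None when neither is set.
    """
    env = os.environ if env is None else env
    # "language" and "LANGUAGE" are assumed names, for illustration only.
    language = (config or {}).get("language") or env.get("LANGUAGE")
    if not language:
        return None
    # Tags such as "fr-FR" are reduced to their first part ("fr").
    return language.split("-")[0].lower()
```

Resolving the language per request like this is what lets a single model instance serve several languages: two requests with different `language` values can be handled by the same process.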