Open ludwigbald opened 4 years ago
This is totally possible on the client side; it just becomes much harder across multiple languages, or for pretty much anything non-English.
DeepSpeech by Mozilla can, in theory, do it (even on live input), but that is currently quite hard because it is mainly trained on English.
Privacy issues also need to be sorted out: does the sender allow you to send their message to the cloud or not?
In order to make voice messages easily searchable and skimmable, we should have a bot that uses a speech-to-text model to transcribe voice messages.
This could also be a client feature, but since good voice models are only run in the cloud by scary big companies, we should probably not send data there without telling the sender.
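To make the consent question concrete, here is a minimal sketch of how a bot or client might choose a transcription backend. Everything in it is an assumption for illustration: `VoiceMessage`, the `allow_cloud` flag, and the `Transcriber` callables are hypothetical names, not part of any existing bot or client API. The idea is only that a local model is preferred when one is available, and a cloud model is used only if the sender has opted in.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical interface: a transcriber takes raw audio bytes and returns text.
Transcriber = Callable[[bytes], str]


@dataclass
class VoiceMessage:
    sender: str
    audio: bytes
    # Hypothetical per-message consent flag set by the sender.
    allow_cloud: bool = False


def transcribe_message(
    msg: VoiceMessage,
    local: Optional[Transcriber] = None,
    cloud: Optional[Transcriber] = None,
) -> Optional[str]:
    """Return a transcript, or None if no permitted backend is available."""
    # Prefer an on-device model: the audio never leaves the machine.
    if local is not None:
        return local(msg.audio)
    # Fall back to the cloud only with the sender's explicit consent.
    if cloud is not None and msg.allow_cloud:
        return cloud(msg.audio)
    # No local model and no consent: leave the message untranscribed.
    return None
```

A real `local` callable could wrap an on-device model such as DeepSpeech, and `cloud` could wrap a hosted service; both are left abstract here because the choice of model is still open in this discussion.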