Open flatsiedatsie opened 3 weeks ago
Small update: I've been trying to get it to work by creating an ONNX file using Optimum. This has proven tricky, as there is no config.json file. But maybe that can be generated with the Python Transformers library.
According to the authors of WhisperSpeech the model is more similar to MusicGen.
But I've also learnt that Bark is a popular open source TTS that might be useful. It supports a lot of languages too.
https://github.com/suno-ai/bark
// Too bad, it seems to require at least 8 GB of RAM, and ideally 16 GB.
Feature request
Maybe I'm just overlooking it, but it would rock if it were possible to do TTS for more languages. English is well catered for with SpeechT5, but for other languages I have to fall back to using the browser's built-in TTS, which... does not sound great.
For example, I was wondering if WhisperSpeech would be an interesting model to support.
"An Open Source text-to-speech system built by inverting Whisper."
https://github.com/collabora/whisperspeech
The examples on GitHub show it generating French and Polish. It supports Dutch too, which for me personally would be very useful.
https://huggingface.co/WhisperSpeech/WhisperSpeech/tree/main
I found out about it from this Reddit discussion.
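To make the current workaround concrete, here is a minimal sketch of the fallback described above: English goes to the in-library model, everything else falls back to the browser's built-in TTS. The names `TtsBackend`, `MODEL_LANGUAGES` and `pickTtsBackend` are hypothetical and not part of Transformers.js, and the assumption that only English is covered comes from this request, not from the library's documentation.

```typescript
// Hypothetical helper, not part of Transformers.js: decides which TTS
// backend to use for a given language tag.
type TtsBackend = "model" | "browser";

// Assumption: only English is covered by the in-library TTS model.
const MODEL_LANGUAGES: readonly string[] = ["en"];

function pickTtsBackend(languageTag: string): TtsBackend {
  // Reduce a tag such as "en-US" or "nl_NL" to its primary subtag ("en", "nl").
  const [primary = ""] = languageTag.trim().toLowerCase().split(/[-_]/);
  // Use the model when the language is covered, otherwise fall back.
  return MODEL_LANGUAGES.indexOf(primary) !== -1 ? "model" : "browser";
}
```

With this, `pickTtsBackend("en-US")` selects the model, while `pickTtsBackend("nl")` or `pickTtsBackend("fr-FR")` selects the browser fallback. Supporting a multilingual model such as WhisperSpeech would amount to adding more entries to `MODEL_LANGUAGES`.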
Motivation
Your contribution
I am open to suggestions.