xenova / transformers.js

State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
https://huggingface.co/docs/transformers.js
Apache License 2.0
10.96k stars, 668 forks

TTS for multiple languages #898

Open flatsiedatsie opened 3 weeks ago

flatsiedatsie commented 3 weeks ago

Feature request

Maybe I'm just overlooking it, but it would rock if it were possible to do TTS for more languages. English is well catered for with SpeechT5, but for other languages I have to fall back to the browser's built-in TTS, which does not sound great.
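For context, here is a minimal sketch of the fallback I mean, not a definitive implementation. It assumes the `text-to-speech` pipeline with the `Xenova/speecht5_tts` checkpoint and the speaker-embeddings URL as I recall them from the Transformers.js docs example. `NEURAL_TTS_LANGUAGES`, `pickTtsBackend` and `speak` are hypothetical names made up for illustration.

```javascript
// Assumption: only English is covered by the in-browser neural model.
const NEURAL_TTS_LANGUAGES = new Set(['en']);

// Decide which backend to use for a language tag such as "en-US" or "nl-NL".
// Only the primary subtag (the part before the first "-") is considered.
function pickTtsBackend(languageTag) {
  const primary = String(languageTag).toLowerCase().split('-')[0];
  return NEURAL_TTS_LANGUAGES.has(primary) ? 'speecht5' : 'browser';
}

// Browser-only: speak `text` with SpeechT5 when possible, otherwise fall back
// to the Web Speech API.
async function speak(text, languageTag) {
  if (pickTtsBackend(languageTag) === 'speecht5') {
    // The bare module specifier assumes a bundler or an import map.
    const { pipeline } = await import('@xenova/transformers');
    const synthesizer = await pipeline('text-to-speech', 'Xenova/speecht5_tts', {
      quantized: false,
    });
    // Assumption: speaker embeddings file referenced in the docs example.
    const speaker_embeddings =
      'https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/speaker_embeddings.bin';
    const { audio, sampling_rate } = await synthesizer(text, { speaker_embeddings });

    // Play the returned Float32Array through the Web Audio API.
    const ctx = new AudioContext();
    const buffer = ctx.createBuffer(1, audio.length, sampling_rate);
    buffer.copyToChannel(audio, 0);
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    source.start();
    return;
  }

  // Fallback: the browser's built-in voices, whose quality varies a lot.
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.lang = languageTag;
  speechSynthesis.speak(utterance);
}
```

With this, `speak('Hallo wereld', 'nl-NL')` ends up in the `speechSynthesis` branch, which is exactly the part a multilingual model would replace. A real app would also create the pipeline once and reuse it rather than rebuilding it on every call.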

For example, I was wondering if WhisperSpeech would be an interesting model to support.

"An Open Source text-to-speech system built by inverting Whisper."

https://github.com/collabora/whisperspeech

The examples on GitHub show it generating French and Polish. It supports Dutch too, which for me personally would be very useful.

https://huggingface.co/WhisperSpeech/WhisperSpeech/tree/main

I found out about it from this Reddit discussion.

Motivation

Your contribution

I am open to suggestions.

flatsiedatsie commented 2 weeks ago

Small update: I've been trying to get it to work by creating an ONNX file using Optimum. This has proven tricky, as there is no config.json file. But maybe that can be generated with the Python Transformers library.

According to the authors of WhisperSpeech the model is more similar to MusicGen.

But I've also learnt that Bark is a popular open source TTS that might be useful. It supports a lot of languages too.

https://github.com/suno-ai/bark

// Too bad: it seems to require at least 8 GB of RAM, and ideally 16 GB.