TTS for multiple languages

Feature request

Maybe I'm just overlooking it, but it would rock if it were possible to do TTS for more languages. English is well catered for with SpeechT5, but for other languages I have to fall back to the browser's built-in TTS, which does not sound great.
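For context, here is roughly what the fallback looks like today. This is only a minimal sketch: `pickVoice` and `VoiceLike` are hypothetical names of my own, not part of transformers.js or the browser, and the matching rule (exact language tag first, then same primary language) is an assumption about what a reasonable fallback would do.

```typescript
// Minimal shape shared with the browser's SpeechSynthesisVoice objects.
interface VoiceLike {
  name: string;
  lang: string; // BCP 47 language tag, e.g. "nl-NL" (some browsers report "nl_NL")
}

// Normalize a language tag so "nl_NL" and "nl-NL" compare equal.
function normalizeTag(tag: string): string {
  return tag.toLowerCase().replace(/_/g, "-");
}

// Hypothetical helper: pick a voice for the requested language.
// Prefers an exact tag match, then any voice with the same primary language.
// Returns undefined when no installed voice covers the language.
function pickVoice<T extends VoiceLike>(
  voices: readonly T[],
  lang: string,
): T | undefined {
  const wanted = normalizeTag(lang);
  const primary = wanted.split("-")[0];

  const exact = voices.find((v) => normalizeTag(v.lang) === wanted);
  if (exact) return exact;

  return voices.find((v) => normalizeTag(v.lang).split("-")[0] === primary);
}
```

In the browser this would be fed with `speechSynthesis.getVoices()`, and the result assigned to the `voice` property of a `SpeechSynthesisUtterance` before calling `speechSynthesis.speak(...)`. Which voices exist, and how good they sound, depends entirely on the user's OS and browser, which is the core of the problem.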

For example, I was wondering if WhisperSpeech would be an interesting model to support.

"An Open Source text-to-speech system built by inverting Whisper."

https://github.com/collabora/whisperspeech

The examples on GitHub show it generating French and Polish. It supports Dutch too, which for me personally would be very useful.

https://huggingface.co/WhisperSpeech/WhisperSpeech/tree/main

I found out about it from this Reddit discussion.

Motivation

It sounds great
It would be nice to be able to generate higher-quality audio in languages other than English. This would allow for voice conversations (STT -> LLM -> TTS) in many more languages.
Perhaps, since it is based on Whisper, some code could be re-used? (I haven't the foggiest)
Perhaps, since it is related to Whisper, it will be possible to support WebGPU, similar to how WebGPU is already supported for Whisper?

Your contribution

I am open to suggestions.

xenova / transformers.js