snakers4 / silero-models

Silero Models: pre-trained speech-to-text, text-to-speech and text-enhancement models made embarrassingly simple

Feature request - `<phoneme>` support for SSML #146

Open lagleki opened 2 years ago

lagleki commented 2 years ago

🚀 Feature

Allow specifying a phonetic pronunciation for individual words where needed.

Motivation

Sometimes it's necessary to customize the pronunciation of words with non-standard spelling or words borrowed from other languages. In that case, being able to supply a transcription in IPA or X-SAMPA would be nice (see, e.g., Amazon Polly's documentation for an explanation of the syntax).

Pitch

Wrapping a word in a `<phoneme>` tag that carries an IPA or X-SAMPA transcription makes the engine pronounce the word according to that transcription.
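For illustration, here is roughly what such input could look like. This is only a sketch: the `alphabet` and `ph` attributes follow the W3C SSML `<phoneme>` element as used by Polly, the helper `phoneme()` is a hypothetical name, and silero-models does not currently accept this tag. The snippet only builds the markup and checks that it is well-formed XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def phoneme(text, ph, alphabet="ipa"):
    """Wrap `text` in an SSML <phoneme> tag (hypothetical helper).

    quoteattr() picks safe quoting, which matters for X-SAMPA,
    where the primary stress mark is a double quote.
    """
    return (
        f"<phoneme alphabet={quoteattr(alphabet)} ph={quoteattr(ph)}>"
        f"{escape(text)}</phoneme>"
    )


# Build a small SSML document with one IPA-annotated word.
ssml = f"<speak>Let me {phoneme('segue', 'ˈsɛɡweɪ')} to the next topic.</speak>"

# Parse it back to confirm the markup is well-formed.
root = ET.fromstring(ssml)
tag = root.find("phoneme")
print(tag.get("alphabet"), tag.get("ph"), tag.text)
```

An engine with `<phoneme>` support would speak the `ph` transcription in place of the enclosed text; the enclosed text stays as a human-readable fallback.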

Alternatives

I'm not sure there are any within the project. Using another project that supports `<phoneme>` is possible.


snakers4 commented 2 years ago

This is a nice feature to have, but it will probably only happen in the semi-distant future.

MulleDK19 commented 11 months ago

This would be really useful, especially because the model mispronounces a lot of words, such as (the model's pronunciation in parentheses): segue, one-time, tap-in, soccer (such-er), one-on-one, lineup, deviates, Thomas (Thumb as), diving (dee-ving), AI (ey), rewind (re-wind like the air), Danish (Dar-nish), mishap (me shap), mishit (me shit), and a lot more.

For now I'm getting by with replacing these words with alternative spellings, but it's not ideal, and it's not easy.
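A minimal sketch of that respelling workaround, for anyone in the same situation. The `RESPELLINGS` table and the `respell()` helper are hypothetical, and the respellings themselves are only illustrative guesses; each one would need to be checked by ear against the actual model.

```python
import re

# Hypothetical respelling table: word -> spelling the model may read better.
# These entries are illustrative and have not been verified against the model.
RESPELLINGS = {
    "segue": "segway",
    "mishap": "miss-hap",
}


def respell(text, table=RESPELLINGS):
    """Replace whole words found in `table` before sending text to TTS."""
    # Longest keys first so a longer entry is preferred over a shorter one
    # starting at the same position.
    keys = sorted(table, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    # Lookup is case-insensitive; the replacement uses the table's casing.
    return pattern.sub(lambda m: table[m.group(0).lower()], text)


print(respell("A mishap forced a segue."))
```

The word-boundary match leaves inflected forms such as "mishaps" untouched, which is part of why this approach is tedious: every form needs its own entry, and a respelling that works in one sentence may not work in another.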