Open a-meno opened 4 days ago
I believe I have identified the issue within the English lip-sync module, specifically with the following line of code:
.normalize('NFD').replace(/[\u0300-\u036f]/g, '').normalize('NFC') // Removes non-English diacritics
Removing this line appears to resolve the problem, but it might be more appropriate to develop a separate lip-sync module tailored for the Italian language. Would that be a more effective solution?
There is already an open PR for Italian lip-sync. Currently, it doesn't seem to handle diacritics, but perhaps @lupettohf can provide more insight on this.
I don't speak Italian, but I would assume that diacritics marking stress could be ignored, whereas those that affect vowel openness/closedness should probably be handled to get the best result.
Out of curiosity: I know Italian and Finnish both have largely phonemic orthographies. Have you tried using the Finnish lip-sync module "fi"? While there are, of course, differences between the two languages, I would expect Finnish and Italian to have more in common with each other than Italian has with English.
It's a Google TTS issue: it doesn't happen when using Microsoft voices or ElevenLabs. I still need to find a solution.
The Finnish lipsync module could potentially be better in my case, and I plan to test it further. However, it currently has the same issue with handling accents and diacritical marks due to the following line of code:
.normalize('NFD').replace(/[\u0300-\u0307\u0309\u030b-\u036f]/g, '').normalize('NFC') // Remove non-Finnish diacritics
This line strips the accent marks from letters (every combining mark except the diaeresis and the ring above, which Finnish uses), so an accented letter is reduced to its base letter. That poses a problem for languages like Italian, where such diacritics are crucial. In Italian, an accent can completely alter the meaning of a word. For example, "però" (/peˈrɔ/) means "however," while "pero" (/ˈpero/) means "pear tree." These words are spelled with the same letters and differ only in the accent. There are many similar cases in Italian, which is why these diacritical marks need to be preserved.
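To make the effect concrete, here is a minimal sketch that applies the same character class as the Finnish line quoted above to a few sample words. The helper name stripNonFinnishDiacritics and the sample words are assumptions for illustration only; they are not part of the module.

```javascript
// Character class copied from the Finnish module line quoted above.
// It matches every combining mark in U+0300-U+036F except
// U+0308 (diaeresis, as in "ä" and "ö") and U+030A (ring above, as in "å").
const NON_FINNISH_MARKS = /[\u0300-\u0307\u0309\u030b-\u036f]/g;

// Hypothetical helper name; the body mirrors the quoted line.
function stripNonFinnishDiacritics(text) {
  return text
    .normalize('NFD')               // split "ò" into "o" + U+0300
    .replace(NON_FINNISH_MARKS, '') // drop the marks Finnish does not use
    .normalize('NFC');              // recompose whatever is left
}

console.log(stripNonFinnishDiacritics('per\u00f2'));        // pero
console.log(stripNonFinnishDiacritics('perch\u00e9'));      // perche
console.log(stripNonFinnishDiacritics('hyv\u00e4\u00e4'));  // hyvää (unchanged)
```

The Italian grave and acute accents are removed, while the Finnish "ä" survives because U+0308 is excluded from the class.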
Hi @lupettohf! I just tried replacing this line:
.normalize('NFD').replace(/[\u0300-\u0307\u0309\u030b-\u036f]/g, '').normalize('NFC')
with this one:
.normalize('NFD').normalize('NFC')
and it seems to work fine, even with Google TTS!
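For what it's worth, decomposing and then immediately recomposing is the same as a single normalize('NFC'), so the replacement line keeps every diacritic and only makes sure the text is in precomposed form. A small sketch, using a hypothetical helper name roundTrip and made-up sample strings:

```javascript
// Hypothetical helper: the replacement line from the comment above.
function roundTrip(text) {
  return text.normalize('NFD').normalize('NFC');
}

// The same word in precomposed and in decomposed form.
const precomposed = 'per\u00f2';  // "ò" as the single code point U+00F2
const decomposed = 'pero\u0300';  // "o" followed by combining grave U+0300

for (const sample of [precomposed, decomposed]) {
  console.log(roundTrip(sample) === sample.normalize('NFC')); // true
  console.log(roundTrip(sample) === precomposed);             // true
}
```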
P.S. I am also using your Italian lip-sync module locally, thank you!
Thanks for the heads up, I will check it out today.
Yes, you are right. In this case you should not remove all diacritics in the preProcessText method, as the preprocessed results are sent to Google TTS, and removing them would affect pronunciation in Italian. Good catch!
However, note that if you keep the diacritics, you must handle them in the wordsToVisemes method. This can be done either by removing the diacritics before actual processing (so that É is actually handled as E) or, preferably, by adding separate rulesets/rules for letters with diacritics (that is, if you keep É, you must add a ruleset specifically for "É" etc.). Otherwise the viseme sequence will be incorrect.
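As a rough illustration of the second option, the sketch below gives accented vowels their own entries in a lookup table. This is a deliberately simplified, hypothetical table and not the rule format the lip-sync modules actually use; the viseme names and the choice of viseme for each letter are assumptions, and someone who knows Italian phonology may well want different mappings for open and closed vowels.

```javascript
// Hypothetical, simplified lookup; NOT the real module's rule format.
// Viseme names and per-letter choices are assumptions for illustration.
const VOWEL_VISEMES = new Map([
  ['A', 'aa'], ['\u00C0', 'aa'],                // A, À
  ['E', 'E'], ['\u00C8', 'E'], ['\u00C9', 'E'], // E, È, É
  ['I', 'I'], ['\u00CC', 'I'],                  // I, Ì
  ['O', 'O'], ['\u00D2', 'O'], ['\u00D3', 'O'], // O, Ò, Ó
  ['U', 'U'], ['\u00D9', 'U'],                  // U, Ù
]);

// Returns the visemes for the vowels of a word, skipping consonants.
function vowelVisemes(word) {
  // NFC guarantees each accented vowel is one code point, so it can be
  // used directly as a lookup key.
  const letters = Array.from(word.normalize('NFC').toUpperCase());
  return letters
    .filter((letter) => VOWEL_VISEMES.has(letter))
    .map((letter) => VOWEL_VISEMES.get(letter));
}

console.log(vowelVisemes('perch\u00e9')); // [ 'E', 'E' ]
console.log(vowelVisemes('pi\u00f9'));    // [ 'I', 'U' ]
```

With a table like this, È and É could later be pointed at different visemes without touching the text that is sent to the TTS engine.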
Ok, then I will just add this line to the code:
wordsToVisemes(w) {
let wprocessed = w.replace(/[\u0300-\u036f]/g, '')
let o = { words: wprocessed.toUpperCase(), visemes: [], times: [], durations: [], i: 0 };
...
is that right?
You need to wrap the replace with normalize('NFD') and normalize('NFC') as it was done in the original preProcessText method. Without this canonical decomposition/composition, it doesn't filter out the diacritics. Otherwise your change seems syntactically correct.
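A minimal sketch of why the wrapping matters, assuming the incoming word is in precomposed (NFC) form as text usually is. The function names are hypothetical and exist only for this comparison:

```javascript
const COMBINING_MARKS = /[\u0300-\u036f]/g;

// Without decomposition: a precomposed "ò" (U+00F2) is a single code point
// outside the U+0300-U+036F range, so the regex finds nothing to remove.
function stripWithoutNormalize(word) {
  return word.replace(COMBINING_MARKS, '');
}

// With decomposition: NFD splits "ò" into "o" + U+0300, the regex removes
// the mark, and NFC recomposes anything that remains.
function stripWithNormalize(word) {
  return word.normalize('NFD').replace(COMBINING_MARKS, '').normalize('NFC');
}

const word = 'per\u00f2'; // "però", precomposed

console.log(stripWithoutNormalize(word)); // però (accent still there)
console.log(stripWithNormalize(word));    // pero
```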
Unfortunately, I don't know enough about Italian phonology to say whether it will produce the best result. As you probably know, there are far more phonemes than visemes, so it is common for different pronunciations to map to the same lip shape (this is essentially what makes lip-reading difficult). Whether this is the case here, with diacritics in Italian, I don't know. My guess was, and still is, that if a diacritic marks stress, it probably doesn't affect the viseme. However, for some vowels the lip shape might change as well. If that is the case, it would be better not to filter out diacritics but instead to add letters with diacritics to the conversion rules.
Hi @met4citizen,
I absolutely love what you're doing here! :)
I'm currently experimenting with Google TTS for Italian text and voice. Although there's no official lip-sync module for Italian yet, I've found that the English module still delivers decent results.
This is my setup right now:
However, I've encountered an issue that I'd like to discuss with you. When using the Google TTS API (https://eu-texttospeech.googleapis.com/v1beta1/text:synthesize), the generated voice doesn't correctly capture the intonation of Italian words with accented letters like à, è, é, ì, ò, and ù. This affects words such as "sanità," "aiuterà," "perché," and "più."
Is there anything that can be done to address this issue?