karashiiro / TextToTalk

Chat TTS plugin for Dalamud. Has support for triggers/exclusions, several TTS providers, and more!
MIT License

.NET SpeechSynthesizer bugs #37

Closed johnysandels closed 2 years ago

johnysandels commented 3 years ago

- With a custom lexicon, the TTS sometimes gets stuck on one voice, and changing the voice crashes the game.


The issues listed below can be worked around by deleting and reinstalling the plugin, or by restarting the game. I haven't had time to check whether they have been fixed in 1.8.4.0:

FIXED - When using <phoneme> in lexicon.xml, TTS can't read anything out loud at all. Phonemes are now working, with some bugs (not yet tested for all of them).

FIXED - It also seems like European voices don't use the lexicon pronunciation when xml:lang="en" is set, only when it's set to xml:lang="en-GB".

FIXED - first time selecting a lexicon file, pronunciation isn't used.

FIXED - Deleting and re-selecting a lexicon to pick up an updated version of the same lexicon.xml file doesn't apply the new pronunciation.

johnysandels commented 3 years ago

Updated list with more issues

karashiiro commented 3 years ago

These all seem to be bugs and design choices in the .NET speech synthesizer. I'm looking into workarounds, probably going to do something hacky like this: https://stackoverflow.com/a/59035660/14226597. In the meantime, I'm pushing this to release with a disclaimer.

karashiiro commented 3 years ago

New: Renaming the file crashes the game.

johnysandels commented 3 years ago

Here are two lexicon files using phonemes for testing purposes. FFXIVen alters two in-game names (Alphinaud & Urianger) to their proper pronunciation. The second is a test lexicon from the Wikipedia lexicon article, which I've altered: the words test and text should be pronounced as apple, and the word dog should be pronounced as cat. Both files work in Amazon Polly and also work in the plugin when using Amazon Polly as a backend.

Both lexicon files also work with every English variant, on the Amazon Polly website and with Polly as a backend in the plugin (tested for British and US). With Polly, xml:lang="en" lets a lexicon apply to any English variant. That doesn't currently work with the system backend, where you must pick a region.

proneme test.zip

karashiiro commented 3 years ago

Leaving this here for my own reference: https://stackoverflow.com/a/642314

karashiiro commented 3 years ago

I think it's fair at this point to say that the native backend sucks, but I think I might've fixed some of these issues in https://github.com/goatcorp/DalamudPlugins/pull/1012.

karashiiro commented 3 years ago

Even with this, using the phoneme element just breaks things randomly.

johnysandels commented 3 years ago

Found an issue with how the standard backend handles IPA formatting. A phoneme containing a regular colon (:) breaks TTS and makes it say nothing. You currently need to use the special IPA colon ː to make it work. Spaces in phonemes also break TTS and make it say nothing. So avoiding spaces and using that special colon as a workaround seems to make phonemes work after update 1.8.4.0.

Spaces and regular colons work in Amazon Polly currently. If only we could use Amazon Polly's backend with the system voices 🙃

karashiiro commented 3 years ago

Updated the PR with replacements for those two issues.
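For reference, the two workarounds described above (swap the ASCII colon for the IPA length mark, drop spaces) can be sketched roughly as follows. This is a Python illustration only, not the plugin's C# code; `normalize_phoneme` is a hypothetical helper name, and removing all whitespace is an assumption based on the "avoid spaces" workaround.

```python
# Hypothetical helper: normalizes an IPA phoneme string before it is
# handed to a synthesizer that chokes on ASCII colons and spaces.
IPA_LENGTH_MARK = "\u02D0"  # 'ː' MODIFIER LETTER TRIANGULAR COLON


def normalize_phoneme(ph: str) -> str:
    """Replace ':' with the IPA length mark and strip all whitespace."""
    ph = ph.replace(":", IPA_LENGTH_MARK)
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs, and newlines alike.
    return "".join(ph.split())


if __name__ == "__main__":
    print(normalize_phoneme("gr\u0251: h\u0251"))
```

Normalizing on load keeps lexicon files portable: the same file can keep the regular colons and spaces that Polly accepts.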

johnysandels commented 3 years ago

Just caught another one, I didn't think you'd be so quick! This Symbol - also should be working but currently isn't.

Update: Confirming all 3 symbols work now

johnysandels commented 3 years ago

Seems like there's an issue with using the grapheme Y'shtola. Using a phoneme with it causes silence; using an alias with it ignores the lexicon and uses the default pronunciation.

The pronunciation for the word works if triggered with a different grapheme (I used Yshtola with no apostrophe for testing, and the pronunciation worked), so I can confirm it's not a problem with IPA symbols in phonemes.

I remember Y'shtola worked with <alias> in the past, so something must have changed since then, because it doesn't work with alias anymore. I assume once it works with alias again, it will work with phonemes as well.

karashiiro commented 3 years ago

Can you send a copy of the lexicon you're using for that? The main issue is that I'm reimplementing lexicon processing myself, and I only implemented the grapheme and phoneme nodes. If you have an alias node in your lexicon, I need to know how it's formatted so I can parse it out and use it.

And I'm guessing by alias you mean sub alias, which is the only similar thing here: https://cloud.google.com/text-to-speech/docs/ssml#sub
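As a rough illustration of what parsing grapheme, phoneme, and alias nodes out of a PLS lexicon can look like, here is a minimal Python sketch. It is not the plugin's C# implementation. The sample lexicon is made up for the example (the dog/cat alias mirrors the test lexicon mentioned earlier, and the G'raha entry mirrors a lexeme posted later in this thread), and `load_lexemes` is a hypothetical name. It assumes the lexicon declares the standard PLS namespace on its root element.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C Pronunciation Lexicon Specification.
PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"

# Made-up sample lexicon, for illustration only.
SAMPLE = """
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>dog</grapheme>
    <alias>cat</alias>
  </lexeme>
  <lexeme>
    <grapheme>G'raha</grapheme>
    <phoneme>gr\u0251h\u0251</phoneme>
  </lexeme>
</lexicon>
"""


def load_lexemes(xml_text):
    """Return a list of (grapheme, phoneme_or_None, alias_or_None) tuples."""
    ns = {"pls": PLS_NS}
    root = ET.fromstring(xml_text)
    entries = []
    for lexeme in root.findall("pls:lexeme", ns):
        phoneme = lexeme.findtext("pls:phoneme", default=None, namespaces=ns)
        alias = lexeme.findtext("pls:alias", default=None, namespaces=ns)
        # A lexeme may list several graphemes sharing one pronunciation.
        for grapheme in lexeme.findall("pls:grapheme", ns):
            entries.append((grapheme.text, phoneme, alias))
    return entries


if __name__ == "__main__":
    for entry in load_lexemes(SAMPLE):
        print(entry)
```

An alias entry can then be treated as a plain text substitution, while a phoneme entry becomes an SSML phoneme tag.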

johnysandels commented 3 years ago

Lexicon Alias.zip. This uses aliases for pronunciation, basically just trying to force pronunciation by using words.

I mainly used this Wikipedia article for help with lexicon XML formatting: https://en.wikipedia.org/wiki/Pronunciation_Lexicon_Specification
Lexicon Proneme.zip

Until recently, aliases were working to force pronunciation of words, including everything in that first file. I actually just realized they're no longer working. Everything in the phoneme file now works perfectly, except for the grapheme Y'shtola, where the apostrophe breaks it. That grapheme was working fine in the first file earlier.

That being said, aliases aren't really that important now that phonemes are working.

karashiiro commented 3 years ago

Should be fixed in https://github.com/goatcorp/DalamudPlugins/pull/1015

johnysandels commented 3 years ago

Just tried it out, and it seems like my phoneme lexicon file doesn't work at all anymore, so I can't confirm whether the Y'shtola issue has been fixed. Aliases do work now, though.

I did notice that in the notes for goatcorp/DalamudPlugins#1015 you mentioned "Handles apostrophes and double quotes in phonemes", so maybe that's where the issue stems from? The issue was with the grapheme, not the phoneme: taking the apostrophe out of the grapheme made the pronunciation work without altering the phoneme at all.

karashiiro commented 3 years ago

That was actually an unrelated fix to handle phoneme tags like this: <phoneme alphabet="x-sampa" ph='m@"hA:g@%ni:'>mahogany</phoneme> which have a double quote in the middle, but I reverted that and updated the PR so we can see if that caused issues.

johnysandels commented 3 years ago

Yeah, I just tested it and nothing seems to have changed. Aliases still work, and phonemes still don't. So I was wrong in my correction, sorry about the confusion!

karashiiro commented 3 years ago

Out of curiosity, does it work if you remove the lexeme with the 's grapheme? I think it's applying both that and the Y'shtola grapheme at the same time, so it ends up becoming

<phoneme ph="jiʃtoʊla">Y<phoneme ph="z">htola</phoneme></phoneme>

I don't know if that causes problems or not, but it might?
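To make the suspected interaction concrete, here is a small Python sketch of a naive "one replacement pass per lexeme" approach. It is an illustration only, not the plugin's C# code, and `naive_apply` is a hypothetical name. The second pass finds 's inside text that the first pass already wrapped, which produces a nested phoneme tag.

```python
def naive_apply(text, lexicon):
    """Run one plain string replacement per (grapheme, phoneme) pair.

    Each pass sees the output of the previous one, including any tags it
    inserted, which is what allows nested replacements to happen.
    """
    for grapheme, phoneme in lexicon:
        tagged = f'<phoneme ph="{phoneme}">{grapheme}</phoneme>'
        text = text.replace(grapheme, tagged)
    return text


if __name__ == "__main__":
    lexicon = [("Y'shtola", "ji\u0283to\u028ala"), ("'s", "z")]
    # The 's pass matches inside the already-wrapped Y'shtola.
    print(naive_apply("Y'shtola", lexicon))
```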

johnysandels commented 3 years ago

Yeah, using a config without 's makes Y'shtola work in 1.8.4.0! To clarify, in 1.8.5.0 no phonemes work at all.

Is there a way to make both entries work together? Since having a 's on a custom pronunciation sounds kind of awkward sometimes.

karashiiro commented 3 years ago

How about now? I removed support for the alphabet attribute. That's the only other phoneme-related change I made, so maybe that's the issue?

I'm trying to make it work but it's less simple to keep track of where replacements have already occurred.

johnysandels commented 3 years ago

That fixed it 👍

Is the text being parsed multiple times, or is it just being processed once? I imagine some sort of priority system could help. I'll try changing the order in the .xml to see if that changes anything real quick. No dice.

karashiiro commented 3 years ago

The input text just has replacements run over it once for each grapheme-phoneme pair. I might've implemented a fix, updated PR.

johnysandels commented 3 years ago

[image] TTS either doesn't work or it crashes the game. No sound for Y'shtola, G'raha crashes the game, Urianger works.

karashiiro commented 3 years ago

Random guess: apostrophes are reserved in the SSML specification according to the Polly documentation (but it should be generally true), which is what's actually causing the issue. This also makes text containing an & that isn't part of an escape sequence fail to generate speech. PR updated; try using &apos; instead of ' in the lexicon when you test it.
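A minimal Python sketch of escaping the reserved characters before text is embedded in SSML, using only the standard library. This is an illustration, not the plugin's C# code; `escape_for_ssml` and `round_trip` are hypothetical names, and the `<speak>` wrapper is just the usual SSML root element used to check that the result parses.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_for_ssml(text):
    """Escape &, <, > and both quote characters for use inside SSML."""
    # escape() handles &, < and > itself (the ampersand first, so the
    # entities added afterwards are not escaped a second time).
    return escape(text, {"'": "&apos;", '"': "&quot;"})


def round_trip(text):
    """Wrap escaped text in <speak> and parse it back, as a sanity check."""
    ssml = f"<speak>{escape_for_ssml(text)}</speak>"
    return ET.fromstring(ssml).text


if __name__ == "__main__":
    print(escape_for_ssml("G'raha & Y'shtola"))
    print(round_trip("G'raha & Y'shtola"))
```

A raw & in the input would otherwise make the whole SSML document unparseable, which matches the "fails to generate speech" symptom.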

johnysandels commented 3 years ago

I am getting that same crash from my last comment, seemingly at random, even without using apostrophes in the phoneme. It seems like none of the in-game text with an apostrophe in it uses the lexicon anymore.

I think it's related to your earlier comment:

The input text just has replacements run over it once for each grapheme-phoneme pair. I might've implemented a fix, updated PR.

No crash before that, and lexicons worked except for conflicting entries.

karashiiro commented 3 years ago

Undid that and handling SSML reserved tokens very differently now, it's messy but I think this should work correctly now.

johnysandels commented 3 years ago

Not boding well so far. The very first one I tested crashed the game. I typed /echo G'raha. This is my code for that lexeme:

    <lexeme>
        <grapheme>G'raha</grapheme>
        <phoneme>grɑhɑ</phoneme>
    </lexeme>

I've also tried it like this:

    <lexeme>
        <grapheme>G&apos;raha</grapheme>
        <phoneme>grɑhɑ</phoneme>
    </lexeme>

This is how it is formatted in my lexicon file.

Some of the lexemes work and some crash the game. I know Y'shtola is working, but Alisaie crashed the game too. I've been using test.zip, if you want to look for any errors. I tried using &apos; in both the grapheme and the phoneme, and the result is the same no matter what.

karashiiro commented 3 years ago

Alright, I reverted it to how it was when it was working (I think), I'll try digging deeper into this tomorrow.

johnysandels commented 3 years ago

Also this was in the dalamud.log

2021-08-17 23:26:01.639 -04:00 [FTL] Unhandled exception on AppDomain
System.ArgumentOutOfRangeException: startIndex cannot be larger than length of string.
Parameter name: startIndex
   at System.String.Substring(Int32 startIndex, Int32 length)
   at TextToTalk.Backends.System.LexiconManager.MakeSsml(String text, String langCode)
   at TextToTalk.Backends.System.SystemSoundQueue.OnSoundLoop(SystemSoundQueueItem nextItem)
   at TextToTalk.Backends.SoundQueue`1.PlaySoundLoop()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()


Sorry for not thinking to look at my log 😅

karashiiro commented 3 years ago

Oh, duh, that explains a lot. One minute, I'm deploying a fix for that.

karashiiro commented 3 years ago

Actually, never mind, that throws a major wrench into how I was doing it. I need to rethink how I'm preventing nested replacements again. For now I'll leave it how it was before I tried fixing it.
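One possible way to avoid nested replacements, sketched in Python for illustration only (this is not the plugin's C# code, and it is not necessarily the approach that was eventually taken): build a single alternation from all graphemes, longest first, and substitute in one left-to-right pass. Text that has already been matched is never scanned again, so no index bookkeeping is needed. `apply_lexicon_once` is a hypothetical name, and the sketch deliberately leaves out escaping of SSML reserved characters.

```python
import re


def apply_lexicon_once(text, lexicon):
    """Wrap every grapheme occurrence in a phoneme tag in a single pass.

    lexicon maps grapheme -> phoneme. Graphemes are tried longest first,
    so "Y'shtola" wins over "'s" wherever both could match.
    """
    if not lexicon:
        return text  # an empty alternation would match everywhere
    graphemes = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(g) for g in graphemes))

    def wrap(match):
        grapheme = match.group(0)
        return f'<phoneme ph="{lexicon[grapheme]}">{grapheme}</phoneme>'

    # re.sub never rescans its own output, so replacements cannot nest.
    return pattern.sub(wrap, text)


if __name__ == "__main__":
    lexicon = {"Y'shtola": "ji\u0283to\u028ala", "'s": "z"}
    print(apply_lexicon_once("Y'shtola's", lexicon))
```

With this approach both the Y'shtola entry and the 's entry can coexist: the name is tagged once, and a trailing possessive is tagged separately.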

johnysandels commented 2 years ago

All of the original bugs I've listed have been resolved here, so we could close this and open a new issue for just the nested replacements. Looks like a lot when you scroll through it 😅