(PLS) Longer graphemes do not take precedence

ogelsan commented 2 years ago

This may be related to #46, but it seemed different enough to open a new issue.

When matching entries in a lexicon file (for the System backend, at least), TTT appears to give priority to shorter grapheme matches. This prevents longer matches from working entirely.

E.g., given two entries, one with <grapheme>Ixal</grapheme> and the other with <grapheme>Ixali</grapheme>, when given the string "Ixali", TTT matches the first and ignores the second. The backend then sees the result as Ixal and i, and pronounces it as ['ɪk.sɑːl 'aɪ].

This is true regardless of the order of lexemes in the lexicon file. It seems like LexiconManager sorts the entries and then applies them in the order of shortest to longest.

FWIW, this seems to run counter to a couple guidelines in the PLS specification at https://www.w3.org/TR/pronunciation-lexicon/#AppC:

Precedence should be given to the retrieval of lexemes having a <grapheme> element whose content exactly matches the longest possible sequence of consecutive tokens. Thus, a lexeme for "they'll" should have precedence over a lexeme for "they" given the input "they'll".

Lexical retrieval should be performed on the basis of tokens rather than characters. Thus, a lexeme for "do" should not match the beginning of "done".

The current implementation in LexiconManager doesn't appear to bother with any tokenization at the moment, so that might be worth pursuing. How exactly tokenization is typically implemented for speech synthesis is a bit beyond my depth, though.
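A minimal sketch of the two guidelines quoted above, written in Python rather than the plugin's C#. The function name `apply_lexicon` and the dict-based lexicon are assumptions for illustration, not TextToTalk's actual API. Graphemes are tried longest first, and a match only counts when it is not glued to other word characters:

```python
import re

def apply_lexicon(text, lexicon):
    """Wrap lexicon matches in SSML phoneme tags.

    `lexicon` is a hypothetical dict mapping grapheme -> IPA string.
    """
    if not lexicon:
        return text
    # Longest graphemes first, so "they'll" is tried before "they".
    ordered = sorted(lexicon, key=len, reverse=True)
    # The lookarounds reject matches that start or end inside a word,
    # so "do" cannot match the beginning of "done".
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(g) for g in ordered) + r")(?!\w)"
    )
    return pattern.sub(
        lambda m: f'<phoneme ph="{lexicon[m.group()]}">{m.group()}</phoneme>',
        text,
    )

lexicon = {"they": "ðeɪ", "they'll": "ðeɪl", "do": "duː"}
print(apply_lexicon("they'll", lexicon))  # <phoneme ph="ðeɪl">they'll</phoneme>
print(apply_lexicon("done", lexicon))     # done
```

Treating `\w` boundaries as token boundaries is a simplification; a real TTS front end would likely tokenize more carefully.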

Either way, I've found a workaround with the current version. Because aliases are applied before phonemes, you can use an alias to replace the longer grapheme with a string that doesn't match the shorter one, and then create a separate lexeme that matches that alias to the correct phoneme, like so:

  <lexeme>
    <grapheme>Ixali</grapheme>
    <grapheme>ixali</grapheme>
    <alias>I_x_a_l_i</alias>
  </lexeme>
  <lexeme>
    <grapheme>I_x_a_l_i</grapheme>
    <phoneme>ɪkˈsɑːli</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>Ixal</grapheme>
    <grapheme>ixal</grapheme>
    <phoneme>ɪksɑːl</phoneme>
  </lexeme>

This is super hacky, though.

johnysandels commented 2 years ago

You're right, that is very similar to the other issue. I wonder if that would be easier for Karashiiro to implement than some sort of nesting system. I know there was a bit of a roadblock on what the right solution would be.

Also, the reason it doesn't follow the normal protocol is that Kara ended up needing to make their own backend, because the native one didn't work well.

ogelsan commented 2 years ago

Oh, yeah. I only started using this plugin the other day, and I hadn't messed with PLS files before, but I'd gleaned from poking around that it was a quick, recent implementation. I just figured that, in the long run, they probably wanted it to more or less match how it's expected to work in other backends, so I included that bit for reference.

And I'm not sure I fully get what you mean by nesting, but the specific example you had was distinguishing Y'shtola vs. the suffix 's, right? Like I said, I don't really know exactly how TTS systems tend to handle text on this level, but I get the sense that it basically involves first tokenizing the string against a lexicon, and then tokenizing whatever didn't match across typical boundaries like spaces and punctuation.

So, let's say you have a lexicon including the entries "Y'shtola", "Sylphie", and "Sylph", and then you have the sentence "Y'shtola's talking to Sylphie." You'd start by matching Y'shtola and Sylphie (and "Sylphie" wins over "Sylph", because it's a longer match). This leaves you with 's talking to unmatched. Now you'd break that up, leading to the final series of tokens: Y'shtola 's talking to Sylphie. Since the longer match is made first, and since once something's matched it's "locked in", you don't need to worry about something like 's breaking up a larger word.
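The two-stage idea in the paragraph above could be sketched roughly like this (Python, not the plugin's C#; `tokenize` and the fallback pattern are assumptions made for illustration). Lexicon entries are matched first, longest first, and whatever is left over is split on spaces and punctuation:

```python
import re

def tokenize(text, graphemes):
    """Split text into tokens, locking in lexicon matches first."""
    # Fallback splitter for unmatched text: clitics like 's, words, punctuation.
    fallback_re = re.compile(r"'\w+|\w+|[^\w\s]")
    if not graphemes:
        return fallback_re.findall(text)
    # Longest graphemes first, so "Sylphie" wins over "Sylph".
    ordered = sorted(graphemes, key=len, reverse=True)
    lexicon_re = re.compile("|".join(re.escape(g) for g in ordered))

    tokens = []
    pos = 0
    for match in lexicon_re.finditer(text):
        # Break up the unmatched stretch before this lexicon hit.
        tokens.extend(fallback_re.findall(text[pos:match.start()]))
        # The hit itself is kept whole and never split again.
        tokens.append(match.group())
        pos = match.end()
    tokens.extend(fallback_re.findall(text[pos:]))
    return tokens

print(tokenize("Y'shtola's talking to Sylphie.", ["Y'shtola", "Sylphie", "Sylph"]))
# ["Y'shtola", "'s", 'talking', 'to', 'Sylphie', '.']
```

This sketch deliberately skips token-boundary checks, so "Sylph" would still match inside "Sylphs"; it only shows the ordering and lock-in behaviour.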

Hmm, now that I think about it, the backend seems to be seeing extra boundaries at the edges of each match... Like, if you match Chocobo in "Chocobos" or "Chocobo's", it'll read the final s as an isolated letter, [ɛs]. If there's no match, even if it's a strange word, the backend'll usually try to pronounce it as a part of the whole word. I don't know if it's possible to fix that in any reasonable way, or if it's easier to just work around it. My gut's saying it's probably the latter.

karashiiro commented 2 years ago

I just put up a testing build https://github.com/goatcorp/DalamudPlugins/pull/1452, let me know if it works 👍

johnysandels commented 2 years ago

New update crashed on the first one I tried

2021-11-08 21:42:28.804 -08:00 [FTL] Unhandled exception on AppDomain
System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at TextToTalk.Backends.System.LexiconManager.ReplacePhoneme(String text, String oldValue, String newValue) in K:\arashiiro\Backends\System\LexiconManager.cs:line 136
   at TextToTalk.Backends.System.LexiconManager.MakeSsml(String text, String langCode) in K:\arashiiro\Backends\System\LexiconManager.cs:line 124
   at TextToTalk.Backends.System.SystemSoundQueue.OnSoundLoop(SystemSoundQueueItem nextItem) in K:\arashiiro\Backends\System\SystemSoundQueue.cs:line 41
   at TextToTalk.Backends.SoundQueue`1.PlaySoundLoop() in K:\arashiiro\Backends\SoundQueue.cs:line 37
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ThreadHelper.ThreadStart()
2021-11-08 21:42:31.899 -08:00 [INF] User chose to disable plugins on next launch...

tested with /echo Yugiri

<lexeme>
    <grapheme>Yugiri</grapheme>
    <phoneme>ˈju:gɪdi</phoneme>
</lexeme>

Crash window, which was also triggered with Urianger

<lexeme>
    <grapheme>Urianger</grapheme>
    <phoneme>ori.ɒnʒeɪ</phoneme>
</lexeme>

Game crashes with same error when typing the word "hi"

UPDATE: Game doesn't crash when no lexicon is selected.

karashiiro commented 2 years ago

Released testing v1.9.7, how about now? I wasn't checking if the replacement text was present in the game text before doing stuff.
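The kind of guard described here might look something like the following sketch. It is Python rather than the plugin's C#, and `replace_phoneme` is a hypothetical stand-in, not the real `ReplacePhoneme` from the stack trace. The point is simply to return early when the grapheme is not in the text at all:

```python
def replace_phoneme(text, grapheme, phoneme):
    """Wrap occurrences of `grapheme` in an SSML phoneme tag."""
    # Guard: nothing to do if the grapheme is empty or absent,
    # e.g. the lexicon has "Yugiri" but the chat line is just "hi".
    if not grapheme or grapheme not in text:
        return text
    tagged = f'<phoneme ph="{phoneme}">{grapheme}</phoneme>'
    return text.replace(grapheme, tagged)

print(replace_phoneme("hi", "Yugiri", "ˈju:gɪdi"))  # hi
```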

johnysandels commented 2 years ago

plugin doesn't crash! It seems like it does work with making longer grapheme matches have priority! Only problem I've encountered is that having a grapheme with 's makes the pronunciation of Y'shtola use default pronunciation instead of what is listed in the lexicon. Removing the 's entry makes it use the lexicon pronunciation again.

<lexeme>
    <grapheme>Y'shtola</grapheme>
    <phoneme>jiʃtoʊˈlɑ</phoneme>
</lexeme>
<lexeme>
    <grapheme>'s</grapheme>
    <phoneme>z</phoneme>
</lexeme>

nothing to point towards this in the dalamud or output log 😅

Otherwise, longer words are taking priority and being pronounced right. Maybe the ' symbol wasn't accounted for?

karashiiro commented 2 years ago

Just added a sort for graphemes and better logging, how about now? If that doesn't fix it, you should at least be able to see what's happening.

johnysandels commented 2 years ago

Seems like the lexicon pronunciation isn't used at all anymore.

Tested multiple names and none of them used the lexicon. All that shows in the log is this; each entry looks like this.

2021-11-10 19:52:44.726 -08:00 [INF] [TextToTalk] Y'shtola
2021-11-10 19:52:49.471 -08:00 [INF] [TextToTalk] Alphinaud
2021-11-10 19:52:55.850 -08:00 [INF] [TextToTalk] Urianger

karashiiro commented 2 years ago

v1.9.9 reverted that sorting thing, so it should be half-functional again.

plugin doesn't crash! It seems like it does work with making longer grapheme matches have priority! Only problem I've encountered is that having a grapheme with 's makes the pronunciation of Y'shtola use default pronunciation instead of what is listed in the lexicon. Removing the 's entry makes it use the lexicon pronunciation again.
snip
nothing to point towards this in the dalamud or output log 😅

Otherwise, longer words are taking priority and being pronounced right. Maybe the ' symbol wasn't accounted for?

I was just attempting to test this, and it seems to work fine? It might be more complex than just the two lexemes interacting with each other. Can you send the whole lexicon file?

I was just using this: test.zip

johnysandels commented 2 years ago

Just checked this morning and it seems to work fine! Everything was working like it was in the earlier version, and Y'shtola is working this time! Not sure what changed to make Y'shtola's work, but it seems to do the trick!

johnysandels commented 2 years ago

While writing a lexeme I found an issue with a longer match not taking priority. Qitaris does not have priority over Qitari.

<lexeme>
<grapheme>Qitari</grapheme>
<phoneme>kɪtɑːriː</phoneme>
</lexeme>
<lexeme>
<grapheme>Qitaris</grapheme>
<phoneme>kɪtɑːriːz</phoneme>
</lexeme>

in the output log it says

2021-12-17 22:17:30.870 -08:00 [INF] [TextToTalk] <phoneme ph="kɪtɑːriː">Qitari</phoneme>s
2021-12-17 22:17:33.751 -08:00 [INF] [TextToTalk] <phoneme ph="kɪtɑːriː">Qitari</phoneme>

Same is happening with Ixal and Ixals

2021-12-17 21:24:30.589 -08:00 [INF] [TextToTalk] <phoneme ph="ɪksɪl">Ixal</phoneme>
2021-12-17 21:25:32.680 -08:00 [INF] [TextToTalk] <phoneme ph="ɪksɪl">Ixal</phoneme>s

lexeme

<lexeme>
<grapheme>Ixal</grapheme>
<phoneme>ɪksɪl</phoneme>
</lexeme>
<lexeme>
<grapheme>Ixals</grapheme>
<phoneme>ɪksɪlz</phoneme>
</lexeme>

johnysandels commented 2 years ago

When Vanu and Vanus are in the same sentence, the voice doesn't read the text out. This was in the output log:

2021-12-17 22:50:06.152 -08:00 [INF] [TextToTalk] <phoneme ph="vɑːnu">Vanu</phoneme> <phoneme ph="vɑːnuz"><phoneme ph="vɑːnu">Vanu</phoneme>s</phoneme>

and my lexicon entry

<lexeme>
<grapheme>Vanu</grapheme>
<phoneme>vɑːnu</phoneme>
</lexeme>
<lexeme>
<grapheme>Vanus</grapheme>
<phoneme>vɑːnuz</phoneme>
</lexeme>

johnysandels commented 2 years ago

While writing a lexeme I found an issue with a longer match not taking priority. Qitaris does not have priority over Qitari

Found another example: Eorzean does not have priority over Eorzea. From the dalamud.log:

2021-12-18 23:36:50.693 -08:00 [INF] [TextToTalk] <phoneme ph="eɪɔrːzɪːə">Eorzea</phoneme>
2021-12-18 23:36:52.540 -08:00 [INF] [TextToTalk] <phoneme ph="eɪɔrːzɪːə">Eorzea</phoneme>n

and the lexeme I was using

<lexeme>
<grapheme>Eorzea</grapheme>
<phoneme>eɪ ɔrːzɪː ə</phoneme>
</lexeme>
<lexeme>
<grapheme>Eorzean</grapheme>
<phoneme>eɪ ɔrːzɪːæn</phoneme>
</lexeme>

karashiiro commented 2 years ago

I've released a possible fix for this in v1.9.10, let me know if it works.

johnysandels commented 2 years ago

Seems like this update just causes TTS to not use most of the phonemes. It seems like it's using the phonemes for some of them, though.

Also, using Vanu and Vanus in the same sentence still causes the text to not be read out, probably because of some sort of bug to do with the way graphemes are given priority.

This is from dalamud log.

2021-12-19 21:41:36.030 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Amalj'aa</speak>
2021-12-19 21:42:02.243 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Amalj'aas</speak>
2021-12-19 21:42:49.663 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Alphinaud</speak>
2021-12-19 21:42:51.425 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Alphinaud</speak>
2021-12-19 21:42:59.512 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">G'raha</speak>
2021-12-19 21:43:01.679 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">G'raha</speak>
2021-12-19 21:46:16.840 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><phoneme ph="vɑːnu">Vanu</phoneme></speak>
2021-12-19 21:46:18.679 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><phoneme ph="vɑːnuz">Vanus</phoneme></speak>
2021-12-19 21:46:29.715 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US"><phoneme ph="vɑːnu">Vanu</phoneme> <phoneme ph="vɑːnuz"><phoneme ph="vɑːnu">Vanu</phoneme>s</phoneme></speak>
2021-12-19 21:46:39.601 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Urianger</speak>

karashiiro commented 2 years ago

That is a nested replacement :(
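For illustration, here is a rough Python sketch (not the plugin's C# code; both function names are made up) of how a nested replacement like the one in the log can arise, and one possible way around it. Applying lexemes one after another to already-rewritten text lets "Vanu" match inside the tag that was just produced for "Vanus"; claiming spans on the original text first avoids that:

```python
def naive_replace(text, lexicon):
    # Each lexeme is applied to the output of the previous one,
    # so shorter graphemes can match inside earlier replacements.
    for grapheme, ph in lexicon.items():
        text = text.replace(grapheme, f'<phoneme ph="{ph}">{grapheme}</phoneme>')
    return text

def span_replace(text, lexicon):
    claimed = []  # (start, end, grapheme) spans on the ORIGINAL text
    for grapheme in sorted(lexicon, key=len, reverse=True):
        if not grapheme:
            continue
        start = text.find(grapheme)
        while start != -1:
            end = start + len(grapheme)
            # Keep the span only if it overlaps nothing claimed so far.
            if all(end <= s or start >= e for s, e, _ in claimed):
                claimed.append((start, end, grapheme))
            start = text.find(grapheme, start + 1)
    out, pos = [], 0
    for start, end, grapheme in sorted(claimed):
        out.append(text[pos:start])
        out.append(f'<phoneme ph="{lexicon[grapheme]}">{grapheme}</phoneme>')
        pos = end
    out.append(text[pos:])
    return "".join(out)

lexicon = {"Vanus": "vɑːnuz", "Vanu": "vɑːnu"}
print(naive_replace("Vanu Vanus", lexicon))  # nested <phoneme> inside <phoneme>
print(span_replace("Vanu Vanus", lexicon))   # two flat <phoneme> tags
```

Because longer graphemes claim their spans first, this also gives longer matches precedence without any tag ever being rewritten.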

karashiiro commented 2 years ago

Possibly fixed with v1.9.10.1.

johnysandels commented 2 years ago

Sorry! I didn't get a chance to test before maintenance!

johnysandels commented 2 years ago

Seems like my phonemes aren't being used at all anymore. I found this in the dalamud log, and it seems to only be speaking the plain text where phonemes should be.

2021-12-21 19:05:46.528 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Amalj'aas</speak>
2021-12-21 19:05:53.713 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Amalj'aa</speak>
2021-12-21 19:06:02.767 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Amalj'aa</speak>
2021-12-21 19:06:10.610 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Urianger</speak>
2021-12-21 19:06:44.713 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">Alphinaud</speak>

EDIT: The only phonemes I could find that are working so far are ones that relate somehow to the priority system: Vanu and Vanus work and play out properly. I've also noticed that the 's on Y'shtola is being registered, but it has priority over Y'shtola (which could just be because it doesn't even see Y'shtola as a lexicon entry or something).

2021-12-21 19:10:04.104 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB"><phoneme ph="vɑːnu">Vanu</phoneme> <phoneme ph="vɑːnuz">Vanus</phoneme></speak>
2021-12-21 19:10:43.342 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB"><phoneme ph="vɑːnuz">Vanus</phoneme></speak>
2021-12-21 19:10:49.995 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB"><phoneme ph="vɑːnu">Vanu</phoneme></speak>

the 's on Y'shtola

2021-12-21 19:12:04.467 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB">Y<phoneme ph="z">s</phoneme>htola</speak>
2021-12-21 19:12:08.066 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB">Y<phoneme ph="z">s</phoneme>htola<phoneme ph="z">s</phoneme></speak>

inputted /echo Y'shtola, then /echo Y'shtola's

Also noticed Eorzea and Eorzean aren't being seen as phonemes

2021-12-21 19:31:28.376 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB">Eorzea</speak>
2021-12-21 19:31:35.012 -08:00 [INF] [TextToTalk] <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-GB">Eorzean</speak>

johnysandels commented 2 years ago

FFXIVCharacters.Locations.zip This is the lexicon I've been testing btw. It's the updated version of the Location and characters lexicon, but I'm still waiting on posting it until after 😊

karashiiro commented 2 years ago

Are you using v1.9.11 or v1.9.11.1? I keep adding tests and they keep passing 🥴

karashiiro commented 2 years ago

Never mind, I managed to get a test to fail when loading that whole lexicon into the test, I'm looking into it.

karashiiro commented 2 years ago

Looks like the C# sorted dictionary doesn't like it when multiple keys compare as equal to each other (here, graphemes of the same length), so graphemes that have the same length are just replacing each other. That throws a wrench into things, fixing.
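A loose Python analogue of that collision, assuming the entries were effectively keyed by grapheme length (the actual C# code isn't shown in this thread, so both helper names here are hypothetical). Keying on length alone lets same-length graphemes overwrite each other; sorting with a tie-breaker keeps them all:

```python
def index_by_length(graphemes):
    # Analogue of the bug: length is the key, so graphemes that
    # share a length overwrite one another and only the last survives.
    by_length = {}
    for g in graphemes:
        by_length[len(g)] = g
    return [by_length[n] for n in sorted(by_length, reverse=True)]

def order_longest_first(graphemes):
    # Longest first, with the text itself as a tie-breaker,
    # so equal-length graphemes are all kept.
    return sorted(graphemes, key=lambda g: (-len(g), g))

names = ["Urianger", "Amalj'aa", "Eorzea", "G'raha", "Vanu"]
print(index_by_length(names))      # ["Amalj'aa", "G'raha", 'Vanu']
print(order_longest_first(names))  # ["Amalj'aa", 'Urianger', 'Eorzea', "G'raha", 'Vanu']
```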

karashiiro commented 2 years ago

Alright, I rewrote most of the lexicon manager, let me know if it works 👍

johnysandels commented 2 years ago

Seems like it is working great! Everything I could think of works flawlessly! Amazing work!

karashiiro / TextToTalk

(PLS) Longer graphemes do not take precedence #48