Automatic language switching does not work for SAPI 5 voices

gexgd0419 commented 1 month ago

Steps to reproduce:

Make sure Automatic language switching is enabled, and you have multiple SAPI 5 voices in different languages.

Select a SAPI 5 voice. Then, go to a webpage with proper language attributes, where the language is different from your current voice language.

Actual behavior:

The voice is not switched.

Expected behavior:

The voice should be switched to another SAPI 5 voice in the corresponding language, just like when using OneCore voices.

NVDA logs, crash dumps and other attachments:

Several log lines similar to the following:

DEBUGWARNING - synthDrivers.sapi5.SynthDriver.speak - MainThread:
Unsupported speech command: LangChangeCommand

System configuration

NVDA installed/portable/running from source:

Installed

NVDA version:

2024.1

Windows version:

Windows 11 23H2

Name and version of other software in use when reproducing the issue:

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

Have you tried any other versions of NVDA? If so, please report their behaviors.

No

If NVDA add-ons are disabled, is your problem still occurring?

Yes

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

Automatic language switching is implemented for OneCore voices, but not for SAPI 5 voices.

SAPI 5 also supports some XML tags to switch between languages/voices, such as <voice> and <lang>.

seanbudd commented 1 month ago

Welcome @gexgd0419 ,

We are unable to process this issue as it stands.

To be able to reproduce this issue, we need a web page example. Please link an example or provide a minimal HTML sample to reproduce this using codepen.
Please reproduce this issue and provide a log file of the behaviour. Ensure your log level is set to debug in general preferences.
Do you have any extra SAPI5 voices installed? What language/voice are you expecting it to switch to?

Kind Regards, NV Access Software Developers

gexgd0419 commented 1 month ago

I reproduced the issue again with NVDA 2024.3.1, in a Windows 10 22H2 virtual machine, with no third-party software or voice installed.

Steps:

Install the Chinese (Simplified) voices and the English (US) voices in the Windows system settings. You will have the following voices:
- OneCore voices in Chinese: Microsoft Huihui, Microsoft Yaoyao, Microsoft Kangkang
- OneCore voices in English: Microsoft David, Microsoft Zira, Microsoft Mark
- SAPI5 voices in Chinese: Microsoft Huihui Desktop
- SAPI5 voices in English: Microsoft David Desktop, Microsoft Zira Desktop
Use the following HTML for testing:
```
<html><body>
<p lang="zh-CN">This should be spoken with a Chinese voice.</p>
<p lang="en">This should be spoken with an English voice.</p>
</body></html>
```
Select Microsoft Huihui as the default voice. When moving to the second line, the voice should be switched to an English one.

Interestingly, both Narrator and NVDA support switching to an English voice when using the OneCore version of Microsoft Huihui, and they also both fail to switch the voice when using the SAPI5 version.

Here's the log file: nvda.log

Notably, the SAPI5 driver logged many "Unsupported speech command: LangChangeCommand" lines. The message is probably written here: https://github.com/nvaccess/nvda/blob/1b68639cf54086603dd372756ea45ba82277539e/source/synthDrivers/sapi5.py#L341-L342

The for loop processes str, IndexCommand, CharacterModeCommand, BreakCommand, PitchCommand, VolumeCommand, RateCommand, and PhonemeCommand, but not LangChangeCommand.
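The gap can be illustrated with a minimal sketch. The classes and the `convert` function below are hypothetical stand-ins, not the real `sapi5.py` code; the point is only that any command without its own `isinstance` branch falls through to the warning.

```python
# Hypothetical stand-ins for NVDA's speech command classes.
class IndexCommand:
    def __init__(self, index):
        self.index = index


class LangChangeCommand:
    def __init__(self, lang):
        self.lang = lang


def convert(speech_sequence):
    """Return (text_parts, warnings) for a simplified speech sequence."""
    text_parts, warnings = [], []
    for item in speech_sequence:
        if isinstance(item, str):
            text_parts.append(item)
        elif isinstance(item, IndexCommand):
            text_parts.append(f'<Bookmark Mark="{item.index}" />')
        else:
            # No branch for LangChangeCommand, so it ends up here.
            warnings.append(f"Unsupported speech command: {type(item).__name__}")
    return text_parts, warnings
```

Passing a sequence that contains a `LangChangeCommand` yields the same warning text as in the log, and the command has no effect on the output.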

seanbudd commented 1 month ago

Thanks for clarifying. I think this is a valid bug, but a technical investigation is needed to see if SAPI5 can support these commands.

gexgd0419 commented 1 month ago

In SAPI5 TTS XML, <voice> and <lang> tags can be used to switch between voices.

<lang> tags can be used to switch languages. The language applies to the text inside the tag, or to all text after the tag if the tag is self-closing.

<lang langid="409">This should be spoken with an English US voice.</lang>
This will be spoken in the original voice.
<lang langid="804">This should be spoken with a Simplified Chinese voice.</lang>
<lang langid="809"/>The rest of the text should be spoken with an English UK voice.

I would recommend using the non-self-closing format, because changing the language back won't necessarily bring you back to the original voice, but closing the tag will.
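As a rough sketch of the non-self-closing approach, the helper below wraps each run of text in its own <lang> element. The function name `wrap_lang_runs` and the `(langid, text)` input shape are assumptions made for illustration, not part of NVDA or SAPI5.

```python
from xml.sax.saxutils import escape


def wrap_lang_runs(runs):
    """Build SAPI5 XML from (langid, text) pairs.

    langid is a hex string such as "409", or None to keep the current voice.
    """
    parts = []
    for langid, text in runs:
        safe = escape(text)  # escape &, <, > so text cannot break the markup
        if langid is None:
            parts.append(safe)
        else:
            # Closing the tag returns to the voice that was active before it.
            parts.append(f'<lang langid="{langid}">{safe}</lang>')
    return "".join(parts)
```

For example, `wrap_lang_runs([("409", "Hello"), (None, " and "), ("804", "你好")])` produces two <lang> elements with plain text between them.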

langid is the language ID in hexadecimal with no 0x prefix. It will be matched against the "Language" attribute of all installed SAPI5 voices.
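A small sketch of producing that hexadecimal string from a locale name. The lookup table `LCIDS` and the function `to_langid` are hypothetical; only the three LCID values, which match the examples above, are taken as given.

```python
# Hypothetical lookup table; the values are the standard Windows LCIDs.
LCIDS = {"en_US": 0x0409, "en_GB": 0x0809, "zh_CN": 0x0804}


def to_langid(locale_name):
    """Return the hex langid string (no 0x prefix), or None if unknown."""
    lcid = LCIDS.get(locale_name.replace("-", "_"))
    return None if lcid is None else format(lcid, "x")
```

`to_langid("en-US")` gives `"409"`, which can be placed directly in the `langid` attribute.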

Although you cannot decide the exact voice that will be selected, some attributes of the current voice, such as gender, will be considered during voice selection. If the current voice is a Japanese female voice, and the language ID is 409, it will try to find an English female voice first, then an English male voice. If no voice matches the language ID, the voice won't be changed, so the current voice is used.

Also, you can use the <voice> tag, which gives you more control over voice selection.

<voice required="Language=409">This must be spoken with an English US voice. If there's no such voice, an error is thrown.</voice>
<voice optional="Language=809">This should be spoken with an English UK voice. But if there's no such voice, just use the current voice.</voice>
<voice required="Gender=Male" optional="Language=409">This must be spoken with a male voice. Also should use an English US voice, but if there is none, any language is OK.</voice>
<voice optional="Name=Microsoft Zira Desktop;Gender=Female">This should be spoken by Microsoft Zira Desktop. If there's no such voice, use another female one if possible.</voice>
<voice optional="Language=809;Language=409">This should be spoken with an English UK voice. If there's no such voice, try finding an English US one if possible.</voice>

These two tags are handled by the SAPI5 framework, so it will do the voice selection for you.
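For illustration, a <voice> element like the ones above could be assembled as follows. `voice_tag` is a hypothetical helper, and the attribute strings are passed through unchanged, so the caller is assumed to supply valid `Name=Value` pairs.

```python
from xml.sax.saxutils import escape, quoteattr


def voice_tag(text, required=None, optional=None):
    """Wrap text in a SAPI5 <voice> element.

    required / optional are lists of "Name=Value" strings, joined with ";".
    """
    attrs = []
    if required:
        attrs.append(f"required={quoteattr(';'.join(required))}")
    if optional:
        attrs.append(f"optional={quoteattr(';'.join(optional))}")
    attr_str = (" " + " ".join(attrs)) if attrs else ""
    return f"<voice{attr_str}>{escape(text)}</voice>"
```

Calling `voice_tag("Hi", required=["Gender=Male"], optional=["Language=409"])` builds the same kind of markup as the third example above.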

gexgd0419 commented 4 weeks ago

Here I found a similar issue, but for Microsoft Speech Platform voices: #2561, where someone said:

It's definitely possible to implement this. However, having tested it, there is a huge pause between chunks of text when switching languages, almost a second. If this isn't acceptable, there isn't any point implementing it, as we can't shorten this.

According to my testing, the SAPI framework creates and keeps the main voice token, so it is reused for all subsequent Speak calls. But the voices that are switched to dynamically through <voice> and <lang> tags are created every time Speak is called and are not reused. For voices that take a long time to initialize, this may introduce some delay.

Here's my PR #17156, which shows that using <lang> tags to switch languages does work for SAPI5. Although the documentation says that "the Lang tag is a shortened version of the Voice tag with the Required attribute containing 'Language=xxx'", this is not correct: using a non-existent language ID in the Lang tag just falls back to the current voice instead of throwing an error. So Lang tags can be used safely without first checking whether the language is available.

So far I think that using <voice> or <lang> tags is the easiest way to implement this feature. For built-in Microsoft voices, the delay is only a little worse than their OneCore counterparts. But if this introduces a significant delay when using most third-party SAPI5 voices, maybe we should find another way, or turn the Automatic language switching for SAPI5 voices off by default.

XLTechie commented 4 weeks ago

If users who want this feature find it too slow, they can turn it off, yes? Isn't it better to have the option available, even imperfectly, than to not have it at all?

gexgd0419 commented 4 weeks ago

I agree. Issue #2561 was created 12 years ago and is still not fixed. As mssp.py shares the implementation of sapi5.py, this PR should fix both of them. Of course, more testing should be done with different SAPI5 and MSSP voices.
