NVDA 2023.3 is not reading certain lists appropriately across Microsoft office.

manish10 commented 11 months ago

Steps to reproduce:

I am giving below a simple set of steps but the problem is not limited to this one case. I am facing it across all lists in multiple office products, including word and powerpoint.

Open word and create several lines of text each separated by a new line.
press alt+q to get to the search box and type "bullet"

Actual behavior:

One of the options that shows up in the search results list is read by NVDA as "best action 1 of 6".
NVDA fails to read the actual content of the list item "bullets and numbering".

Expected behavior:

The search result should be read as "best action, bullets and numbering, 1 of 6" . NVDA should read the actual content of the list item. This worked correctly in NVDA 2023.2 and earlier versions.
NVDA logs, crash dumps and other attachments:

There is no error or crash in the NVDA logs.

System configuration

NVDA installed/portable/running from source:

NVDA 2023.3 installed on windows 11.

NVDA version:

2023.3

Windows version:

windows 11 pro build 22h2 22621.2428

Name and version of other software in use when reproducing the issue:

Microsoft word win 32 latest m365 desktop version.

Other information about your system:

Other questions

Does the issue still occur after restarting your computer?

yes.

Have you tried any other versions of NVDA? If so, please report their behaviors.

this works correctly in 2023.2. I did notice this behavior in one of the 2023.3 beta builds also.

If NVDA add-ons are disabled, is your problem still occurring?

I don't have any add-ons.

Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?

I have not tried this.

michaelDCurran commented 11 months ago

I am struggling to reproduce this. I hear "Best action grouping 1 of 6 bullet split button create a bulletted list ".

michaelDCurran commented 11 months ago

though I am running an Office insiders build (beta channel) so it is possible it may have been somehow fixed by MS word?

manish10 commented 11 months ago

Additional information: This only happens with the eloquence synthesizer addon I have. The problem goes away if I use espeak or one core voices. I wrote the eloquence addon for the India licensed copy of eloquence. It is probably broken because of this release changes to wasapi etc. Can you give me some pointer on where I can start to debug it?

manish10 commented 11 months ago

also apologies for incorrectly answering the addons question in the original bug template...

seanbudd commented 11 months ago

Closing as this is an add-on bug

Note this snippet might be helpful, are you sure this is related to WASAPI? You can check this by disabling WASAPI in advanced settings

https://github.com/nvaccess/nvda/pull/14900#issuecomment-1659975589

XLTechie commented 11 months ago

Can you set your NVDA log level to debug, and provide a log while you demonstrate this behavior?

Also, if you turn WASAPI off in Advanced settings of NVDA, does the problem go away?

manish10 commented 11 months ago

Although the issue is closed, I will provide additional info here in the hope of getting inputs on how to resolve this:

Changing the setting to turn wasapi off has no effect.
I added debug statements to the eloquence plugin point of entry and ran the old and new NVDA versions with log level set to debug and the same eloquence plugin. In the below old and new debug statement extracts from the logs, it is clear that the speak method in speech.py tries to say the same thing in both cases, in the new code, the eloquence plugin never gets called for the second call, which is what the problem is. There is a lot of code that gets executed in speech.py and manager.py after the log statement from that file before the eloquence plugin will get called and the problem is probably because of a change in that code. Will appreciate if anyone can take a look and provide their thoughts on what could be causing this. The extracts from the new and old log files are pasted below.

IO - inputCore.InputManager.executeGesture (16:15:55.816) - winInputHook (14972): Input: kb(laptop):downArrow DEBUG - editableText.EditableText._hasCaretMoved (16:15:55.877) - MainThread (8000): Focus event. Elapsed 0.0300276 sec IO - speech.speech.speak (16:15:55.988) - MainThread (8000): Speaking [LangChangeCommand ('en'), 'Best Action', 'grouping', '1 of 7', CancellableSpeech (still valid)] DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (16:15:55.988) - MainThread (8000): speak: [LangChangeCommand ('en'), 'Best Action ', 'grouping ', '1 of 7 ', IndexCommand(28)] DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (16:15:55.988) - MainThread (8000): speak: string: 'Best Action ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (16:15:55.988) - MainThread (8000): speak: string: 'grouping ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (16:15:55.988) - MainThread (8000): speak: string: '1 of 7 ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (16:15:55.988) - MainThread (8000): speak: index: 28 DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (16:15:55.988) - MainThread (8000): speak: last is not none and not last.rstrip()[-1] in punctuation: '`pp0 `vv92 1 of 7 ' IO - speech.speech.speak (16:15:55.996) - MainThread (8000): Speaking [LangChangeCommand ('en'), 'Bullets', 'split button', 'collapsed', 'Create a bulleted list.\n\nClick the arrow to change the look of the bullet.', '1 of 1', CancellableSpeech (still valid)] There are no more calls to the eloquence plugin after the above log statement and the next speech call is for a completely different item. In contrast, below is the same extract from the old NVDA where the eloquence plugin gets called for the second part also: IO - inputCore.InputManager.executeGesture (15:58:07.811) - winInputHook (11792): Input: kb(laptop):downArrow DEBUG - editableText.EditableText._hasCaretMoved (15:58:07.874) - MainThread (16568): Focus event. Elapsed 0.0278621 sec IO - speech.speech.speak (15:58:07.910) - MainThread (16568): Speaking [LangChangeCommand ('en'), 'Best Action', 'grouping', '1 of 7', CancellableSpeech (still valid)] DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:07.910) - MainThread (16568): speak: [LangChangeCommand ('en'), 'Best Action ', 'grouping ', '1 of 7 ', IndexCommand(34)] DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:07.910) - MainThread (16568): speak: string: 'Best Action ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:07.910) - MainThread (16568): speak: string: 'grouping ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:07.910) - MainThread (16568): speak: string: '1 of 7 ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:07.910) - MainThread (16568): speak: index: 34 DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:07.910) - MainThread (16568): speak: last is not none and not last.rstrip()[-1] in punctuation: '`pp0 `vv92 1 of 7 ' IO - speech.speech.speak (15:58:07.922) - MainThread (16568): Speaking [LangChangeCommand ('en'), 'Bullets', 'split button', 'collapsed', 'Create a bulleted list.\n\nClick the arrow to change the look of the bullet.', '1 of 1', CancellableSpeech (still valid)] DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.707) - MainThread (16568): speak: [LangChangeCommand ('en'), 'Bullets ', 'split button ', 'collapsed ', 'Create a bulleted list. Click the arrow to change the look of the bullet. ', '1 of 1 ', IndexCommand(35)] DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.707) - MainThread (16568): speak: string: 'Bullets ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.711) - MainThread (16568): speak: string: 'split button ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.711) - MainThread (16568): speak: string: 'collapsed ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.711) - MainThread (16568): speak: string: 'Create a bulleted list. Click the arrow to change the look of the bullet. ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.711) - MainThread (16568): speak: string: '1 of 1 ' DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.711) - MainThread (16568): speak: index: 35 DEBUG - external:synthDrivers.eloquence.SynthDriver.speak (15:58:08.711) - MainThread (16568): speak: last is not none and not last.rstrip()[-1] in punctuation: '`pp0 `vv92 1 of 1 '

manish10 commented 10 months ago

With some more testing, I found that the missing content does get spoken if I keep waiting for about 12 seconds in all cases. The specific issue is that multiple speak commands get queued up and a callback flushes the queue. for some reason, the flush does not happen any more until 12 seconds later. Nothing has changed in my synth handler code and the exact same code works with the flush happening correctly with the older version of NVDA but not with 2023.3. The issue has already been marked closed but something has changed in the NVDA code to cause this behavior. I could not find anything related to this in the change log for this release.

Will really appreciate any help.

lukaszgo1 commented 10 months ago

Just to make sure have you restarted NVDA after disabling WASAPI in advanced settings? If this is not caused by WASAPI would you be able to bisect the commit which introduced the regression?

manish10 commented 10 months ago

After I restarted NVDA after disabling wasapi in advanced settings, the issue went away. Thanks for this suggestion. All the times I tried this earlier, I must not have restarted after selecting that setting. How can we fix this? Should I reopen this issue to help track this?

lukaszgo1 commented 10 months ago

Would you be able to test your add-on with the latest Alpha, there were various improvements to WASAPI since 2023.3 was released, so it may got fixed by accident. Note that you would need to override compatibility of your add-on to make this test. Is the code of your add-on available somewhere?

manish10 commented 10 months ago

I can make the code available to you. It is not a public repo because the eloquence synth is licensed. I will try and install the alpha build: does this mean just the latest nightly build or is there a specific build you would like me to use?

manish10 commented 10 months ago

Actually, I am running NVDA from source and have the latest code from the repo. So, this is not fixed in any alpha build.

lukaszgo1 commented 10 months ago

Since this is clearly caused by WASAPI cc @jcsteh (let's hope this is the last time someone triaging issues has to CC you due to WASAPI related problems, though this is very likely wishful thinking).

jcsteh commented 10 months ago

The specific issue is that multiple speak commands get queued up and a callback flushes the queue.

Can you elaborate on this? With WASAPI enabled, the callbacks should always be fired, but you should not assume that the callback for the previous call to feed will necessarily be fired during the very next call to feed. This was always an incorrect assumption - it was never a promise made by the API - but you could get away with it with the old WinMM code due to an implementation detail.

There is a good reason for this change. Previously, if you called feed with two very tiny chunks of audio and callbacks, we had to wait for the first chunk to finish before we could push the second. This caused buffer underruns and thus tiny annoying glitches in the audio output; see #10185 and #11061. With the WASAPI code, we keep pushing audio until the buffer is sufficiently full to avoid underruns/dropouts. Callbacks are fired based on the audio clock rather than delaying the pushing of subsequent chunks.

You can call WavePlayer.sync to wait for all audio to complete, which will fire any pending callbacks. However, that will also mean that if the last chunk of audio was not a complete phrase, you'll get a glitch in your audio output, though you would have with the old code anyway.

manish10 commented 9 months ago

Thanks @jcsteh for the above. You are correct in that the callback does get fired but on some clock and not immediately after the previous chunk was spoken. You mention this is by design. In this case, what is the correct fix for this that I should implement in the tts driver?

While I implement the correct fix, for the interim, where should I call WavePlayer.sync?

(was travelling over the holidays and couldn't respond sooner)

jcsteh commented 9 months ago

You are correct in that the callback does get fired but on some clock and not immediately after the previous chunk was spoken.

That's not quite correct. It does get fired immediately after the associated chunk is spoken. The important point is that it might not necessarily get fired during the very next call to feed() because two calls to feed() might be less than one audio buffer. That is, there might be several feed() calls before callbacks get fired. If there is a callback on the second last feed() call and the final feed() call isn't large enough to satisfy the buffer, the callback won't get called in the final feed() call.

To fix this, ensure you call idle() at the end of each utterance. This will ensure that the buffer is flushed, among other things.

manish10 commented 9 months ago

If I call idle() after the feed() call, the original problem is fixed. However, it causes the speech to become very choppy as if the tail end of each syllable is being overwritten by the next speech. Could you guide me a little more here about what the correct sequence should be for calling the feed() method? do all synth drivers call idle right after calling feed() every time? or did I misunderstand your ask above to call idle after every utterance? is that different from calling idle right after feed()?

jcsteh commented 9 months ago

You should only call idle() at the end of an utterance; i.e. the last chunk generated due to a speak() call.

manish10 commented 9 months ago

This fixed the issue but I still get a weird echo at the end of the speech. It is not very noticeable for words or long text. However, it is very noticeable for single letter navigation and for key-echo during typing. Is there a way for me to avoid this echo?

jcsteh commented 9 months ago

Can you reproduce this with any other synthesiser? With WASAPI, an echo would suggest that the same chunk is being sent more than once, but I don't see how that could be happening unless you're calling feed() twice with the same chunk or unless you're calling idle() before the final chunk of the utterance is sent. Without WASAPI, an echo might be due to a buffer underrun, which happened quite a lot with the non-WASAPI code.

nvaccess / nvda