nvaccess / nvda

NVDA, the free and open source Screen Reader for Microsoft Windows
https://www.nvaccess.org/
Other
2.12k stars 638 forks source link

Add extension points for speech to avoid NVDA Remote monkeypatching #14520

Open LeonarddeR opened 1 year ago

LeonarddeR commented 1 year ago

Is your feature request related to a problem? Please describe.

NVDA Remote currently relies on a lot of monkeypatching to intercept braille, speech, gestures, tones and waves from the controlled system. #14503 intends to fix most of this, except for speech.

Speech is a difficult thing. Historically. NVDA Remote patched methods on the synth directly, e.g. speak, cancel. With speech refactor becoming available, this was changed to patch methods on the speech manager. This had several advantages, including:

  1. Indexing and callback commands could be handled by the controller. This means that beeping in the middle of a sentence would work correctly, as the beep callback is executed by the controller
  2. It was no longer necessary to patch the synthesizer

However, there are some drawbacks and current shortcomings:

  1. NVDA Remote must mess around with the speech state. speech.speech._speechState.beenCanceled is set to False before NVDA remote can speak
  2. Cancellable speech does not work

Describe the solution you'd like

I think that for NVDA Remote, the current approach should be preferred over the older approach where the controller relies on controlled system indexing. In other words, shortcoming 2 does not outweigh advantage 1. To solve shortcoming 1, it might be necessary to use speech.speak, speech.cancelSPeech and speech.pauseSPeech directly. On the other hand, for #3564, it makes more sense to work with a virtual synth driver, since the controlled system (or server) doesn't require local speech output. In that case, it's fine if the controlled system does the indexing and cancellable speech should work as well.

That said, we should solve shortcoming 1, e.g. NVDA Remote should be able to queue speech without fiddling with the internal speech state. Now I'm not very comfortable with this new approach yet. Therefore I hope @feerrenrut can share his ideas about this.

To avoid monkey patching, we need several extension points to cover both approaches:

Additional requirements:

I'd love thoughts from @feerrenrut, @ctoth and @tspivey as well as other interested people.

CyrilleB79 commented 1 year ago

Mentioning here various add-on repositories and their author who are patching speech.speak or speech.speech.speak and who may be interested by replacing the monkey patching by an extension point:

ABuffEr commented 1 year ago

Hi, I'm following @leonardder 's work with high interest. I'm not sure if my observation is valid/important (I must see extension points in action), but should not this PR include speechDictHandler.processText? I monkey-patched it in widely appreciated NumberProcessing, but trying to introduce continous mode in Unicode-ASCII by @sukiletxe, I realized that - for my knowledge - two add-ons of this kind cannot operate easily and safely at the same time. While it'd be possible with an extension point over speechDictHandler.processText, I suppose.

LeonarddeR commented 1 year ago

Why would instant translate need to patch speech.speak?

LeonarddeR commented 1 year ago

@ABuffEr I guess speechDictHandler.processText could have a filter to accomplish this. Might be best to do this in a separate pr. There is now a filter in the braille module, so it should be pretty trivial to look at that and implement this for even a new dev.

CyrilleB79 commented 1 year ago

Why would instant translate need to patch speech.speak?

Because there is a script to translate the last spoken text. The patch is used to store in a variable the last speech sequence passed to this function.

jscholes commented 1 year ago

there is a script to translate the last spoken text

While the proposed extension point(s) would help with this use case, it also seems like a built-in NVDA speech history buffer would be more appropriate for such simple functionality. I.e. allowing an add-on to pull the last spoken sequences on demand, rather than being pushed everything because one sequence might be needed at some point.

seanbudd commented 1 year ago

Now that the dust has settled, @leonardder do you plan on proceeding with this?

LeonarddeR commented 1 year ago

I'm still not sure how to proceed regarding where to inject the extension point as well as how to deal with cancellable speech.

beqabeqa473 commented 12 months ago

Yes, it would be great to see speech coming through extension point. That would eliminate lots of severe issues that would come from some addons incorrectly working with patching.

seanbudd commented 9 months ago

I think there is still some work left here right? @LeonarddeR could you update the issue with the remaining extension points requested?

LeonarddeR commented 9 months ago

I think the following points are still open:

Additional requirements:

beqabeqa473 commented 8 months ago

Hi @LeonarddeR

If #16213 will be merged, will it be still needed to move processing of symbols and dictionaries to manager.speak? If we subscribe to an action, maybe it will be better to pass unfilterred speech to speech.speak on the controller side?

LeonarddeR commented 8 months ago

It all depends on the expectations of the add-on I think. My request to move filtering of lang change commands was mainly because its current location in speech.speak is a bit odd. I believe NVDA Remote currently patches the manager, that means that the controller gets language switching, dictionaries and other pronunciation behavior from the controlled machine. Given the nature of NVDA Remote, I think it makes sense for it to always speak the remote system with the rules applied on the local system.

beqabeqa473 commented 8 months ago

Yes, that is what i am saying that instead of patching manager.speak, we could subscribe to pre_filter_action and use speech.speak on the local system. Or maybe i am missunderstanding something?

LeonarddeR commented 8 months ago

I'm not sure whether that works instantly, but it makes sense for sure.