UIA: reduce number of extraneous text change events in conhost

microsoft / terminal

The new Windows Terminal and the original Windows console host, all in the same place!

MIT License

94.42k stars 8.17k forks source link

UIA: reduce number of extraneous text change events in conhost #10822

Open codeofdusk opened 2 years ago

codeofdusk commented 2 years ago

NVDA has trouble processing large numbers of events, leading to hangs of the entire UIA client implementation (nvaccess/nvda#11002). Consider the possibility of reducing the number of events we send to capture the fact that text changed without creating unnecessary work for clients.

carlos-zamora commented 2 years ago

This seems useful: https://docs.microsoft.com/en-us/windows/win32/api/uiautomationclient/nf-uiautomationclient-iuiautomation6-get_coalesceevents

I should check if XAML is doing this for us (and if not, ask them if they can).

carlos-zamora commented 1 year ago

I'm pretty sure this is also applies to Windows Terminal. So I'm adding that tag too.

carlos-zamora commented 1 year ago

With regards to a solution, @lhecker and I had a chance to chat about it today. Here's some notes from that meeting:

Idea: batch the notifications
- We can't do that because we don't know how much text we're receiving. That's all on the shell we're connected to.
Idea: diff text on our end and send better notifications that way
- Same problem as above.
Idea: provide a way for NVDA (or an attached screen reader) to tell us to stop/continue sending events
- Changes on our side:
  - use WaitForSingleObjectEx much like we do in the AtlasEngine to throttle our notifications until the client is ready for more
- Changes on NVDA's side:
  - use CreateEventW, SetEventW, and ResetEventW to notify us when we should stop/continue sending events
- General flow:
  1. NVDA: check WT custom UIA property to see if we support this method
  2. WT: return an event via the property
  3. NVDA: when you're approaching a limit (determined by you), call Reset.
  4. WT: we stop sending events. Cache any new text that has been written.
  5. NVDA: after making enough progress, call Set.
  6. WT: we start sending text from the cached data
- References:
  - https://docs.microsoft.com/en-us/windows/win32/coreaudio/exclusive-mode-streams
  - https://docs.microsoft.com/en-us/windows/uwp/gaming/reduce-latency-with-dxgi-1-3-swap-chains#step-4-wait-before-rendering-each-frame
- Corner cases:
  - What happens if NVDA crashes and we stopped sending events?
    - We need some way to be aware if NVDA is still connected. UIA may be able to provide that information. Alternatively, we could get the client process handle and wait on that handle directly. Once the process exits, the process handle gets closed and we would immediately be aware of that.

codeofdusk commented 1 year ago

I'm pretty sure this also applies to Windows Terminal.

It does, but less severely.

Idea: provide a way for NVDA (or an attached screen reader) to tell us to stop/continue sending events

This is related to notifications (not text changes), so a slightly different but related issue. As mitigation for this issue, I've opened nvaccess/nvda#14067 as a start, with an eventual end goal to turn these off for terminal completely once nvaccess/nvda#14047 is merged.

As for the stop/continue sending notifications approach here, this should theoretically work on the NVDA side (I'm assuming with some sort of speech callbacks, so maybe synth dependent), but I'll leave it to you and @leonardder to work out the details.

lhecker commented 1 year ago

To extend on what @carlos-zamora said and add some IMHO convincing arguments: I understand that UIA doesn't just exist for text-to-speech, but I'm nonetheless surprised to find that there's seemingly no official feedback mechanism to prevent buffering too much such "speech".

The text-to-speech system as implemented by NVDA and as used by us, sounds to me basically like any other audio API, but with a different concept of how to encode sound. The notification API to me appears like a classic queue, for which we need to control the delay to prevent us from buffering too much data.

Audio, but also video APIs, are queue based as well and both have ways of getting feedback whether the queue is full, as this prevents latency buildup, reduces memory usage and improves stability. Network equipment has a similarly related issue which is often called "bufferbloat":

[...] bufferbloat occurs when a network link becomes congested, causing packets to become queued for long periods in these oversized buffers.

Backend engineering has to deal with similar issues of so called "back pressure" in distributed systems, just like our two processes (Windows Terminal and NVDA) present a form of a distributed system, and it was always of paramount importance to handle this gracefully to prevent systematic failure by running into timeouts and crashes due to going out of memory.

In short, I get the feeling that if we get NVDA to tell us when it's running low on text and wants more (low watermark) and when it has buffered enough data and needs us to stop (high watermark), we can prevent buffering too much data and reduce the latency between text appearing in the terminal and it being read out. If this is correct, I believe we have a lot of prior art from which we can learn here.

LeonarddeR commented 1 year ago

I finally got around to think about this:

Idea: batch the notifications

We can't do that because we don't know how much text we're receiving. That's all on the shell we're connected to.

I read something about a 1000-character limit for notifications. How about that, i.e. when text is coming in exceeding that length, how is that handled currently?

Idea: provide a way for NVDA (or an attached screen reader) to tell us to stop/continue sending events

Is this supposed to fix latency issues? In the case of NVDA, I have several concerns about its UIA event processing, mainly because it happens entirely in python and therefore its performance is limited by the Python Global Interpretter lock. Therefore I don't think it is terminal's primary responsibility to think about event limitation. Of course, it makes no sense to bloat a screen reader with events, but notifications for incoming text are perfectly fine to me.

carlos-zamora commented 1 year ago

I read something about a 1000-character limit for notifications. How about that, i.e. when text is coming in exceeding that length, how is that handled currently?

We currently have a pretty naive approach: https://github.com/microsoft/terminal/blob/62b34cf6f77199df306999f89a5f57368cfa5059/src/renderer/uia/UiaRenderer.cpp#L283-L291

Basically, we send a notification for each 1000 characters, all filled as much as possible. We can change that, if requested (probably best to track in a separate issue though).

Is this supposed to fix latency issues?

This would provide a way for NVDA to adjust to backpressure. @lhecker and I are surprised that UIA doesn't have a first-party solution for this problem, so the idea would be to do it ourselves. That said, @DHowett brought up some major concerns offline. Particularly in that this could be used maliciously and create a textbook denial of service attack.

lhecker commented 1 year ago

Particularly in that this could be used maliciously and create a textbook denial of service attack.

BTW as I mentioned before, and if we ever need to implement anything like that, I strongly believe we'll find lots of helpful prior art in network algorithms (e.g. TCP), as it deals with the same class of issues (backpressure, etc.) with mitigations that have the same class of downsides (DoS, etc.), as far as I can see. For instance in this case we can look at slowloris attack mitigations.

LeonarddeR commented 1 year ago

Basically, we send a notification for each 1000 characters, all filled as much as possible. We can change that, if requested (probably best to track in a separate issue though).

I guess we can at least implement batching for that, since we know when the last chunk comes in. The question is how to communicate it. We can play with the activitiy id may be?

@lhecker and I are surprised that UIA doesn't have a first-party solution for this problem, so the idea would be to do it ourselves.

There is UIA event coalescing, but I don't think it works for notifications. There is also connection recovery, that should kick in if the provider doesn't respond quick enough. I guess the latter means that a screen reader can freeze its provider when it's too busy, in which case messages should be discarded automatically. But I'm not sure whether that's really the case.