Feature request: build JavaScript accessibility bridge to improve accessibility of web applications

This is a proposal for a future direction the NVDA project could take, rather than a concrete feature request. I would like to see whether NVDA devs agree that this is a reasonable direction. I apologize for creating another huge issue; it will probably take about 10 minutes to read. However, since I am proposing a brand new direction, I had to lay out extensive justification in order to convince people that this is the right direction. There are many headings below for ease of navigation.

TL;DR

The world is rapidly moving towards web technologies and away from natively executed programs: think of Google Docs or VSCode, to name just two examples. The current state of accessibility of many web applications is unsatisfactory (examples below). The reason is that communication between a screenreader and a web application happens through accessibility APIs (mostly IAccessible2) that either don't account for the needs of modern web apps (Google Docs) or are poorly implemented (e.g. large text in edit boxes in Google Chrome is too slow). I propose to start working on a fundamental solution to this problem: build a JavaScript accessibility bridge that allows screenreaders to communicate directly with the JavaScript environment of web apps running inside the browser, instead of relying on IAccessible2 with all its flaws. Below I list some concrete problems with the status quo and then describe the proposed solution.

Problems with accessibility of web applications

Google Docs

The main problem with Google Docs is that it effectively implements its own screenreader running inside the browser. This is understandable (what else could Google devs have done at the time to make it accessible to screenreader users?), and I am grateful to Google devs for making Google Docs accessible, but it is not a good solution in the long run. Here are some reasons why implementing your own screenreader in every web-based office solution is a problem for screenreader users:

It's wrong for every online office suite to build its own screenreader. We will end up in a world where every vendor builds their own screenreader with their own set of shortcuts, which will likely be incompatible with each other. This will be an extra obstacle for visually impaired people trying to get a job or switch to another online office suite. It is understandable that some shortcuts for advanced functions will differ across apps (after all, even sighted people face a learning curve when switching to a new vendor), but it is wrong to have different sets of shortcuts for basic tasks, like jump to next heading or jump to next word. Instead, we would like to mimic the situation with native office suites: it is relatively easy for NVDA users to switch between Microsoft Word and LibreOffice Writer, since most of the navigational keystrokes are NVDA keystrokes and don't need to be relearned.
Weird behavior of word and character navigation. When pressing RightArrow, Google Docs speaks the previous symbol instead of the current one; the same applies to Control+RightArrow. This is inconvenient for NVDA users (and probably JAWS users as well). It is wrong that NVDA and JAWS users have to relearn basic character and word navigation commands in order to be able to work with Google Docs. This is mitigated by turning on Braille mode, though.
Very complicated shortcuts. Jump to next heading is Control+Alt+N Control+Alt+H. Jump to next table is even worse: Control+Alt+Shift+N Control+Alt+Shift+T. On one hand it is understandable, since for web apps the selection of keystrokes is limited, but on the other hand the end result is still bad. Not to mention that in practice these shortcuts are often unreliable.
Building screenreaders is costly for vendors. While Google Docs appears to be the market leader, there may be smaller companies offering their own online office suites that are not accessible, simply because there is no simple interface they can implement to enable accessibility (think of IAccessible2 or UIA for native apps) and they can't afford to build a brand new web-based screenreader themselves. A cursory Google search of competitors (Quip and Zoho Writer) reveals that, although they claim to be screenreader-compatible, neither of them appears to have a jump to next heading keystroke, which likely means that their screenreader support is only very basic, if present at all. If we, screenreader developers, provide an easy-to-implement accessibility API in JavaScript, then it's more likely that small companies will be able to afford to implement it in their products.
The screenreader in Google Docs is closed source and not extensible. For example, it still doesn't have sentence navigation. Being closed source means that blind developers can't implement the features they need. Besides, Google is well known for its unwillingness to talk to people outside the company, so submitting a feature request is not a very promising route.
VSCode and Monaco editor

VSCode is an Electron-based application running inside a Chromium-like browser, which means that all communication between the screenreader and its UI elements happens through IAccessible2 with all its limitations. Monaco, the editor at the core of VSCode, is a browser-based code editor with a rich set of features, written in TypeScript. While claiming to be accessible, it has one big accessibility flaw: it only provides 500 or so lines of your source code file through the accessibility API at a time. More details can be found in microsoft/vscode#41423, which is blocked on Chromium issues. So if NVDA tries to retrieve some lines in the current editable via the TextInfo API, it only sees 500 lines and no more: VSCode effectively truncates the document. Why is having access to only 500 lines of code not enough?
SayAll doesn't work
Indentation navigation in source code (via IndentNav add-on) doesn't work.
Paragraph navigation with multi line break style (as introduced in #13798) doesn't work correctly in VSCode. Despite the author claiming to have tested it with VSCode, around line 500 it won't work correctly since it is using textInfo API, which would only see the first 500 lines of the file.
Sentence navigation (via SentenceNav add-on) doesn't work.
Quick search function (via Tony's enhancements add-on) doesn't work.
In general, one can come up with many other ideas of new functions that can be implemented via TextInfo API. None of them would work correctly beyond 500 lines of code in VSCode or other Monaco-based web applications.
CodeMirror editor

CodeMirror is another popular open-source online code editor and a direct competitor to Monaco. It is used most notably in Chrome and Firefox Developer Tools, the CoderPad online editor (frequently used by major IT companies for online interviews) and GitHub's in-browser edit feature (the full list can be found on the CodeMirror real world uses page). The previous version, CodeMirror 5, was not accessible to screenreaders, as it only presented an empty text area. This was tracked in codemirror/codemirror5#4604, where the CodeMirror maintainers state that adding accessibility support would require a major redesign and even propose raising funds for it. It appears, however, that some accessibility has recently been implemented in CodeMirror 6, which is the current production version. Still, when I try CodeMirror in the sandbox, it only exposes 35 lines to the screenreader, which is far fewer than Monaco. Therefore all the problems of Monaco that I laid out in the previous section apply to CodeMirror 6 as well, and they are even more severe here due to the much smaller frame size.

Proposed solution

I envision that the fundamental solution to the problems I mentioned above would be a new JavaScript accessibility API that the authors of web apps can easily implement. Let's call it JSAccessible for now. Then we'd need to figure out a way for NVDA to talk directly to that API bypassing IAccessible2 - more on that below. To illustrate this proposed accessibility API, for plain text editors we can think of something like this:
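```
interface IJSAccessiblePlainTextEditable {
  // Functions to be implemented by Monaco or CodeMirror.

  // Returns the text between two character offsets in the full buffer.
  getTextInRange(startIndex: number, endIndex: number): string;
  // Returns the character offset of the cursor in the full buffer.
  getCursorIndex(): number;
  // Moves the cursor to the given character offset.
  setCursorIndex(index: number): void;
  // ...
}
```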
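As a rough illustration of what the editor side and the screenreader side might look like, here is a minimal sketch. The names `InMemoryEditable` and `readAll` are made up for this example, and the conventions that out-of-range indices are clamped and that reading past the end returns an empty string are assumptions; the interface as proposed does not specify them, and neither Monaco nor CodeMirror offers anything like this today.

```typescript
// The interface from the proposal, repeated so this example is self-contained.
interface IJSAccessiblePlainTextEditable {
  getTextInRange(startIndex: number, endIndex: number): string;
  getCursorIndex(): number;
  setCursorIndex(index: number): void;
}

// Hypothetical editor-side implementation backed by a plain string.
class InMemoryEditable implements IJSAccessiblePlainTextEditable {
  private text: string;
  private cursor = 0;

  constructor(text: string) {
    this.text = text;
  }

  getTextInRange(startIndex: number, endIndex: number): string {
    // Assumption: out-of-range indices are clamped rather than rejected.
    const start = this.clamp(startIndex);
    const end = Math.max(start, this.clamp(endIndex));
    return this.text.slice(start, end);
  }

  getCursorIndex(): number {
    return this.cursor;
  }

  setCursorIndex(index: number): void {
    this.cursor = this.clamp(index);
  }

  private clamp(index: number): number {
    return Math.min(Math.max(0, index), this.text.length);
  }
}

// Hypothetical screenreader-side helper: reads the whole buffer in chunks.
// The proposed interface has no length method, so this stops at the first empty chunk.
function readAll(editable: IJSAccessiblePlainTextEditable, chunkSize: number): string {
  let result = "";
  let offset = 0;
  for (;;) {
    const chunk = editable.getTextInRange(offset, offset + chunkSize);
    if (chunk.length === 0) break;
    result += chunk;
    offset += chunk.length;
  }
  return result;
}

// A 1000-line buffer, well beyond the 500-line window mentioned above.
const lines = Array.from({ length: 1000 }, (_, i) => `line ${i + 1}`);
const editable = new InMemoryEditable(lines.join("\n"));
const fullText = readAll(editable, 4096);
console.log(fullText.split("\n").length); // prints 1000
```

With the whole buffer available like this, features such as SayAll or indentation navigation would no longer depend on the size of the frame the browser exposes.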
If we can have Monaco and CodeMirror implement this API, then we can update NVDA to check whether the current TextArea in the browser happens to implement it and, if so, transparently switch from IAccessible2 to JSAccessible and retrieve text contents via an IJSAccessiblePlainTextEditable::getTextInRange(startIndex, endIndex) call. This way we would have easy access to the entire buffer inside Monaco/CodeMirror instead of a frame containing only a few hundred lines.

Similarly, for online office solutions, another more complicated interface can be developed that would provide NVDA with information about text and its formatting, like font size and bold/italic attributes. The challenge here would be convincing Google to implement this interface for Google Docs. My hope is that if we produce a working prototype for plain text editors and the project gains some steam, we will eventually be able to find a live human inside Google who can implement this JSAccessible interface for Google Docs.

The next big question is how to make NVDA talk to that interface. A browser has an isolated JavaScript VM for every page, and it's not easy for a native application to talk to code living within that VM. I can think of two approaches here:
1. Build a browser extension. Chrome and Firefox extensions have access to the JavaScript environment of web pages. We can then use the Native Messaging API to have our extension talk directly to the NVDA binary via a JSON protocol.
2. Use Selenium, a browser automation and testing framework. It appears to use WebDriver to access the internals of browsers, and it allows writing scripts in many languages, including Python.

I am not an expert on JavaScript and web technologies, so I am not sure which way is best. It would be great if anyone more knowledgeable could chime in on the pros and cons of both approaches. From my cursory investigation, it seems at this point that either approach can do the job.

So, to summarize the proposed design: we will need to punch a hole into the JavaScript world inside the browser, either by implementing a browser extension that can talk to NVDA or via Selenium WebDriver. Then we can redesign the implementation of NVDAObjects to try querying the current text area in the browser via the new JSAccessible interface, with a potential fallback to IAccessible2.
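To make the browser extension approach a bit more concrete, below is a minimal sketch of what a JSON request/response protocol between NVDA and the extension could look like. Everything in it is hypothetical: the message shapes `BridgeRequest` and `BridgeResponse`, the method names, the error strings, and the `handleRequest` function are invented for illustration. Native Messaging exchanges JSON messages between an extension and a registered native host process; that wiring is omitted here, and the sketch only shows how one incoming message could be dispatched against an object implementing the proposed interface.

```typescript
// The interface from the proposal, repeated so this example is self-contained.
interface IJSAccessiblePlainTextEditable {
  getTextInRange(startIndex: number, endIndex: number): string;
  getCursorIndex(): number;
  setCursorIndex(index: number): void;
}

// Hypothetical message shapes; the id lets the caller match replies to requests.
type BridgeRequest =
  | { id: number; method: "getTextInRange"; startIndex: number; endIndex: number }
  | { id: number; method: "getCursorIndex" }
  | { id: number; method: "setCursorIndex"; index: number };

type BridgeResponse =
  | { id: number; ok: true; result: string | number | null }
  | { id: number; ok: false; error: string };

function reply(response: BridgeResponse): string {
  return JSON.stringify(response);
}

// Dispatches one JSON message. `editable` is null when the focused element
// does not implement JSAccessible, so the screenreader can fall back to IAccessible2.
function handleRequest(
  editable: IJSAccessiblePlainTextEditable | null,
  rawMessage: string
): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawMessage);
  } catch {
    return reply({ id: -1, ok: false, error: "malformed JSON" });
  }
  if (typeof parsed !== "object" || parsed === null) {
    return reply({ id: -1, ok: false, error: "malformed request" });
  }
  const request = parsed as BridgeRequest;
  const id = request.id;
  if (editable === null) {
    return reply({ id, ok: false, error: "JSAccessible not implemented" });
  }
  switch (request.method) {
    case "getTextInRange":
      return reply({
        id,
        ok: true,
        result: editable.getTextInRange(request.startIndex, request.endIndex),
      });
    case "getCursorIndex":
      return reply({ id, ok: true, result: editable.getCursorIndex() });
    case "setCursorIndex":
      editable.setCursorIndex(request.index);
      return reply({ id, ok: true, result: null });
    default:
      return reply({ id, ok: false, error: "unknown method" });
  }
}
```

On the NVDA side, a failed response such as "JSAccessible not implemented" would be the signal to keep using IAccessible2 for that element.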
  Discussion
  
  Here I answer some questions ahead of time, that I anticipate to be asked to address potential skepticism.
  
  Why do I need to align anyone instead of just implementing a prototype myself?
  
  This is a huge project proposal that would require cooperation from many parties, such as the VSCode team, CodeMirror, Electron, and potentially also the Google Chrome and Google Docs teams. It would greatly help if NVDA devs are aligned, so that I could reach out to those parties on behalf of NVDA rather than as an unknown independent dev. That would improve the chances of PRs being accepted and, in the case of Google with its closed source Google Docs, the chances that they will be willing to collaborate. Also, without alignment, my work could end up being an NVDA add-on instead of an NVDA core feature. Many people either hesitate to use add-ons or don't know how, so the impact of the project would be limited.
  
  We should follow W3C/WCAG standards. This proposal is unacceptable because it doesn't conform to the standards.
  
  We are facing a situation in which the current set of accessibility standards is not doing a satisfactory job; see the lists of problems in the previous sections. In the case of Monaco and CodeMirror, it is actually a problem in the Chromium implementation that prompts web application devs to search for workarounds. In the case of Google Docs, its developers chose to implement a new screenreader on their side because there was no effective way to communicate with a proper native screenreader from inside the browser using existing accessibility APIs. I feel it is up to us, screenreader users and developers, to decide. Either we are fine with putting up with all the drawbacks of the status quo (in fact, one of the reasons I am submitting this proposal is to find out whether most people here are simply not bothered by the problems I outlined above), or the problems are bothersome enough that we should work on a better solution. In either case, we shouldn't be constrained by a set of standards that was developed years ago and that, most importantly, does not solve the accessibility issues I outlined above.
  
  Why invent a new accessibility API instead of improving IAccessible2/UIA?
  
  Google Chrome, the most popular browser on the market, has limitations in its implementation of IAccessible2, especially when it comes to large amounts of text in editables. Although these issues were reported years ago (links to the issues can be found in the Monaco section), not much progress has been made, so it appears that Chrome developers are either not very interested in fixing them or that a proper fix is technically challenging. Working in this direction therefore doesn't look promising for addressing the accessibility of Monaco and CodeMirror. As for Google Docs, I have only limited knowledge of IAccessible2 and cannot say for sure that it isn't sufficient for Google Docs to expose document structure to the screenreader the way Microsoft Word and LibreOffice do. But I assume they had good technical reasons to go the hard way and implement their own screenreader. If anyone is more familiar with IAccessible2 and the accessibility layer of Google Docs, please feel free to chime in and tell us whether an upgraded version of IAccessible2 could be used to improve Google Docs accessibility.
  
  Some people argue that having access to only a single line of a document at a time is enough and screenreaders shouldn't provide more sophisticated functionality.
  
  For some people using only simple functions is enough. Others use more advanced functions. A couple of examples of my own use cases that are blocked by current state of accessibility:
Working with long SQL queries is challenging unless you use some form of structural navigation like IndentNav. For my work I routinely have to read and write large SQL queries that are 500+ lines long. My company uses a web SQL client built with a customized version of Monaco that only exposes a few lines to the screenreader, and therefore IndentNav doesn't work correctly in it. As a result, every time before editing a query I have to copy it to Notepad++, edit it there with IndentNav and then copy it back, which is quite cumbersome and error-prone.
Google Docs shortcuts are painful to use. For example, there was a document where I had to jump to the second table. I had to press Control+Shift+Alt+N Control+Shift+Alt+T, then actually release Control, Shift and Alt (otherwise it doesn't work), then press Control+Shift+Alt+N Control+Shift+Alt+T again. It doesn't have to be that painful.

So for some users simple use cases are enough, and they might not understand my frustration with mine. But given that working in large IT companies often requires these sophisticated use cases, I would argue that NVDA should be the tool that helps visually impaired people get meaningful employment. That is, NVDA shouldn't favor either basic or sophisticated use cases at the expense of the other; it should support both.

Some people argue that advanced logic should be encoded in application-specific scripts rather than NVDA.

There are multiple issues with this approach:
Many applications are not scriptable: e.g. Jupyter, Google Docs.
Even if an application is scriptable, it requires significant effort to learn a new language and a new API in order to write application-specific scripts.
Many applications have different sets of available keyboard shortcuts. As an example, I ported my IndentNav NVDA add-on to VSCode, but there was no way to assign the familiar NVDA+Alt+Up/Down keystrokes to it, because VSCode doesn't treat the Insert key as a possible modifier. As a result, muscle memory never has a chance to develop, which reduces efficiency.
Conclusion

With this feature request I would like to hear what NVDA devs and users think about this project proposal.
Do people think the problems I outlined above are severe enough to deserve a proper solution?
Is this proposal in general reasonable? crazy? insane?
Are there any good alternatives that I am perhaps missing?
Are there any technical details that I am missing and that need to be thought through before considering this undertaking?

In conclusion, I want to say that it's a shame that while modern consumer computers can process gigabytes of data per second, we screenreader users only have access to a puny 500 or 35 lines of code at a time. It is also too bad that, as the CodeMirror maintainer stated in codemirror/codemirror5#4604 about accessibility on the web:

proper a11y support isn't easy (otherwise everyone would do it!)

I feel it's our responsibility to make accessibility support easy for the authors of web applications. This way we will eventually live in a world with many more accessible web applications that are friendlier to screenreader users.
