ocornut / imgui

Dear ImGui: Bloat-free Graphical User interface for C++ with minimal dependencies
MIT License

Focus navigation and other features for accessibility? #4122

Open ethindp opened 3 years ago

ethindp commented 3 years ago

Version/Branch of Dear ImGui:

Version: latest Branch: master

Back-end/Renderer/Compiler/OS

Back-ends: imgui_impl_opengl3.cpp Compiler: XXX Operating System: XXX

My Issue/Question:

I am writing an application for a university project and I'm considering using IMGUI. However, I am also visually impaired, and therefore rely on a screen reader and speech synthesis to interact with computer programs. A screen reader is not necessary, however, if the application I'm using has built-in TTS support -- then that application becomes, for the time being, the screen reader; this is called "self-voicing".

In order for a screen reader of any kind, even a self-voicing application, to be considered "reasonably accessible" to people with disabilities, some invariants need to hold:

  • Widgets such as tables, menu bars, etc., need to fire events when focus within them changes.
  • Widgets such as edit boxes need to notify the application in some way when their content is updated.
  • Finally, an application needs a way of determining which widget generated either of the above two events.

The above three invariants are minimum baseline requirements for an application to be made accessible with speech synthesis in a way that allows comfortable user interaction. Does ImGui allow these changes to be detected in some manner? I could not find anything in the demo or examples. I did find an issue about tab navigation, but nothing about this particular problem. However, I do not want to make a false assumption when I may have overlooked something.

ocornut commented 3 years ago

Hello Ethin,

Thanks for your detailed message.

Tab-based focus navigation

Currently tabbing only goes through certain fields (those which can be turned into a text input: InputText, SliderFloat, DragFloat, etc.). There is a plan to rework this so it goes through all fields (some work has been done as recently as https://github.com/ocornut/imgui/issues/4079#issuecomment-830407499; I had more work done but never finished it, as I faced some technical issues when I tried a few years ago). Anyhow, I think it will eventually happen. Note that the directional navigation model used for gamepad and keyboard controls allows navigating through all widgets using arrow keys (or the gamepad d-pad).
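For reference, the directional navigation mentioned above can be enabled today through the standard config flags in imgui.h (these flag names are the real ones; the helper function is just illustrative):

```cpp
#include "imgui.h"

// Call once at startup, after ImGui::CreateContext().
void EnableNavForAccessibility()
{
    ImGuiIO& io = ImGui::GetIO();
    io.ConfigFlags |= ImGuiConfigFlags_NavEnableKeyboard; // arrow keys / Tab
    io.ConfigFlags |= ImGuiConfigFlags_NavEnableGamepad;  // gamepad d-pad
}
```

With these flags set, focus can be moved across widgets without a mouse, which is the prerequisite for any of the event-reporting ideas discussed below.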

Intercepting focus events is not possible yet but should be possible to add to the codebase. We would need to know/understand exactly what data would be useful to you and how it would be piped to the screen reader, in order to understand how to design the platform/backend API to provide this info. If you are keen to investigate dear imgui further, it would be reasonable to work together on drafting something.

> Widgets such as tables, menu bars, etc., need to fire events when focus within them changes. Finally, an application needs a way of determining which widget generated either of the above two conditions.

From dear imgui's point of view, it won't be widget-specific (as in, not tied to "tables, menu bars, etc."), so it seems like it all gets bundled into the same "intercept focus events"? The main question is what information you would need, and how the screen reader (which I assume reads from the image?) would know where and what to read from on a focus change. When you say "which widget generated [..] the event", how would this be specified? A string would be too ambiguous, and not all widgets have a string-based id.

> Widgets such as edit boxes need to notify the application in some way when their content is updated.

Ditto; it would be good to know precisely what information you would need to leverage this.

Computerfan23 commented 1 year ago

Hi there! I know this has been open for over a year now. I was wondering if anything has happened with the improvements for imgui, or however you spell that. I am also a blind user who uses a screen reader and speech synthesizer to read text on screen. When I try to view GUI elements, my speech synthesizer can't read any text on screen. I try tabbing around, but hear nothing at all. Would it be possible to make it like a JSON file? Or some sort of web interface? I think that may be easier to navigate with a screen reader, but those are only some suggestions on my part.

ethindp commented 1 year ago

I am wondering if any improvement has occurred. I shall attempt to address the notes of @ocornut:

  1. You would need widget-specific focus events. A "universal" focus event simply won't work. Oh, it will tell you what top-level widget (e.g. a table) fired it, but we need more granular information. Here are a few examples:
    • List boxes: events need to be fired when the selection changes. The information included should be either the index of the item that was selected or the text of that item. (Index may be the better option here.)
    • Edit boxes: text updated (character deleted, entered), and, really, any other form of input (this is so that we can add special bits of functionality like character/word/paragraph navigation, selection, command history, ...). Obviously it should be possible to toggle whether tabbing out of the field is allowed or not. (You might not want this for, e.g., a command console which has some form of tab-based autocomplete.)
    • Tree views: events should include when an item is expanded or collapsed and when an item is selected. If the tree view allows for checking or unchecking items, this should be included.
    • Combo boxes: item selection changed. If the combo box is editable, then we should be able to get text input events as well.
    • Menus: menu item selection changed, menu item checked/unchecked, menu item activated.
    • Menu bars: menu bar item selection changed, sub-menu expanded/collapsed, plus menu specific events.
  2. The screen reader wouldn't read from the visual image that ImGui draws. Rather, the screen reader (or, more generally, the "accessibility client") would be notified of text to speak by the end-user application, unless ImGui integrated with OS-level accessibility interfaces. In the former case, client software would consume events like focus changed/lost events, widget-specific events, etc., and would take whatever action it deems necessary, such as telling the accessibility client to announce something. In the latter case, this would be unnecessary; the operating system would take care of it by virtue of ImGui being an "accessibility server" (i.e. ImGui would provide the operating system with various pieces of information per widget, such as the name of the widget, its role (top-level window, child window, button, check box, combo box, ...), an "accessibility description" (which would describe, in further detail, the purpose of the widget, or what activating it will do), etc.), and the screen reader would then consume this information automatically. No client-side event handling would need to be done other than taking actions upon widget activation, storing text in memory buffers, etc.
  3. Determining which widget generated the event would be the simplest: just pass it as a pointer or reference into the event handler callback. Provide functions (setters/getters) for acquiring information for the widget, and let the callback handle that as it sees fit.

In general, the more information you can provide when it comes to widget events and information, the better. It is perfectly acceptable if the UI library is extremely verbose in terms of the data that client software can acquire about a widget; it is up to client software to determine what information to use and what to discard. Of course, none of this matters if you want to implement OS accessibility interfaces, but that is most certainly not an easy process, particularly given that OSes like Linux have no de facto accessibility interface to begin with, and the ones you might assume are de facto don't have very good documentation. Windows would be easier, if only because you only need to integrate UI Automation and you're good to go. macOS will pose a problem, however; Apple is notoriously tight-lipped about their accessibility interfaces, and I'm unsure whether they are even documented anywhere. I hope that I was able to answer your questions, though.

Computerfan23 commented 8 months ago

Hi! I was just referred to this issue. I am playing a PC port of Ocarina of Time, and there is a menubar you bring up by pressing F1. This menu contains a lot of dropdown menus and lists. This menubar is not accessible, since I get no text-to-speech output at all. Is anything mentioned here a possible solution for this?

ocornut commented 8 months ago

Thanks everyone for your input here. A quick update first: since 1.89.4 (March 2023), tabbing can cycle through every item, so that part is sorted out.

From Dear ImGui's point of view, implementing a system that submits contents on specific events/actions is possible, but there are likely going to be several edge cases where providing accurate/precise enough information may be difficult due to lack of context. As in, dear imgui itself rarely has enough context at its disposal, and it may occasionally require non-trivial digging or new systems to retrieve the desirable information. For example: combo boxes, list boxes, and menus are more or less general windows where any type of item can be submitted. From inside a specific item's code, items generally don't care or know where they are. But that information could be retrieved somehow.

(There is also general pressure that, by nature, everything in dear imgui needs to be implemented in a very optimal manner (in terms of CPU usage), but I am slightly less worried about that because I can always help optimize the code. It is worth noting, though, that this is a severe requirement that demands non-trivial coding discipline. I'll help.)

I think we'd need to focus on first steps. Not being a user of screen readers, I am unlikely to be able to do it myself, but if someone wants to fork and do experiments, I can chat with them and provide assistance to move things forward.

Computerfan23 commented 8 months ago

Hi! What is Dear ImGui? I was a bit confused. I am a frequent user of screen readers in everyday life. I use JAWS and NVDA. Although I am not familiar with coding or programming, I have used GitHub a bit. I also tried looking at GUI elements. I am a legally blind player. I rely on using a screen reader and accessibility features.

ocornut commented 8 months ago

Dear ImGui is a tech/software library used to display UI elements such as buttons, sliders etc. It is a rather opinionated and unusual piece of tech and was designed to easily create debug tools and internal development tools.

Nowadays people have been increasingly using it for user-facing tools and are expecting new things out of it that are largely out of its initial scope, but make sense with the increased adoption and user-base.

We don't have any simple/easy answer to your question other than stating that screen readers (unless some are relying on OCR?) currently won't work with GUIs created with dear imgui. But discussions and work here may lead to improvements on that front. I would be happy to help an experienced programmer who wants to tackle this and move the needle forward.

Computerfan23 commented 8 months ago

I see. The only screen reader that I know of with OCR is NVDA. I have tried to use that in some cases, but have not had the best experiences. Often it gives me a "content is not visible" message, or I get one screen read at a time. So it has been clunky for me to use. So there is not much for me to do currently regarding the GUI?