Closed pwoolcoc closed 3 years ago
hello there, one of the yggdrasil founders here!
At this point in time, yggdrasil doesn't accept contributors in the accessibility-related parts of the code, however there are other things to do. This is because the accessibility technology in linux is very much undocumented: almost all the documents I could find are outdated, and their links give me 404 errors, so more often than not we figure out how the events and all that work only by reading the C code of atspi and such. Additionally, we also read the source code of orca, the existing screen reader for linux, honestly a very messy codebase to work with. Since we don't want to put you through any of that unless you feel particularly adventurous, I don't know what accessibility-related work you can do; however, there are still some C libraries to bind if you're interested in doing that, and feedback on the design document and the way we want to do things is appreciated. Tips on how to improve the code quality of even the prototype, regardless of whether it will be thrown away eventually once we decide enough prototyping has been done, are also very much appreciated.
For starters, there are a few enums and some callback functions in the C speech-dispatcher library that we didn't bind and didn't make available to rust, especially those in the async speech dispatcher API. It would be helpful if you could help us finish that crate for good. The documentation for that is kinda outdated as well; we had to read the libspeechd headers to understand why the app was crashing. Still, it's certainly easier to do than adding features that use the very much undocumented accessibility stack.
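For illustration, binding one of those C enums mostly means mirroring it with `#[repr(C)]` so the discriminants match the header. A minimal sketch, assuming a libspeechd-style punctuation enum; the names and values here are assumptions, so verify them against `speechd_types.h` before relying on them:

```rust
/// Mirror of a C enum in the style of libspeechd (names and values are
/// assumptions; check speechd_types.h for the real ones).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Punctuation {
    All = 0,
    None = 1,
    Some = 2,
}

fn main() {
    // The whole point of #[repr(C)]: the Rust variant's integer value is
    // exactly what the C API expects to see across the FFI boundary.
    assert_eq!(Punctuation::All as i32, 0);
    assert_eq!(Punctuation::Some as i32, 2);
    println!("ok");
}
```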
Speaking of async, speech dispatcher has an async mode, though we only implemented a synchronous API for simplicity. Now that we work directly with dbus asynchronously and with tokio, we think it's high time we made use of the async mode. Instead of callbacks that the user has to pass in, we thought all that could be encapsulated in an async event stream, even though the C side delivers events through callbacks, as any C library would. Perhaps you can contribute there too? If you think you can do refactorings that would make things more idiomatic rust in that binding, feel free to try.
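The callback-to-stream idea can be sketched roughly like this. The event names are made up, and std's mpsc channel stands in for the tokio channel and `Stream` implementation the real crate would use, so this is only the shape of the design, not the actual binding:

```rust
use std::sync::mpsc;

// Minimal sketch of turning a C-style callback into a pollable event
// stream. In the real crate an extern "C" trampoline would feed a tokio
// channel exposed as a futures Stream; here std's mpsc keeps the example
// self-contained. SpeechEvent is an illustrative stand-in.
#[derive(Debug, PartialEq)]
enum SpeechEvent {
    Begin(u64), // message id started speaking
    End(u64),   // message id finished speaking
}

struct EventStream {
    rx: mpsc::Receiver<SpeechEvent>,
}

impl EventStream {
    // Blocking stand-in for an async `next().await`.
    fn next(&self) -> Option<SpeechEvent> {
        self.rx.recv().ok()
    }
}

// Returns the "callback" the C side would invoke, plus the stream the
// user consumes; the callback only pushes into the channel and returns.
fn bridge() -> (impl Fn(SpeechEvent), EventStream) {
    let (tx, rx) = mpsc::channel();
    (move |ev| { let _ = tx.send(ev); }, EventStream { rx })
}

fn main() {
    let (callback, stream) = bridge();
    callback(SpeechEvent::Begin(1));
    callback(SpeechEvent::End(1));
    assert_eq!(stream.next(), Some(SpeechEvent::Begin(1)));
    assert_eq!(stream.next(), Some(SpeechEvent::End(1)));
    println!("ok");
}
```

The design point is that the callback does nothing but enqueue, so whatever thread the C library invokes it on never blocks on user code.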
Those are pretty easy codebases to work with, and though we didn't make issues for them and could write those wrappers ourselves when we really need them, they would really help us achieve our goal faster.
again, thanks very much for your offer, any contribution is appreciated.
accessibility technology in linux is very much undocumented
Did you see https://wiki.freedesktop.org/www/Accessibility/ ?
The documentation for that is kinda outdated as well, we had to read libspeechd headers to understand why the app is crashing
Reports / pull requests on that would be more than welcome too.
@sthibaul about the first link: yes, I saw that, or a very similar page anyway; my browser shows it as visited. Yes, we know there's documentation on how to make client apps accessible, however we need more on... well... how does this fit in with screen readers, atspi servers, and so on.

Now, about the parts that return 404: whenever I want to find documentation specific to this side of things, I follow a link that says something like "gnome atspi guide". What do I get? Yup, a 404. Is it because that document is too old and irrelevant for the current times? If so, can that fact be written down somewhere?

Whenever I want to see why orca does a certain thing, for example the weird hack it does by registering keystrokes to move the edit caret around in the edit box by the specified amount (characters, words, lines, etc.), instead of letting the system do it and then just reading the difference between the previous offset and the next offset, I have to read the dbus xml spec or the man pages to discover what functions are available. Just to find out, for example, that the text interface exposes a method that takes the offset you get from the detail_1 part of the caret-moved event and an enum specifying how much text you want back (the sentence, line, or word around it); that's why orca does it, not due to some peculiar design decision of its own. A more logical one would be to be able to specify the offset and the end as integers, let the sr decide what it wants to get, make it more flexible, unless this already exists but we haven't seen it in the docs we currently have.

However, here's the thing: there's no tutorial on how to do most of this, and orca doesn't have many comments. This is what we're going to do: once we are finished with the prototypes, we're going to try to document the problems we find as well as the solutions we came up with during the development of yggdrasil.
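To make the granularity point concrete, here is a rough sketch of the shape of that call as I understand it. All names here are illustrative stand-ins, not the real AT-SPI identifiers; the authoritative definitions are in the Text interface XML in at-spi2-core:

```rust
// Illustrative stand-ins for the AT-SPI text-boundary enum and the
// caret-moved event payload; the real names live in the Text interface
// definition in at-spi2-core's xml/ directory.
#[derive(Debug, Clone, Copy)]
enum Boundary {
    Char,
    WordStart,
    SentenceStart,
    LineStart,
}

struct CaretMoved {
    detail_1: i32, // detail_1 carries the new caret offset
}

// What the screen reader effectively asks the text interface: "give me
// the word/line/sentence around this offset". This just renders the
// request as a string for illustration.
fn text_request(ev: &CaretMoved, b: Boundary) -> String {
    format!("GetTextAtOffset(offset={}, boundary={:?})", ev.detail_1, b)
}

fn main() {
    let ev = CaretMoved { detail_1: 42 };
    assert_eq!(
        text_request(&ev, Boundary::LineStart),
        "GetTextAtOffset(offset=42, boundary=LineStart)"
    );
    println!("ok");
}
```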
Also, we will document the basics of atspi from a screen reader's perspective and things like that, since I couldn't find a halfway comprehensive article on how it all fits together, as well as all the workarounds orca had to do to interpret the information in an actually useful way, assuming we will be able to understand orca's source code; there's some progress on that front though. For example, yggdrasil now reads focused controls more or less, however it doesn't read the windows as I switch between them with alt+tab, until I'm focused on the window. This is in a way a focused element, right? I tried subscribing to other window events, but nothing worked, so I don't know what to do now. In an ideal case, something like this should be documented somewhere, at least in orca's code if nowhere else.

So yeah, if I'm not mistaken, you are one of the people working on linux accessibility, right? In any case, I recognise the merits linux accessibility has; I acknowledge it opened the gate to us using the linux desktop efficiently. However, modernisation has to happen somehow. First, a modern screen reader, yggdrasil, one way or another. Then, documenting the process of making such a thing. Once the limits of the current accessibility stack have been reached, who knows, perhaps an atspi3 is in order.

About contributing: we would love to, however we would never finish yggdrasil in that case; there's so much to contribute to, especially documentation. A regeneration of the speech dispatcher docs would fix that particular issue I think, however maybe another will arise. We will certainly contribute to things in the future, but a bit later, when we have more of our knowledge together and have become more experienced with all that atspi entails.
@albertotirla, the alt tab thing is actually quite common among screen readers. Haven't played with Linux a ton, but at least on Windows, invoking alt+tab sets the foreground window to Task Switcher, and shows all the windows visually as you alt tab through them. This sends a totally different accessibility event that doesn't have anything to do with focus.
Since this question has been answered with enough clarity, I believe, I'm closing this issue. @pwoolcoc, if you can contribute to any of the above-mentioned things, it's much appreciated.
@albertotirla :
how does this fit in with screen readers, atspi servers
Most screen-reader-side information would be available in the libatspi (https://www.manpagez.com/html/libatspi/) and pyatspi2 documentation (`pydoc3 pyatspi`); for each interface you'll have the details of the methods. The low-level bits are in the interface definitions in at-spi2-core, in xml/.
Is it because that document is too old and irrelevant for the current times?
Most probably no, it's probably just because gnome moves things around on its websites.
the weird hack
Usually hacks are not a good thing. People used to introduce them in orca, but nowadays Joanmarie says that it really is the application that should be fixed.
A more logical one would be to be able to specify the offset and the end as integers, let the sr decide what it wants to get, make it more flexible, unless this already exists but we haven't seen it in the docs we currently have
I'm not sure I understand what you meant.
there's no tutorial on how to do most of this
Well, yes, because of lack of manpower. So either people spend time on writing tutorials etc. which always quickly get outdated, or they make the whole thing at least work. Atspi does have some documentation. I wouldn't say it's perfect, but it's a matter of people helping with improving it.
orca doesn't have many comments

That's the eternal criticism that people often have. And then they write code which doesn't have many comments either...
This is what we're going to do, once we are finished with the prototypes, we're going to try to document the problems we find as well as the solutions we came up with during the development of yggdrasil
Cool! I would however say not to wait. It's when you are solving the problem that you should write the documentation that explains how you understood how to solve it. Once you have understood the thing and let some time pass, you'll have forgotten the list of things that you needed to understand, and that you now can't unlearn.
all the workarounds orca had to do to interpret the information in an actually useful way
Most often they are not supposed to be there, so better fix them :) We want fewer workarounds, not to replicate them in various screen readers.
it doesn't read the windows as I switch between them with alt+tab, until I'm focused on the window
That's indeed a corner case that poses problems on Windows too.
This is in a way a focused element, right?
Is your task switcher accessible, actually? IIRC we had seen some task switchers that were actually not accessible at all.
you are one of the people working on linux accessibility, right?
Yes.
modernisation has to happen somehow
"modernisation" is too vague a term, often only related to hype, which is quite often not actually technically sound.
a modern screen reader
What "modern" would mean here? Really, most often when I see this adjective the background is actually quite vague.
Yes, writing a screen reader in C is unreasonable. Writing it in Python is, however, we can see how NVDA is faring.
documenting the process of making such a thing
That's not a "modern" thing :)
the limits of the current accessibility stack have been reached
It can be extended.
That being said, there are indeed ground reasons in atspi2 defects. But they are not out of lack of some kind of "modernity", but just because at the time we didn't know how e.g. thunderbird's thousands-mail-long interface would behave, and thus the corresponding notification flurry issues.
there's so much to contribute to
Yes, that's why I'm calling for joining efforts.
A regeneration of the speech dispatcher docs would fix that particular issue I think
No, the speech dispatcher documentation is not generated, the .texi file is the source. I have updated some of it in ae12effa5713c7c875361c18b5436976bbd6d4e0, but that would probably need some review to check that my quick look didn't miss something.
For clarity, I will reiterate that at this time, our (mine and @albertotirla's) focus is on binding at-spi in Rust, one way or another, which we're currently doing by communicating directly over DBus, not using libatspi. We tried libatspi first, but we found that gir really isn't up to the task of fully generating these bindings, namely because of `GArray`s and the lack of auto-generated accessors to struct fields. We probably could've made it work, but using the dbus crate seems far more idiomatic. Does anybody see any issues with communicating with the registryd over DBus directly like this? It seems to me as though libatspi only lightly wraps the DBus interface, to make it glib-esque.
As already mentioned, I can't stress enough how important I think writing documentation is right now. We don't want to repeat the mistakes of years past, which I feel caused many people to give up on contributing to the Linux accessibility stack. To this end, I've started work on some introductory material on DBus and at-spi, and I plan to make many examples, including in Rust, since that will help potential Yggdrasil contributors get started. This isn't public yet, but will eventually be going up on the Yggdrasil site.
Most screen-reader-side information would be available in the libatspi (https://www.manpagez.com/html/libatspi/) and pyatspi2 documentation (`pydoc3 pyatspi`); for each interface you'll have the details of the methods. The low-level bits are in the interface definitions in at-spi2-core, in xml/.
Good resources, thanks for sharing! I've also had success before getting some documentation out of the GObject Introspection files.
Usually hacks are not a good thing. People used to introduce them in orca, but nowadays Joanmarie says that it really is the application that should be fixed.
I absolutely agree, and that's one of the things we'd like to uncover and fix while building Yggdrasil. If we find a bug in the toolkit, or the registryd, it needs to be fixed upstream, not hacked around. This will benefit Orca users too, so this can only be good.
Well, yes, because of lack of manpower. So either people spend time on writing tutorials etc. which always quickly get outdated, or they make the whole thing at least work. Atspi does have some documentation. I wouldn't say it's perfect, but it's a matter of people helping with improving it.
Agreed, and we certainly plan to help with this in any way we can.
That's the eternal criticism that people often have. And then they write code which doesn't have many comments either...
I fully agree, and I'm determined to not let this happen again. It's a much too complicated problem to solve for it to ever be 100% perfect, but we'd certainly like to give it our best shot. A mixture of in-code comments, which are more likely to be kept up-to-date, and out-of-code materials that provide higher-level overviews, as well as point people to the relevant source files, seems to me as if it would go a long way to make the inner workings of the a11y stack ... more accessible.
Cool! I would however say not to wait. It's when you are solving the problem that you should write the documentation that explains how you understood how to solve it. Once you have understood the thing and let some time pass, you'll have forgotten the list of things that you needed to understand, and that you now can't unlearn.
Also agreed, we need to get more info out of the heads of developers and onto paper, or at least e-paper :-)
Most often they are not supposed to be there, so better fix them :) We want less workarounds, not replicate them in various screen readers.
Exactly, and I hope that starting a new Linux a11y project can help reinvigorate this topic and benefit everybody. I've read comments on the Linux a11y mailing list to the effect of "Why are you not just contributing to Orca?", and besides the barrier to entry, one of the reasons is that I kinda feel like examining the entire stack from the perspective of making a new screen reader may actually be more beneficial to the stack, because it will uncover issues that have been taken for granted for years now. That's my hope, at least.
"modernisation" is a too vague term, often only related to hypeness, which is quite often not actually technically sound.
I think in our case, the innovation we're trying to push for is in the addon system: not just bringing screen reader addons to Linux, but also making Yggdrasil as modular as possible. We plan to dog-food our own addon API by writing features that would normally be part of the screen reader's core with it, such as OCR, sound icons, etc. In effect, the core becomes just a handler for keyboard input, the accessible objects, and at-spi events (the input), and the controller of speech, braille, etc. (the output). In a sense, we're taking inspiration from modular kernels like Linux: keep the core smaller and simpler, and delegate additional functionality to modules.

Of course, this introduces some challenges of its own, like designing a well-thought-out addon API and making sure addons can't bring down the screen reader if possible, but hopefully Rust will help us manage this thanks to its memory safety and fearless concurrency.
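A hypothetical sketch of what such a minimal core/addon split could look like; the trait and all names are purely illustrative, not Yggdrasil's actual API:

```rust
// Purely illustrative addon trait (an assumption, not Yggdrasil's real
// API): the core only routes events, features live behind this interface.
trait Addon {
    fn name(&self) -> &str;
    /// Return Some(text to speak) if this addon handles the event.
    fn on_event(&mut self, event: &str) -> Option<String>;
}

// A "sound icons"-style feature implemented as an addon rather than in
// the core, as the dog-fooding plan describes.
struct SoundIcons;

impl Addon for SoundIcons {
    fn name(&self) -> &str {
        "sound-icons"
    }
    fn on_event(&mut self, event: &str) -> Option<String> {
        (event == "focus:button").then(|| "click".to_string())
    }
}

// The core: route an at-spi event through every registered addon and
// collect whatever output they produce for the speech/braille layer.
fn dispatch(addons: &mut [Box<dyn Addon>], event: &str) -> Vec<String> {
    addons.iter_mut().filter_map(|a| a.on_event(event)).collect()
}

fn main() {
    let mut addons: Vec<Box<dyn Addon>> = vec![Box::new(SoundIcons)];
    assert_eq!(dispatch(&mut addons, "focus:button"), vec!["click".to_string()]);
    assert!(dispatch(&mut addons, "focus:label").is_empty());
    println!("ok");
}
```

Keeping addons behind a narrow trait like this is also what would let the core isolate a misbehaving addon (for example by catching panics at the dispatch boundary) instead of taking the whole screen reader down.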
Yes, writing a screen reader in C is unreasonable. Writing it in Python is, however, we can see how NVDA is faring.
By this do you mean it is or isn't unreasonable? NVDA is almost certainly the most successful open-source screen reader project, but they've definitely had issues with the Python 2 -> 3 transition, and breaking compatibility. I don't think Python is the issue with Orca, but if we're starting again anyway, I think it makes sense to use a more high performance language so long as it doesn't introduce huge design deficiencies of its own, which I don't think Rust will do.
the limits of the current accessibility stack have been reached
It can be extended.
That being said, there are indeed ground reasons in atspi2 defects. But they are not out of lack of some kind of "modernity", but just because at the time we didn't know how e.g. thunderbird's thousands-mail-long interface would behave, and thus the corresponding notification flurry issues.
Are there any ideas floating around for how to fix these issues? I'm well aware of their existence, but haven't seen too much discussion about solutions (probably just haven't been looking hard enough). How does UI Automation solve this on Windows, or whatever Apple's accessibility API is called? Could some of these big lists of accessible objects be paginated? Could backpressure be used to take the strain off the event handlers?
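By backpressure I mean a bounded queue between the event producer and the handler, so that a flood of events (like Thunderbird's notification flurries) makes the producer wait or drop instead of growing an unbounded backlog. A std-only sketch of the idea:

```rust
use std::sync::mpsc;

// "Backpressure" in miniature: a bounded channel makes a fast event
// producer block (or, with try_send, get told the queue is full) once the
// consumer falls behind, instead of letting events pile up without limit.
fn main() {
    let (tx, rx) = mpsc::sync_channel::<u32>(2); // capacity of 2 events

    tx.send(1).unwrap();
    tx.send(2).unwrap();
    // The queue is now full: a blocking send would wait here, and
    // try_send reports the condition instead of queueing a third event.
    assert!(tx.try_send(3).is_err());

    // The consumer drains one event, freeing a slot...
    assert_eq!(rx.recv().unwrap(), 1);
    // ...so the producer can proceed again.
    assert!(tx.try_send(3).is_ok());
    println!("ok");
}
```

Whether at-spi's D-Bus event model could express this is exactly the open question; the sketch only shows what the mechanism buys you inside one process.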
Does anybody see any issues with communicating with the registryd over DBus like this directly?
That should be fine, qt does this, and gtk4 will do this as well. Ideally you'd be generating from the .xml files. That's what Emmanuele is after for instance.
I've started work on some introductory material on DBus and at-spi, and I plan to make many examples, including in Rust, since that will help potential Yggdrasil contributors get started. This isn't public yet, but will eventually be going up on the Yggdrasil site.
Great! When it becomes available, please add a link on the https://wiki.freedesktop.org/www/Accessibility/ wiki page, so this gets advertised.
I kinda feel like examining the entire stack from the perspective of making a new screen reader may actually be more beneficial to the stack, because it will uncover issues that have been taken for granted for years now.
Possibly not taken for granted, but at least papered over by the implementation, and thus forgotten since then. So digging up the issues and working on them could be a good thing indeed.
Writing it in Python is, however, we can see how NVDA is faring. By this do you mean it is or isn't unreasonable?
The punctuation was ambiguous indeed :) I meant: “Writing it in Python is, however : we can see how NVDA is faring”.
NVDA is almost certainly the most successful open-source screen reader project
Agreed!
Concerning Rust, my main concern would be whether people can learn it easily enough to be able to make contributions quickly.
Are there any ideas floating around of how to fix these issues?
I haven't followed this.
Could some of these big lists of accessible objects be paginated?
That's how libreoffice does it for the document for instance, yes: it only exposes the really visible objects.
Could backpressure be used to take the strain off the event handlers?
I'm not sure I understand what you mean by "backpressure"?
Because of this issue in particular, but also anticipating others of its kind, I enabled github discussions. So, for anything that's not actually an issue, I recommend we use that, especially since this issue is closed for now. However, I'm going to answer a few of @sthibaul's points here, then I will turn this into a discussion.
Most screen-reader-side information would be available in the libatspi (https://www.manpagez.com/html/libatspi/) and pyatspi2 documentations (pydoc3 pyatspi), for each interface you'll have the details of the methods. The low-level bits are in the interface definitions in at-spi2-core in xml/
About that documentation from the man pages: we know about it, otherwise we wouldn't have arrived as far as the prototype. I think it's in the resources section of the design doc, as well as in an examples repository. Those were invaluable resources in our journey; we would have given up a long time ago without them.
Most probably no, it's probably just because gnome moves things around on its websites.
OK, good to know then. Is that document available somewhere else? Does it contain things we can't get from the man pages, some explanations perhaps?
Usually hacks are not a good thing. People used to introduce them in orca, but nowadays Joanmarie says that it really is the application that should be fixed.
I totally agree with that stance; we'll try to make it work, then make it work better, as the saying goes. What this means in practice is that we will probably adopt these hacks only if we can't think of anything else or if nothing else currently exists. Even so, we will try to keep them around for as short a time as possible, and indeed we won't have issues going further upstream if the cause of the hack originates there.
That's the eternal criticism that people often have. And then they write code which doesn't have many comments either...

Well, in the prototype we're throwing everything together to see if it works, how things interact with the system, and whether rust makes a difference; that's why we don't have comments in it, but we will in the real thing, after the design overhaul. Fyi, rust actually does make a difference, for now that is.
Cool! I would however say not to wait. It's when you are solving the problem that you should write the documentation that explains how you understood how to solve it. Once you have understood the thing and let some time pass, you'll have forgotten the list of things that you needed to understand, and that you now can't unlearn.
Fair point. I still don't know what to do on that front: code needs to be written, that's true, while the docs can't really be written in the meantime. On the other hand, you're right, we'll probably forget how we solved the issue in the first place. Ironically, I think this is where code comments might come in handy.
Is your task switcher accessible, actually? IIRC we had seen some task switchers that were actually not accessible at all.
Well, probably yes, as orca reads it when I do that. Plus, I'm using ubuntu, so it should be fairly accessible... at least that part is. About that, I think I'm partway to a solution. Through monitoring of the atspi bus, I discovered that when a window switches, or even when I am in the switcher, a parent property-changed signal is fired; plus, when you land on that window, the toolkit sends events about each child of that window being added. Perhaps that could be a heuristic to detect we're in the task switcher? Do you know how orca handles this? I can't make sense of where in the code that is handled; might it be that this is another hack?
What "modern" would mean here? Really, most often when I see this adjective the background is actually quite vague.
Performance- and functionality-wise, think of the comparison between CSR (Commentary Screen Reader) and TB (TalkBack). However much we TalkBack users don't want to admit it, CSR is so much faster, exactly because most of the performance-critical tasks are written in c++; that's the reality.
Yes, writing a screen reader in C is unreasonable. Writing it in Python is, however, we can see how NVDA is faring.
While I agree with you on the writing-a-screen-reader-in-C part, I can't totally agree on python. Indeed nvda is written in python, however it uses C libs for most things, which speeds it up a bit. Plus, because of the gil and all that, you can't have proper multithreading, so it doesn't use the full potential of a multicore machine. And remember that python itself has an interpreter, which means a larger runtime, a larger chunk of memory unusable by nvda or other programs. This shows in nvda: especially when the computer has a huge task to do and nvda is ram-restricted, it lags very badly; same for orca.

Python shouldn't be the de facto standard for writing applications tightly integrated with the system, such as screen readers. We need maximal performance out of them, not the potential for very easily readable code that, as we saw, can also quickly turn into a mess for one reason or another. If rust didn't exist, I would probably jump to c++ and begin to write it that way.

Yet another good example where writing it in a compiled language actually works in the program's favour is ZDSR, another Chinese screen reader. This time the comparison is not so harsh, however because the entire sr is written in c++ with optimisations in mind, it's faster and less memory-consuming than nvda will ever be. I know, the screen readers I mentioned are from china, and therefore might raise suspicion in some people's minds. For me, it's not important where something comes from as long as it performs its job well. I just picked them as examples of performance and superior functionality born of a new perspective, having been started from the ground up. See, sometimes reinventing the wheel is beneficial: for the author of the project, for the people using it, even for the accessibility stack itself.
We do indeed have plans to try to modernise and extend atspi2 in the future to make it as good as it can be, however we must go at it from the top: the screen reader first, then the rest in case of limitations.
Hello, I apologize for opening an issue for this but I was having trouble finding a way to contact you. I'm an experienced Rust developer and I'd love to help out with this project. Are you currently accepting contributors?