Closed Palacee-hun closed 3 years ago
In addition, it appears that the bug is Java version dependent, i was not able to reproduce this with 11.0.7 and NVDA Alpha 21668.
@tyrylu exactly which version of 11.0.7 have you tested with? Because in Adopt Openjdk archive (https://adoptopenjdk.net/archive.html) there are three of them: 11.0.7+10.2 (23 April 2020), 11.0.7+10.1 (17 April 2020) and 11.0.7+10 (15 April 2020). I want to also test with 11.0.7, but with the exact same version you did. For the record the exact Java version I ran Ghidra with is AdoptOpenJdk build of OpenJdk 11.0.9+11 (23 October 2020)
This is the output of java -version:
java version "11.0.7" 2020-04-14 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.7+8-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.7+8-LTS, mixed mode)
Thanks. That's an Oracle build then. Unfortunately I couldn't find that exact version in Oracle archives any more. So I grabbed OpenJdk 11.0.7+10 (15 April 2020) instead. As from Java 11 onwards Oracle and OpenJdk builds should be very similar, that is the closest I could get. Sad news is that I can easily crash NVDA under Ghidra code browser ui with this version also. To exclude further factors in bug hunt: The crash situation I gave the original steps to reproduce is dependent on the last widget active in code browser ui in the last Ghidra session, because Ghidra remembers that and focuses that on reopening. E.g. if it was the program tree then it won't crash with the original steps (that's why I can do so many experiments at all, otherwise it would be even more hopeless). So if you fiddled with Ghidra before (as @tyrylu obviously did) you may not get it immediately.
But there are two important scenarios which unfortunately always, 100 % reliably produce the described crash for me with all three OpenJdk builds I tested with (11.0.7, 11.0.9, 11.0.9.1) independently of where I left off with Ghidra. @tyrylu please check these. I want to make sure that it's not my computer doing something weird (highly unlikely) and also want to know if an Oracle build behaves differently (possible). Thanks much. Crash 1:
Hello, i was at least able to reproduce a NVDA freeze using the code browser panel method, there's not much interesting debug log output, after entering the panel, the log only contains:
DEBUG - NVDAObjects.JAB.JAB._get_states (12:05:43.216) - MainThread (16436):
states: enabled,focusable,visible,showing,opaque
DEBUG - NVDAObjects.JAB.JAB._get_states (12:05:43.218) - MainThread (16436):
states: enabled,focusable,visible,showing
DEBUG - NVDAObjects.JAB.JAB._get_states (12:05:43.219) - MainThread (16436):
states: enabled,focusable,visible,showing,opaque
DEBUG - NVDAObjects.JAB.JAB._get_states (12:05:43.223) - MainThread (16436):
states: enabled,focusable,visible,showing,opaque
DEBUG - NVDAObjects.JAB.JAB._get_states (12:05:43.224) - MainThread (16436):
states: enabled,focusable,visible,showing,opaque
DEBUG - NVDAObjects.JAB.JAB._get_states (12:05:43.226) - MainThread (16436):
states: enabled,focusable,visible,showing,opaque
DEBUG - NVDAObjects.JAB.JAB._get_states (12:05:43.226) - MainThread (16436):
states: enabled,focusable,visible,showing,opaque
DEBUG - NVDAObjects.NVDAObject._get_placeholder (12:05:43.232) - MainThread (16436):
Potential unimplemented child class: <NVDAObjects.JAB.JAB object at 0x007ABD10>
And to add to the weirdness, a NVDA restart does not fix everything, the UI seems to be frozen as well, e. g. it is necessary to kill the java process.
Thanks. That's it, exact same behaviour. I mention for the record that that listing panel is a subclass ofthe Sving class jPanel. It doesn't do anything extraordinary, just draws the text of the listing onto the panel with Java awt calls (mostly drawString) in the custom visual layout and formatting needed for Ghidra. That's why they chose that control type. I mention also that as I began to add in neccessary accessibility properties in my Ghidra copy, they indeed work as documented, and NVDA announces more and more things, but the crashes remain the same. I've never ever experienced anything like this in the 10 years I've been working with this excellent screen reader. This is a real toughie. Now I am almost confident that this is a Java-side problem, and lies somewhere in the deep layers of Java awt implementation. I have some more ideas to test, but it will be no child play to ever uncover its real origin.
cc: @feerrenrut
Have you tested this particular scenario wit JAWS just to make sure that it is a NVDA problem rather than an Java Access Bridge one?
No, because I haven't used Jaws for more than 10 years now and I'd rather keep that away from my computers. Nevertheless this is a good idea, and a logical step in the bug hunt. so those who use Jaws regularly and are willing to try the crash scenarios, please do so and please comment your findings.
With the help of the very powerful Frida instrumentation tool I managed to peek at the debug messages that the Java-side of the access bridge produces. And with those I managed to diagnose this group of various crashes. The root cause lies with a pathological scroll bar. It is a Java swing AccessibleJScrollbar instance with most parameters okay, however its index in accessible parent is reported as -1 (a totally out-of-order value as it should be 0-based), and it has some serious problems with its accessible parenting. The course that leads to the crashes is as follows:
Value changed event: old = 0; new = 1
Version Information: Java virtual machine version: 11.0.7 Access Bridge Java class version: 11.0.7 Access Bridge Java DLL version: 11.0.7 Access Bridge Windows DLL version: 11.0.7
AccessibleContext information:
Name:
Virtual Name:
Description:
Role: scroll bar
Role in en_US locale: scroll bar
States: enabled,focusable,visible,showing,opaque,vertical
States in en_US locale: enabled,focusable,visible,showing,opaque,vertical
Index in parent: -1
Children count: 2
Bounding rectangle: [530, 159, 547, 729]
ERROR: GetAccessibleContextInfo failed for top-level window
There are some fields after this, but they are pointless as the freeze intervened in normal querying.
From this I know that any screen reader will be affected by this, unless it implements extra sanity checks for such objects with "bad lineage". My proposition for a quick fix in NVDA for all this is to avoid getting the parent accessible context of Java objects which have a negative index in accessible parent property either directly or indirectly (e.g. by calling getTopLevelObject on them) and to treat them as top level objects with no parents.
The following two simple fixes in source\JABHandler.py completely fix all these crashes. I extensively tested them, experienced no side-effects with other Java apps and regressions because of these fixes are very very improbable:
def internal_getWindowHandleFromAccContext(vmID,accContext):
try:
# 12045: some broken accessible objects contradict themselves by reporting no parent
# (indexInParent = -1) and also reporting themselves as parents
# do not try to call getTopLevelObject on such objects, as this would create a Java-side endless
# loop, but get native window handle from them directly
info=AccessibleContextInfo()
bridgeDll.getAccessibleContextInfo(vmID,accContext,byref(info))
if info.indexInParent < 0:
return bridgeDll.getHWNDFromAccessibleContext(vmID,accContext)
topAC=bridgeDll.getTopLevelObject(vmID,accContext)
...
def getAccessibleParentFromContext(self):
accContext=bridgeDll.getAccessibleParentFromContext(self.vmID,self.accContext)
# 12045: some broken objects report themselves as their own parent
# treat such objects as having no parent to prevent recovering from pointless
# recursions, broken object navigation etc.
if accContext and not bridgeDll.isSameObject(self.vmID,self.accContext,accContext):
return self.__class__(self.hwnd,self.vmID,accContext)
else:
return None
With most elaborate work I managed to track down the very core of these freezes. The NVDA workaround code I had shared above was very much needed for that as a frozen screen reader is not particularly useful for serious debugging purposes. This was a Ghidra bug. Technically one tiny method overrode AccessibleContext retrieval for a vertical scrollbar subclass with a specific visual appearance, but it did it with a mixed-up delegation pattern and that caused that scroll bar to report its parent erroneously as itself. Removing that single method fixed this all. The fix has been committed to Ghidra master. Nevertheless putting my workaround code into NVDA would do no harm and would only bring benefits: protection against such edge-cases.
Nevertheless putting my workaround code into NVDA would do no harm and would only bring benefits: protection against such edge-cases.
Have you considered to submit a pr yourself?
No and I won't do that myself. Reasons are simple: I have absolutely no experience with that. I've never ever made a pull request. I've read through many tutorials, but somehow I couldn't understand the Github "bureaucracy" with pull requests and why so many steps are needed just to put a very tiny fix in place. I am very strong at tracking down and fixing very hard to find bugs (like this) even in extreme conditions (like when the screen reader crashes) and that is a huge contribution. There are people here who make pull requests daily. I don't want to mess with NVDA with my first ever pull request.
It's important that someone who truly understands the problem and required change creates the Pull Request, since it is the mechanism for communicating the intent, reasoning and approach to the rest of the developers on the project.
I'd be happy to provide guidance with creating the PR if you would like. The process is very safe, and the community is friendly.
CC'ing @Zersiax because didn't you recently also experiment/fight with some JAVA-related NVDA crashes and freezes? Are they related maybe?
No, that was the only Java-related crash/freeze I've ever encountered with NVDA during the more than 10 years since I use NVDA exclusively.
@MarcoZehe you may be onto something there. My issue was with a control that wasn't supposed to be clickable receiving an NVDA+enter , which in Java apps I believe sends a left click. I never dug into it as deeply as the OP has, but it is very possible that that control had a negative index in some sense, or was the child of a parent that did. As for fixing this, while I agree with the fact that a PR should be submitted by the original poster, I also recognize this patch has been sitting here in the comments for the last 2 months. Has a PR been created for this issue? If not, how can we move forward in getting the supplied patch tested/integrated?
In the interest of the NVDA community I kindly, but definitely ask that someone who has a daily routine of creating pull requests submit one with my clearly commented patch above if it seems useful, because:
It is not about rigid rules, as you say. It is a practical matter that very few other people can test your fixes, or know what to look for, as well as you do; and I suspect you are one of a very small number currently using this software. Keep in mind that generating a pull request does not do anything to the NVDA source. It will be reviewed by one or more core developers before anything happens with it. You can't do any harm here. An executable will be generated in which your fix can be tested (by you and others).
The process of making a pull request, with all your experience, should be simple. It only looks difficult and complex until you try it. Here are some steps you can follow.
https://github.com/nvaccess/nvda
git clone https://github.com/Palacee-hun/nvda
. You can also use the following, if you have uploaded an ssh key to GitHub: git clone ssh://git@github.com/Palacee-hun/nvda
.cd nvda
git submodule update --init
git remote add nvaccess https://github.com/nvaccess/nvda
git fetch nvaccess
git checkout master
git branch -u nvaccess/master
git pull
git checkout -b javaFix
git commit -a
git push -u origin/javaFix
. If you didn't use "javaFix" as your branch name, change it here.That should have created a pull request. We can guide you further from there.
Note: the first several steps of the above (before 11) are taken more or less directly from the NVDA Wiki page on Contributing.
Also, in your "patch" code (not an actual appliable patch), you need to indent your comment lines to the same level as your code. In other words, the comments in the first function need to be indented one tab, and the comments in the second function need to be indented two tabs.
Great guide for making a pull request, thanks much, I may find good use of that info somewhen ... Definitely not appetizing though lol Nevertheless totally independent of this I've decided to close this issue. As it turned out this was not a NVDA bug, but a Java bug in Ghidra. Ghidra folks fixed that after my diagnosis, thorough testing of the fix candidate and finalizing my fix proposal. The fix is in Ghidra 10.0 beta according to changelog and that version is already downloadable. So this issue is properly dealt with and gone. All other Java issues with NVDA should be discussed in separate issues.
Ghidra is a quite powerful free and open source, mostly Java-based reverse engineering / decompiler toolbox. It's ui is Swing-based, with a custom docking widget set on top of it, but despite this, most controls are accessible. The two most important ones(the listing and decompiler areas) are not however. There's a ticket over there at Ghidra github page about that with little to no progress. I would volunteer to work on that and facilitate speedier resolution, but the crashes I report here prevent serious work in this respect. The issue is proven to be screen-reader specific, see below why. That's why I see no point in ticketing that at Ghidra issues, I bring it here instead. Ghidra may prove valuable in job settings, maybe even more in the future, for IT-security people. This augments the importance of this issue.
Steps to reproduce:
Actual behavior:
NVDA stops responding to anything and total silence takes place until you restart it. After hitting the restart keyboard shortcut NVDA quickly comes up talking again. Examine Ghidra windows at that point. There will be two of them: Ghidra main window and the code browser. You will find no objects in them with object nav except the close button. But it won't close if you activate that. You need to manually kill Ghidra. After killing it, it can be restarted with no problems.
Expected behavior:
No NVDA crash of course, this is way too troublesome.
System configuration
NVDA installed/portable/running from source:
installed
NVDA version:
2020.3
Windows version:
Windows 10 64-bit 19041.746
Name and version of other software in use when reproducing the issue:
OpenJDK 11.0.9, Ghidra 9.2.1
Other information about your system:
4 gig RAM, 4 cores, very good memory statistics, very snappy and responsive system
Other questions
Does the issue still occur after restarting your computer?
yes
Have you tried any other versions of NVDA? If so, please report their behaviors.
no
If addons are disabled, is your problem still occuring?
haven't tried, but I don't think this is applicable here as my installed addons don't touch Java at all
Did you try to run the COM registry fixing tool in NVDA menu / tools?
not applicable
Further remarks
This is not the only way to trigger the exact same crash in the code browser window. Note that object nav works okay and does not bring NVDA down, only certain interactions with the controls. Other crash sources are arrowing down in the symbol tree and in the listing panel (currently inaccessible). Multiple arrow down keys are needed to bring NVDA to its knees in them, but bringing up the menu instantly "slaughters" NVDA.
How do I know that this is screen reader specific?
There is a very hackish way to stop and restart NVDA Java support at will while running. In NVDA Python console do import JABHandler JABHandler.terminate() to stop Java support and JABHandler.initialize() to reenable it. While off you can tab and arrow around in code browser window at will, it won't bring anything down. With java-ferret tool I tested that all controls treat and fire events appropriately. After reenabling crash may be few keystrokes away again.
Bug hunt efforts and my hints at diagnosis
This is very tough. No exceptions, no logs whatsoever. With frida instrumentation tool I dumped the interaction between NVDA and windowsaccessbridge-32.dll multiple times in normal and crashy situations. No apparently visible hints or problem spots or erroneous patterns were identified this way. I saw that the last call to the dll was either getTopLevelObject or releaseObject, the latter being more frequent. I dug deeply into NVDA Java support code and also the ui code of Ghidra, but I could not identify anything definite. My hint at diagnosis is that something makes the Java-side of access bridge to stop responding to events sent by windowsaccessbridge-32.dll and that causes the dll to block in waiting for response and that cripples NVDA. As NVDA restart does not cure the situation on Java side (see behaviour), that points to this scenario.