ryobg / JContainers

JSON-based data structures for Papyrus - the TESV Skyrim SE scripting language
MIT License
107 stars 23 forks source link

AH Hotkeys CTD with JContainers64 #15

Closed RealAntithesis closed 6 years ago

RealAntithesis commented 6 years ago

Hi, I've been testing JContainers64 with a ported SSE version of AH Hotkeys. Installing AH Hotkeys caused a CTD on reloading a save game after AH Hotkeys was installed. I've done some testing by commenting out code that runs on startup, and isolated at least one of the causes down to the JArray.retain function, as follows:

string property JCTempTag = "AH_Hotkeys_TempTag" autoreadonly
int property objTempArray auto hidden

objTempArray = JValue.retain(JArray.object(), JCTempTag)

The JC functions that run prior to this point appear to be fine. Creating a JArray of strings in a test plugin and running this multiple times appeared to be fine over multiple savegame reloads, but the CTDs appear as soon as the above lines are added. There may be other causes of the CTD as well since the fully-uncommented AH Hotkeys code results in an immediate CTD after only the first savegame reload (not merely the 2nd or 3rd reload).

I can attach my working files if that will help.

ryobg commented 6 years ago

Hi @RealAntithesis,

I'm not familiar with that plugin, so one thing which you may check on your end is whether it tries to load some binary data from the previous 32-bit JC. Deserializing such data is known to throw an exception.

Also that nice piece of code seems as a trigger of previously occurring issue and it will be more helpful to find its code. Saying that, I will check out later this function for possible leads.

Thank you!

RealAntithesis commented 6 years ago

Thanks ryobg. I don't think there should be any binary data being read in - this is a fresh install. Besides, the test script I'm using is fairly simple and starts ctding as soon as the above code is added. I'll be doing some more testing over the week. Thanks for looking at this.

RealAntithesis commented 6 years ago

After further testing, I'm beginning to have a better idea of the issue. The problem with the above code is the length of the string JCTempTag. As soon as the length of this string goes above 15 characters, it causes the CTD on reload. e.g. JCTempTag = "1234567890123456" crashes, but "123456789012345" doesn't. Perhaps there's some sort of memory overflow or something (I don't know, C/C++ isn't my thing)?

ryobg commented 6 years ago

I haven't had the time to look at this yet. Now, I do not believe that JC will break anytime within such situation. Hence, there is something more involved. Focus should be on having small, reproducable and if possible enclosed within JC test case. If you have something, please share.

ryobg commented 6 years ago

P.S. I haven't found anything notable. Your report is a bit curious as for the Microsoft runtime it means switching to heap allocation during the string assignment, but that on its own can't be the problem.

If possible at this point of your script, try to create again an array and play with it, add, retrieve, remove and etc values. See if it will crash.

Then try to do something else like readFromDirectory or writeToFile with some looong path name. See if it will crash.

Then try to retain again with that long tag.

RealAntithesis commented 6 years ago

I've attached my test script (with .esp). I'll do some more testing when I can. Thanks.

JCTest Script 01.03.2018.zip

RealAntithesis commented 6 years ago

Well, I've made just one change to the AH Hotkeys code - which is to reduce the length of the JCTempTag string from 18 characters down to 11, and now the mod works fine after 10+ save game reloads and testing several functions (including reading and writing JSON files to import and export settings and hotkey set data). I'm pretty positive about this now - out of all 12k+ lines of code (of which a significant portion is JContainers-specific), there was only 1 CTD issue identified to date and all the mod functions appear to be working as they should.

I'll do some more testing over the week to see if any other issues come up, but if not, then I'm leaning towards shortly releasing the SSE edition of AH Hotkeys on the Nexus.

I'm still a little worried about a possible JContainers64 issue with the JValue.retain function. My fix was just a workaround, so the underlying issue is still present. The previous 18 character string worked fine with Skyrim LE and JContainers v3.2.

If you (or someone) could do an independent test with JValue.retain and a tag string >15 characters, that should help corroborate my findings. :) Edit: it's "JValue.retain", not "JArray.retain"...

ryobg commented 6 years ago

I will try to load some time tonight for that. Thanks for your report and example to look at.

ryobg commented 6 years ago

Nah. I gave up. I spent hours testing - no luck. I did scripted tests vs non, also I used the attached script. It can be something fundemental which I miss... In any case I cannot reproduce it.

@RealAntithesis if you have some plugin which will cause a fail (e.g. due to that tag) - let me know. I can debug it on JC side I think, so no need of sources and etc.

RealAntithesis commented 6 years ago

That's strange - maybe it's something peculiar to the plugin, though the JCTest plugin uploaded earlier is the same as the one for AH Hotkeys (just with most of the scripts blanked out). Download attached and here: https://mega.nz/#!qZkx2SRK!GUrIBufEoQwh7UySk9oV_eHgxz2M0regqw_il9RX5L8 The script that causes the CTD is "AH_Hotkeys.pex". I've also included a version of the script that doesn't CTD (with the shorter JCTempTag string). Source included too. Thanks AH Hotkeys - Skyrim Hotkey Manager SSE (DEV v2 CTD Test) 02.03.2018.zip

ryobg commented 6 years ago

Great news! I'm able to reliably reproduce it now. It fails when trying to release objects with tag. Later today I will load the gun and dive again.

ryobg commented 6 years ago

Looks like fixed now. Not sure whether it was not some new quirk in Boost, but there was certain smell, so I used bigger weapon and cleared off the terrain. I uploaded here one release for you to verify.

RealAntithesis commented 6 years ago

Thanks for that. After replacing the DLL, the long JCTempTag string seemed to be fine for a number of save and reloads. Then after testing some other AH Hotkeys functions (something like 5+ reloads later), it crashed on reload. When the save CTDs on reload, the last line in the JC log is "Garbage collection started" after registering functions, "Revert finished" and "Load started". The CTD also happens without the AH Hotkeys plugin, but doesn't happen when JContainers is disabled.

Looks like an issue is still present somewhere. I'll continue testing at my end.

ryobg commented 6 years ago

I will have to be away from my workstation for 2-3 days. In a hurry, I have re-uploaded on the same place a tracing version of JC, you should get enter/exit pairs of log entries for the most of TES API points. If you find out something else, let me know.

RealAntithesis commented 6 years ago

Thanks, I'll see what I can find out. On the plus side (hopefully), I've spent some more time with the shorter temp string and no issues so far.

ryobg commented 6 years ago

Well, if the only diff is the shorter tag, then this is at least some lead. The tracing, though, should help us pinpoint the issue.

RealAntithesis commented 6 years ago

I've attached the JContainers.log for a CTD that happened with a 16 character tag string using the version of JContainers64 you linked last. Not sure if the log helps? JContainers64 (CTD with JCTempTag of 16 chars).log

ryobg commented 6 years ago

Ok. Thank you. I will check it out later. Can you give me some specific steps to reproduce that CDT?

RealAntithesis commented 6 years ago

Seems to be similar to before. I add a few characters to the JCTempTag variable string in the script AH_Hotkeys (to > 15 characters), build, run Skyrim, open a save, quick save (F5), reload (F9), then CTD. Then I revert the change to the script (to < 16 characters), build, run Skyrim, open the same save (not the new one), quick save (F5), reload (F9), then no CTD, reiterate (F9) 20x just to be sure. Regardless though, I might just release AH Hotkeys SSE as a beta, as I haven't come across any other issues to date provided that the JCTempTag string is short enough to not reproduce the CTD. Thanks for your help.

RealAntithesis commented 6 years ago

Belay that. I'll test further before releasing. I just received a report from a Nexus user testing AH Hotkeys for me. Report as follows:

Hi, I'm not really sure, but I'm facing several random crashes - not simple CTDs though. It's more like something seems wrong with something hooked into the game process, the executable. Because I'm getting a weird crash where the game stops somehow, leads me to a black screen while the sound is continuesly playing. In the background appears a windows crash notfication and a CrashDump file is being generated inside the SSE main directoy. It's a access violation, and that didn't happen before I used AHH. But it's also hard to track down exactly because I cannot find anything in the log files. I suspect JContainers to be the reason, because its plugin is the only new instance being hooked into the game. And it's not fully tested yet.

However, the crashes are happening non-regular. Sometimes I can play for a few hours, sometimes it crashes after 30 minutes.

The windows kernel points to an error within C++ processes, and as it's related to SSE, it's whether SKSE64 or something else. But the suspicous thing is that's happening since I'm using AHH.. Maybe JContainers got some code inside that's leading to the program trying to access some part in memory where it doesn't have access to. Maybe this should be forwared to the JContainer devs?

(1170.2db0): Access violation - code c0000005 (first/second chance not available)
ntdll!NtGetContextThread+0x14:
00007fff`13d91bb4 c3              ret

I'll see if I can reproduce this, though Skyrim tends to crash on me every now and then anyway without JContainers or AH Hotkeys.

ryobg commented 6 years ago

I highly doubt that JC can cause crash if not anymore installed? I will try your scenario more seriously when I get back. I hope your plugins will be still available from that download service.

RealAntithesis commented 6 years ago

I haven't come across any further issues (such as random crashes) and the user got back to me saying it was probably another plugin, so I think JContainers is ok (except for the reproducible temp tag issue on retain and release when it goes over 15 characters - but that's ok, I'll just keep the string below 16 characters).

ryobg commented 6 years ago

So to get it right, that fix for over 15 char tags haven't worked actually, as when you play a bit more with AH and reload, it CDT in retain&release (I guess you have seen it in the trace enabled build) function?

RealAntithesis commented 6 years ago

Sorry, I've been preoccupied getting the release version of AH Hotkeys out on the Nexus. This is now released, noting that JContainers64 is in Beta and SKSE in Alpha. I haven't come across any further CTDs in the weeks I've been testing.

Re: the trace enabled version - does the jcontainers log show the last executed function that was run?

ryobg commented 6 years ago

Yes, most of the Papyrus API funcs which were:

P.S. I haven't looked also futher as I was away and busy with other stuff. Now, for sure I want to fix these tags and anything related before release.

ryobg commented 6 years ago

Hi again @RealAntithesis , I think I may have found another issue with tags (and domain names). Not sure when this has appeared, but apparently binary archives with tags/strings longer than 15 bytes, I believe are screwed (in Debug mode, which is not available generally, they were screwed always). Can't say how relevant is this to your case though. In the linked above JC archive I have added a fix. Regretfully, breaking compatibility with binary archives made with previous versions.

ryobg commented 6 years ago

Well, I accidentally closed this ticket, but maybe it is not so bad idea? Anyway, feel free to reopen anytime.