ryobg / JContainers

JSON-based data structures for Papyrus - the TESV Skyrim SE scripting language
MIT License
107 stars 23 forks source link

Save game corruption #5

Closed pugwash99 closed 6 years ago

pugwash99 commented 6 years ago

A debug build doesn't run correctly but the mods I'm testing appear to work with a release build. The mod I'm testing with appears fully functional.

The problem appears when attempting to load a save game, either the load function or the save is corrupting the data and causing another ctd.

ryobg commented 6 years ago

Can you elaborate a bit more on "Debug build does not run correctly"? I havent look soon in the Debug configuration, but we need it running so that we can debug cases like this one :)

pugwash99 commented 6 years ago

I think the debug issue was a complete mistake on my part, I was running with a debug session and for some reason it wasn't registering some of the data coming from a json table being read from file. Since I recompiled and ran without a debugger attached it's working correctly.

However, with a jcontainers debug build installed an old save without JC data in the skse save file will load correctly. Save the game and then reload I get an exception right at the moment where it should load data. The message in a pop-up window is

file: minkernal/crts\ucrt\src\appcrt\heap\debug_heap.cpp Line 888

Expression: _CrtlsValidHeapPointer(block)

Checking the skse counterpart file the json data is present.

With a VS debugging session attached to skyrim it results in a ctd without the pop-up. No break points work on the save or load functions in the skse_callbacks.cpp, so either the save is corrupting the data or the load is not working or reading past where it should.

The crash appears to happen in _domain_master::master::instance().read_fromstream(stream); With no way to use a debugger I've been adding JC_log calls to the code so I can trace it's and one after that line doesn't get called.

ryobg commented 6 years ago

Did you narrowed it down? Tracing is enough to pinpoint such an issue.

pugwash99 commented 6 years ago

A little bit, in routine 'auto read_from_stream(master& self, std::istream& stream) -> void {' in domain_master.cpp

I've a JC log before and after the line 'archive >> self;' the second doesn't appear in the log file, It's around line 200

I found post at that might be a similar problem with boost serialization and deserialization.

https://stackoverflow.com/questions/33317403/exception-boostarchivearchive-exception-at-memory-location

ryobg commented 6 years ago

Hmm not sure... Is there any way for me to reproduce that easily? I can only speculate otherwise. Maybe one of the numerous load(Archive&, unsigned) functions is failing down there.

pugwash99 commented 6 years ago

So if you install jc is SSE loading a save made with it. Could it just be my build?

As the original code was working I can't imagine it's something in the boost code. I did find one possible reason in another post.

If you serialize an object with a header that uses a 32bit variable but deserialize to a 64bit or vice versa it will also cause an exception.

It could equally be that I've got old data hanging around in a skse and it's causing the problem as . I'll try deleting the skse file and see what happens. A boost 32bit iarchive stream is incompatible with a boost 64bit iarchive stream. But the clean save works and save with RP data isn't. I'm looking through the code to see if I can find a 32bit / 64bit conversion error.

Found the post I was looking for.

http://boost.2283326.n4.nabble.com/quot-invalid-signature-quot-in-boost-archive-binary-iarchive-td4630498.html

Comparing a 32bit skse save file and a skse 64bit save file the JSTR some the first RSTJ to 'archive' there's a 4 byte difference which is leading me to believe a we've got a 32bit variable being written as a 64bit but being read as 32bit.

Deleted the skse save file and let it be recreated - reload - crash with the same error. It got a little further as it crashed at u_delete_inactive_domains(self); but that could be because the self object is invalid. This time the expression reported is is_block_type_valid(header->_block_use)

pugwash99 commented 6 years ago

well that's a bit annoying, I replicated the full folder tree from the original jcontainers which includes a dummy file in the domains folder. Delete the file and the crashes stop happening. It must be related to the newer domains code that was included in v3.3 because without that file I can save and load.

It might also explain why you may not be having the problem, if you only added the domains folder and not the dummy file as well.

ryobg commented 6 years ago

Can you check out whether this happen with the new distro? In the new one I have added one .force-install dummy file. It should be ignored during load of domains.

I was chasing some issues with the backward compatibility of loading archives. Certainly not. Boost does uses native sizes and the offsets get messed up everywhere.

pugwash99 commented 6 years ago

Boost 32bit and boost 64bit have a known incompatibility with binary archives, text files are find but binary definitely appear have a 4 byte increase in size where a 32bit varible is being replaced by a 64bit one. It was one of the things I was looking into.

I compared the structure of a fresh 32bit skse file and a 64bit one by locating the jcontainers headers and there's a definite 4 byte increase. And there are plenty of posts on various forums confirming the problem. I know that size_t changes size between 32bit and 64bit code.

I'll download it in a while and test it but without the dummy file I can definitely save and load without a problem, that I can see but only 1 or 2 saves isn't a real stress test. Add the file back and the skse save file must reference it in some way causing a ctd in the deserializing processes.

ryobg commented 6 years ago

Let me know the importance of supporting reads of old archives. As I said, the issue is built inside Boost. Small example: when you need to read a string back, Boost reads now 8 bytes for its size instead of 4. As you can imagine, it tries then to allocate some huge storage and you get an exception if you are lucky. I can imagine possible solutions but these will be PITA.

I want to know whether the issue with that dummy file persist. I will check again too of course.

pugwash99 commented 6 years ago

Personally it's not important, True it looses some data but it might be better to start clean as when other mods start to load data it could cause more problems. One way to support it would be to identify the additional four bytes in the skse chunk, insert them in memory and then change the two size records accordingly to fix the discrepancy. But you'd have to know that the chunk is from an old file and I'm not sure that's easy to do if it's even possible.

I'm sure skse64 isn't 100% compatible either and it's still in alpha, the fact so many scripted mods work with it with few if any changes is a bonus at this point.

I've rebuilt with the fresh release, I loaded and saved a few times, moved location, saved again, closed skyrim, reopened and then reloaded no ctd. There's no mention of the .forced install file that's in the domains folder and loading a save made with the version I was working with didn't crash but it did freeze for a few seconds.

ryobg commented 6 years ago

Is the "freeze" reproducable? How and when it happens, what happens? If I think about it - we do not know whether it wouldn't freeze before too (as it was crashing).

pugwash99 commented 6 years ago

It was when I was loading a save, since then it's returned to near normal it might be the data from my old skse being loaded in the new code. It could also be because I'm running skyrim in a window rather than full screen so that I could monitor the jcontainers log, it freezes rather than go to menu when it doesn't have focus.

ryobg commented 6 years ago

Ok. Glad that everything sounds ok. Let me know if this load-related issue persist (feel free to reopen).