rollingrock / EngineFixesVR

Port of SSE Engine Fixes for Skyrim VR
MIT License

Could you look into the code that unloads cells? #7

Closed CritLoren closed 3 years ago

CritLoren commented 3 years ago

Hello,

First I should say this is some wonderful work, and it's making VR play much better than without. I'm currently playing VR with the UVRE Wabbajack list, which comes with EngineFixesVR built in. However, we've noticed an issue with the game, specifically with the bPreemptivelyUnloadCells=1 setting. Having it enabled lets you fast travel with minimal crashes, but entering some dungeons and caves crashes the game (from my testing, going from Valthume to Valthume Catacombs crashes on the loading screen). Having it disabled/removed fixes the loading screen crashes at specific cell transitions, and I was able to progress through Valthume without issues, but now fast travel crashes the game consistently.

I was wondering if there's anything you can take a look at there to possibly fix the issue? Thanks.

rollingrock commented 3 years ago

Hi. Glad you are enjoying it!

So I've seen reports about this kind of cell issue before and usually it was due to conflicting DLL mods. Can you let me know what your sksevr.log says?

CritLoren commented 3 years ago

Sure. This is my last sksevr.log (attached). I don't remember if this is from a crash or from me quitting the game, but I think it's the former.

One more thing I should mention: this setting is meant to reduce memory load and, according to some wiki info, is recommended for people with low RAM. However, I have 32GB, so it sounds like something is going wrong with the memory management if I need it to avoid crashing while fast traveling.

rollingrock commented 3 years ago

ok thanks for that. I don't see any DLLs you're loading that should conflict so that's good.

Have you ever used a debugger like windbg? That's always the fastest way to narrow down what is crashing.

CritLoren commented 3 years ago

No but i think i got the hang of it.

Here's the analyzed result of crashing while going valthume => valthume catacombs (with the unload cell line in)

*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

DEBUG_FLR_EXCEPTION_CODE(c0000374) and the ".exr -1" ExceptionCode(c0000005) don't match

KEY_VALUES_STRING: 1

    Key  : AV.Fault
    Value: Read

    Key  : Analysis.CPU.Sec
    Value: 1

    Key  : Analysis.DebugAnalysisProvider.CPP
    Value: Create: 8007007e on RAZER-LAPTOP

    Key  : Analysis.DebugData
    Value: CreateObject

    Key  : Analysis.DebugModel
    Value: CreateObject

    Key  : Analysis.Elapsed.Sec
    Value: 1

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 256

    Key  : Analysis.System
    Value: CreateObject

    Key  : Timeline.Process.Start.DeltaSec
    Value: 70

CONTEXT:  (.ecxr)
rax=0000000000000000 rbx=0000020914230000 rcx=0000020914230000
rdx=0000020900000000 rsi=0000020914230000 rdi=0000000000000000
rip=00007ffe7d689606 rsp=00000091cd0ff780 rbp=0000000000000000
 r8=0000000000000000  r9=0000000000000000 r10=000000000000000d
r11=00007ff61fd16f88 r12=0000000000000000 r13=0000020900000000
r14=00000208fffffff0 r15=00000209cce4cfd8
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
ntdll!RtlpFreeHeapInternal+0x4a6:
00007ffe`7d689606 41807e0f05      cmp     byte ptr [r14+0Fh],5 ds:00000208`ffffffff=??
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffe7d689606 (ntdll!RtlpFreeHeapInternal+0x00000000000004a6)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 00000208ffffffff
Attempt to read from address 00000208ffffffff

PROCESS_NAME:  SkyrimVR.exe

READ_ADDRESS:  00000208ffffffff 

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  00000208ffffffff

ADDITIONAL_DEBUG_TEXT:  Enable Pageheap/AutoVerifer ; Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]

FAULTING_THREAD:  000010a0

STACK_TEXT:  
00000000`00000000 00000000`00000000 heap_corruption!SkyrimVR.exe+0x0

SYMBOL_NAME:  heap_corruption!SkyrimVR.exe

MODULE_NAME: heap_corruption

IMAGE_NAME:  heap_corruption

STACK_COMMAND:  ** Pseudo Context ** ManagedPseudo ** Value: 2c67a836cd0 ** ; kb

FAILURE_BUCKET_ID:  HEAP_CORRUPTION_c0000005_heap_corruption!SkyrimVR.exe

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {138b3785-0a9c-803a-6a33-820067fcb02d}

Followup:     MachineOwner
---------

And here's an analysis of a crash with the line commented out, this time while trying to go outside of Valthume. (I was planning to get a crash dump of the fast travel crashes, but I only crashed on the 4th or 5th fast travel, and it didn't leave a dump, so that's inconclusive.)

*******************************************************************************
*                                                                             *
*                        Exception Analysis                                   *
*                                                                             *
*******************************************************************************

*** WARNING: Unable to verify checksum for SkyrimVR.exe
*** WARNING: Unable to verify checksum for dragonborn_speaks_naturally.dll
*** WARNING: Unable to verify timestamp for JContainersVR.dll
*** WARNING: Unable to verify timestamp for AutoSneakVR.dll
*** WARNING: Unable to verify timestamp for Skyrim Refocused.dll
*** WARNING: Unable to verify timestamp for StormLightningVR.dll
*** WARNING: Unable to verify timestamp for VRFpsStabilizer.dll
DEBUG_FLR_EXCEPTION_CODE(c0000374) and the ".exr -1" ExceptionCode(c0000005) don't match

KEY_VALUES_STRING: 1

    Key  : AV.Fault
    Value: Read

    Key  : Analysis.CPU.Sec
    Value: 1

    Key  : Analysis.DebugAnalysisProvider.CPP
    Value: Create: 8007007e on RAZER-LAPTOP

    Key  : Analysis.DebugData
    Value: CreateObject

    Key  : Analysis.DebugModel
    Value: CreateObject

    Key  : Analysis.Elapsed.Sec
    Value: 9

    Key  : Analysis.Memory.CommitPeak.Mb
    Value: 250

    Key  : Analysis.System
    Value: CreateObject

    Key  : Timeline.Process.Start.DeltaSec
    Value: 115

CONTEXT:  (.ecxr)
rax=0000000000000000 rbx=00000233d61d0000 rcx=00000233d61d0000
rdx=0000023300000000 rsi=00000233d61d0000 rdi=0000000000000000
rip=00007ffe7d689606 rsp=000000420953f430 rbp=0000000000000000
 r8=0000000000000000  r9=0000000000000000 r10=000000000000000d
r11=00007ff61fd16f88 r12=0000000000000000 r13=0000023300000000
r14=00000232fffffff0 r15=0000023390911e78
iopl=0         nv up ei pl zr na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010246
ntdll!RtlpFreeHeapInternal+0x4a6:
00007ffe`7d689606 41807e0f05      cmp     byte ptr [r14+0Fh],5 ds:00000232`ffffffff=??
Resetting default scope

EXCEPTION_RECORD:  (.exr -1)
ExceptionAddress: 00007ffe7d689606 (ntdll!RtlpFreeHeapInternal+0x00000000000004a6)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: 00000232ffffffff
Attempt to read from address 00000232ffffffff

PROCESS_NAME:  SkyrimVR.exe

READ_ADDRESS:  00000232ffffffff 

ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%p referenced memory at 0x%p. The memory could not be %s.

EXCEPTION_CODE_STR:  c0000005

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  00000232ffffffff

ADDITIONAL_DEBUG_TEXT:  Enable Pageheap/AutoVerifer ; Followup set based on attribute [Is_ChosenCrashFollowupThread] from Frame:[0] on thread:[PSEUDO_THREAD]

FAULTING_THREAD:  00000c58

STACK_TEXT:  
00000000`00000000 00000000`00000000 heap_corruption!SkyrimVR.exe+0x0

SYMBOL_NAME:  heap_corruption!SkyrimVR.exe

MODULE_NAME: heap_corruption

IMAGE_NAME:  heap_corruption

STACK_COMMAND:  ** Pseudo Context ** ManagedPseudo ** Value: 16c9480a2c0 ** ; kb

FAILURE_BUCKET_ID:  HEAP_CORRUPTION_c0000005_heap_corruption!SkyrimVR.exe

OS_VERSION:  10.0.19041.1

BUILDLAB_STR:  vb_release

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

FAILURE_ID_HASH:  {138b3785-0a9c-803a-6a33-820067fcb02d}

Followup:     MachineOwner
---------
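A quick sanity check on the two analyses above, as a minimal sketch: the faulting instruction is `cmp byte ptr [r14+0Fh],5`, so the reported READ_ADDRESS should equal r14 + 0x0F in each dump. The register values are copied from the CONTEXT blocks; the helper name `effective_address` and the dump labels are made up for illustration.

```python
def effective_address(base_hex: str, displacement: int) -> int:
    """Return base + displacement for a register value given as hex text."""
    return int(base_hex.replace("`", ""), 16) + displacement


# (r13, r14, reported READ_ADDRESS), copied from the two analyses above.
DUMPS = {
    "unload enabled": ("0000020900000000", "00000208fffffff0", "00000208ffffffff"),
    "unload disabled": ("0000023300000000", "00000232fffffff0", "00000232ffffffff"),
}

for label, (r13, r14, read_address) in DUMPS.items():
    computed = effective_address(r14, 0x0F)
    matches = computed == int(read_address, 16)
    gap = int(r13, 16) - int(r14, 16)
    print(f"{label}: r14+0Fh = {computed:016x}, "
          f"matches READ_ADDRESS: {matches}, r13-r14 = {gap:#x}")
```

In both dumps r14 sits exactly 0x10 below r13, and r13 is a round value (0x20900000000 and 0x23300000000). That would be consistent with the heap code looking at a block header just in front of the pointer being freed, but that reading is an assumption; the dumps alone don't show where the pointer came from.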

The SKSE.ini file that Timboman supplies has the following memory settings:

[Memory]
defaultHeapInitialAllocMB=2048
scrapHeapSizeMB=1024

Might be relevant.

rollingrock commented 3 years ago

thanks. let me look at this

CritLoren commented 3 years ago

Hey, not sure if you've had any success, but after crashing... quite a couple more times, I compared 3 new and different instances where I crashed due to this same memory issue and there's a few things that are the same in the new crash logs that might help you pinpoint the issue and hopefully fix it. These look a bit different but I'd imagine it's because I also enabled diagnostics or something. The issue is though, they don't have the same symbol as the original ones i posted, so this might just be a different memory issue, especially since these new crashes have as context

00007ff6`e9a06db9 f6414001        test    byte ptr [rcx+40h],1 ds:00000000`00000040=??

whereas the old one had

00007ff6`828172c2 ff9078010000    call    qword ptr [rax+178h] ds:00000000`00000179=????????????????
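For anyone comparing these two context lines, here is a rough sketch of how to recover the base register value from a WinDbg disassembly line: subtract the displacement inside the brackets from the `ds:` address. The function name and the regular expressions are illustrative assumptions, and they only handle the simple `[reg+NNh]` operand form seen in this thread.

```python
import re
from typing import Tuple

# Matches a simple memory operand such as [rcx+40h] or [r14+0Fh].
OPERAND = re.compile(r"\[(?P<reg>\w+)\+(?P<disp>[0-9A-Fa-f]+)h\]")
# Matches the resolved address WinDbg prints, e.g. ds:00000000`00000040.
SEGMENT = re.compile(r"ds:(?P<addr>[0-9A-Fa-f`]+)")


def base_register_value(disassembly: str) -> Tuple[str, int]:
    """Return (register name, value) implied by the operand and ds: address."""
    operand = OPERAND.search(disassembly)
    target = SEGMENT.search(disassembly)
    if operand is None or target is None:
        raise ValueError("no [reg+NNh] operand or ds: address found")
    displacement = int(operand["disp"], 16)
    address = int(target["addr"].replace("`", ""), 16)
    return operand["reg"], address - displacement


new_crash = "test    byte ptr [rcx+40h],1 ds:00000000`00000040=??"
old_crash = "call    qword ptr [rax+178h] ds:00000000`00000179=????????????????"

for line in (new_crash, old_crash):
    reg, value = base_register_value(line)
    print(f"{reg} = {value:#x}")
```

By this arithmetic rcx is 0 in the new crashes and rax is 1 in the older one, so both look like reads through a pointer that was never valid rather than through freed heap memory. Treat that as a hint only, since the full register dump would be needed to confirm it.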

The stack text is mostly the same, except for the pointers at the end, which are SkyrimVR+0x276db9, SkyrimVR+0x2f4d64 and SkyrimVR+0x4556aa with the symbol name being SkyrimVR+276db9. I've exported a couple of them to txt, you can run them in a text compare tool or smth if you want. They're at the end of this comment. Also I've generated this comparison htm to compare 3 different crashes, maybe there's a correlation between them. Although I don't think there's much of a similarity except these two lines and the fact that they're all access violations

iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b 

dmp compare.zip

2021-01-26_23.58.56.TXT 2021-01-27_00.02.34.TXT 2021-01-27_00.05.48.TXT 2021-01-27_00.00.18.TXT 2021-01-25_17.29.47.TXT 2021-01-21_16.11.17.TXT 2021-01-20_18.42.03.TXT 2021-01-26_23.57.38.TXT 2021-01-27_00.04.16.TXT

rollingrock commented 3 years ago

Thanks man. Let me look through these and see if I can figure out what's happening.

The big problem is that, going by your stack, the issue isn't in my DLL, so I need to figure out what your stack is referencing.

CritLoren commented 3 years ago

Let me know if there's anything else I can do to help

CritLoren commented 3 years ago

Some of my previous crashes (the later ones) might be caused by an issue on my side, which is now fixed.

Nevertheless, the original bug is still an issue. If you've got the time, set up UVRE on your side and head over to sunderstone gorge. I've got myself and another player crashing upon leaving the cell (even by coc, though the other person could go into some smaller cells, but they still crash while trying to leave that cell) once the dungeon is completed.

From their testing, it's not the word wall that's the issue (we were playing with the idea of that being a possibility). It's either something about the enemies or the cell being marked as cleared that screws up the cell unloading process.

rollingrock commented 3 years ago

Cool thanks for giving me a defined test case. Let me try that out and see if I can locate the crash.

Nezacant commented 3 years ago

Hi there, just wanted to share my findings as well (attached: Sunderstone Gorge Crash.txt). The crash prevents you from leaving the dungeon regardless of the bPreemptivelyUnloadCells setting. As a workaround, I teleported to Breezehome using the console. I was able to leave Whiterun after toggling bPreemptivelyUnloadCells from 1 to 0.

Thanks for all of your work. I hope this helps!

rollingrock commented 3 years ago

Thank you Nezacant. That will help!

rollingrock commented 3 years ago

So I've tried to create a crash but have been unable to.

I installed a completely fresh UVRE and started new game->camping in the woods. Traveling around and using coc to go everywhere including sunderstone gorge I have not seen any crashes.

If there's any other details you guys have that I can use to reproduce including any other mods you have besides UVRE please let me know

CritLoren commented 3 years ago

Going in and out fast won't crash it, try going through it and completing it.

rollingrock commented 3 years ago

Alright I got a crash finally!!

Let me see if I can debug it now

rollingrock commented 3 years ago

So i've looked at the crash a few times now and every time i see HDT-SMP in the stack. Disabling HDT-SMP I didn't crash going through it twice so I'm leaning towards there being something wrong with that mod and not necessarily mine.

Let me look through the source of HDT and see if it's doing anything that would conflict with my mod.

CritLoren commented 3 years ago

i already have hdt removed and going, for example, from ustengrav to ustengrav depths still crashes my game unless i set the unload cells line to 0 (also i'd keep it at 0 if that didn't increase my fast travel crash rate by a lot). i believe there are two cell issues which are perhaps related. one is going from a cell to a sub cell (like in ustengrav, valthume, etc, although i've also encountered this going from the skyrim worldspace to a dungeon near markarth) which is fixed by toggling that setting, the other being going from a cell to the skyrim worldspace. i still get some of those, which i have to retry after restarting the game and essentially bruteforce through. the former crashes no matter how fast you go through i think, the latter requires spending more time/completing the area.

it would be helpful if skse wouldn't miss writing a dump during so many crashes...

rollingrock commented 3 years ago

Yeah I think you are right about there being two issues haha

There is definitely an issue with this dungeon and HDT. For example, without running Engine Fixes at all and running through Sunderstone, I get this stack at the crash on exiting:

[0x0] SkyrimVR + 0xdfa54c
[0x1] SkyrimVR + 0xdf8a79
[0x2] SkyrimVR + 0xe07e4a
[0x3] SkyrimVR + 0xe3b738
[0x4] SkyrimVR + 0xe3b388
[0x5] SkyrimVR + 0xca7091
[0x6] hdtSMP64!SKSEPlugin_Load + 0xdb94
[0x7] hdtSMP64!SKSEPlugin_Load + 0xdbcd
[0x8] hdtSMP64!SKSEPlugin_Load + 0xdbcd
[0x9] hdtSMP64 + 0x2d822
[0xa] hdtSMP64 + 0x4586d
[0xb] SkyrimVR + 0xf1f0e5
[0xc] SkyrimVR + 0xf1abad
[0xd] SkyrimVR + 0x168591
[0xe] SkyrimVR + 0x6bd2a8
[0xf] SkyrimVR + 0x5bab7d
[0x10] hdtSMP64 + 0x453e8
[0x11] SkyrimVR + 0x5b6e94
[0x12] SkyrimVR + 0x5b42c5
[0x13] SkyrimVR + 0x138af0a
[0x14] KERNEL32!BaseThreadInitThunk + 0x14
[0x15] ntdll!RtlUserThreadStart + 0x21

Those functions at the top are in the bhkWorld class. Not sure what's going on there.
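To make the stack above easier to read, here is a small sketch that groups the frames by module, both as totals and as consecutive runs from the top of the stack down. The frame text is copied from the listing; the regular expression and names such as `frame_modules` are assumptions made for this example.

```python
import re
from collections import Counter
from itertools import groupby

STACK_TEXT = """\
[0x0] SkyrimVR + 0xdfa54c
[0x1] SkyrimVR + 0xdf8a79
[0x2] SkyrimVR + 0xe07e4a
[0x3] SkyrimVR + 0xe3b738
[0x4] SkyrimVR + 0xe3b388
[0x5] SkyrimVR + 0xca7091
[0x6] hdtSMP64!SKSEPlugin_Load + 0xdb94
[0x7] hdtSMP64!SKSEPlugin_Load + 0xdbcd
[0x8] hdtSMP64!SKSEPlugin_Load + 0xdbcd
[0x9] hdtSMP64 + 0x2d822
[0xa] hdtSMP64 + 0x4586d
[0xb] SkyrimVR + 0xf1f0e5
[0xc] SkyrimVR + 0xf1abad
[0xd] SkyrimVR + 0x168591
[0xe] SkyrimVR + 0x6bd2a8
[0xf] SkyrimVR + 0x5bab7d
[0x10] hdtSMP64 + 0x453e8
[0x11] SkyrimVR + 0x5b6e94
[0x12] SkyrimVR + 0x5b42c5
[0x13] SkyrimVR + 0x138af0a
[0x14] KERNEL32!BaseThreadInitThunk + 0x14
[0x15] ntdll!RtlUserThreadStart + 0x21
"""

# The module name is everything after the frame index up to '!', '+' or a space.
FRAME = re.compile(r"^\[0x[0-9a-f]+\]\s+(?P<module>[^\s!+]+)", re.MULTILINE)


def frame_modules(stack_text: str) -> list:
    """Return the module name of every frame, innermost frame first."""
    return [match["module"] for match in FRAME.finditer(stack_text)]


modules = frame_modules(STACK_TEXT)
totals = Counter(modules)
# Collapse consecutive frames from the same module into (module, count) runs.
runs = [(name, len(list(group))) for name, group in groupby(modules)]

print(totals)
print(runs)
```

The runs show the crash happening six SkyrimVR frames deep, directly beneath a block of five hdtSMP64 frames, which fits the observation that HDT is on the call path. The `SKSEPlugin_Load + 0xdb94` style names are probably just the nearest exported symbol, not the function actually executing, so the hdtSMP64 frames may not be in the plugin's load routine at all.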

However I did get a crash when i removed HDT as well. Nothing in my code but maybe it's something I am indirectly touching. Need to investigate more.

CritLoren commented 3 years ago

Well I'm glad someone who knows what they're doing is actually taking a look at this lmao These crashes are the last issue I have with skyrim vr Tbh

rollingrock commented 3 years ago

haha. I'll see what I can do. These types of crashes are a pain in the ass

Nezacant commented 3 years ago

Really appreciate you looking into this!


rollingrock commented 3 years ago

So I'm definitely sure it doesn't have anything to do with Engine Fixes. At least the consistent crash I get. The crash is located in the code where it's drawing the new scene and can't free up some memory. I've been looking at it but I don't have a good solution to it yet. I need to figure out a way to break before it crashes so I can see what it is ultimately trying to do, but that's difficult to do.

I'll keep looking at it though.

CritLoren commented 3 years ago

Yeah I was hoping it could be something with the skyrim engine that could perhaps be added to engine fixes 😅

rollingrock commented 3 years ago

potentially. I was thinking if nothing else i could just try to detect the bad register and stop the code from executing into a crash. not sure what else that might break lol

Better to figure out why it's passing in a bad memory address to begin with.

rollingrock commented 3 years ago

hey guys i just made a new release. It shouldn't have anything to do with the crash we've been talking about here but will be more stable in any event.

rollingrock commented 3 years ago

So little update on this.

I think this is just one of the myriad of memory issues that plague this game. I've been working on implementing Ryan's Memory Manager fix which specifically replaces all the Bethesda memory allocators with either system or tbb allocators.

This works, but it is bringing to the surface what I think are actual bugs in the code, which may be what's showing up as crashes like the one talked about here. I don't really know how many of these there will be, but I'm going to attempt to start fixing them in Engine Fixes as well, so I'm hoping the next time I have something working it will also fix this crash. Time will tell....

rollingrock commented 3 years ago

I just made a new release. see if it helps!