jp7677 / dxvk-nvapi

Alternative NVAPI implementation on top of DXVK.
MIT License

Cyberpunk 2077 doesn't start with NVAPI: crashes in nvngx #144

Closed piquan closed 10 months ago

piquan commented 1 year ago

I seem to be one of only a few people with this problem, and could use some help troubleshooting it. I’m sorry to file this as an issue; if there’s a support forum / Discord channel / whatever, please direct me that way.

The crash happens in _nvngx.dll, but I’ve had a hard time working out more details, or even what parts of Proton are relevant. Is this likely to be relevant to dxvk-nvapi?

Running with %command% --launcher-skip:

The game doesn’t launch. I get crash data (in ~/.steam/steam/steamapps/compatdata/1091500/pfx/drive_c/users/steamuser/AppData/Local/REDEngine/ReportQueue/) in the form of a minidump file, which I can’t read.

Running with PROTON_ENABLE_NVAPI=0 %command% --launcher-skip:

Game launches and runs normally. DLSS is not available in the settings menu.

Running with PROTON_LOG=1 DXVK_NVAPI_LOG_LEVEL=info DXVK_NVAPI_LOG_PATH=/tmp %command% --launcher-skip:

Similar to above. Examining the logs, I note that the fatal exception is in _nvngx.dll (this address is relevant later):

181127.904:0144:0148:trace:loaddll:build_module Loaded L"C:\\Windows\\System32\\_nvngx.dll" at 000000000FB90000: native

Here’s the logs around the point of the exception:

181128.241:0144:0148:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\sl.dlss_d.dll" at 000000000B440000: native
NvAPI_EnumPhysicalGPUs: OK
NvAPI_GPU_GetAdapterIdFromPhysicalGpu: OK
NvAPI_DRS_GetSetting (0x10e41df2/Unknown): Setting not found
NvAPI_EnumPhysicalGPUs: OK
NvAPI_DRS_GetSetting (0x10e41df2/Unknown): Setting not found
NvAPI_GPU_GetArchInfo: OK
NvAPI_SYS_GetDriverAndBranchVersion: OK
NvAPI_DRS_GetSetting (0x10afb764/Unknown): Setting not found
NvAPI_DRS_CreateSession: OK
NvAPI_DRS_FindApplicationByName (S:\common\Cyberpunk 2077\bin\x64\Cyberpunk2077.exe): Executable not found
NvAPI_DRS_DestroySession: OK
181128.244:0144:0148:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
181128.244:0144:0148:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
181128.428:0144:0148:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
181128.429:0144:0148:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
181128.522:0144:0148:trace:loaddll:build_module Loaded L"C:\\windows\\system32\\nvcuda.dll" at 0000000236DF0000: builtin
181128.523:0144:0148:trace:loaddll:build_module Loaded L"C:\\windows\\system32\\vulkan-1.dll" at 00000003AA5B0000: builtin
181128.523:0144:0148:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssg.dll" at 0000000039830000: native
181128.526:0144:0148:trace:loaddll:free_modref Unloaded module L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssg.dll" : native
181128.527:0144:0148:trace:loaddll:free_modref Unloaded module L"C:\\windows\\system32\\vulkan-1.dll" : builtin
181128.528:0144:0148:trace:loaddll:free_modref Unloaded module L"C:\\windows\\system32\\nvcuda.dll" : builtin
181128.528:0144:0148:warn:seh:dispatch_exception backtrace: --- Exception 0xc0000005.
181128.528:0144:0148:trace:seh:dispatch_exception code=c0000005 flags=0 addr=000000000FBBD28C ip=fbbd28c
181128.528:0144:0148:trace:seh:dispatch_exception  info[0]=0000000000000000
181128.528:0144:0148:trace:seh:dispatch_exception  info[1]=ffffffffffffffff
181128.528:0144:0148:warn:seh:dispatch_exception EXCEPTION_ACCESS_VIOLATION exception (code=c0000005) raised
181128.528:0144:0148:trace:seh:dispatch_exception  rax=000000000021dfe8 rbx=0000000000000000 rcx=000000000911d590 rdx=40942e3da8b865dd
181128.528:0144:0148:trace:seh:dispatch_exception  rsi=000000000000000d rdi=000000000021e088 rbp=000000000021e580 rsp=000000000021dfc0
181128.528:0144:0148:trace:seh:dispatch_exception   r8=0000000000000014  r9=00000000003500c0 r10=000000000fcd0ac8 r11=000000000911d5ca
181128.528:0144:0148:trace:seh:dispatch_exception  r12=0000000000000000 r13=000000000021e8c0 r14=0000000003492690 r15=0000000000000001

I’m not really sure how I can make any progress debugging this. Do you have any suggestions?

dxvk-nvapi.log steam-1091500.log

jp7677 commented 1 year ago

Maybe you got hit by this issue (https://forums.developer.nvidia.com/t/bug-stable-535-driver-causes-access-violation-inside-nvngx-dll-in-diablo-iv/256828). Do you have PROTON_ENABLE_NGX_UPDATER set? If so, can you try again without it?

piquan commented 1 year ago

That's a great theory! I do experiment with CUDA and other Nvidia-proprietary features, so it would be reasonable for me to have a bit of weird configuration left over somewhere from an old experiment (although I've already done my best to find and eliminate them all).

However, I checked, and it's not defined. I also made sure that nvidia-ngx-conf.json is nowhere on my system (per the driver docs), and also tried explicitly setting PROTON_ENABLE_NGX_UPDATER=0. I didn't see any change in any of those from the baseline, either with or without logging.

jp7677 commented 1 year ago

It would have been too easy otherwise ;) I'm not a hero at reading SEH traces, so how exactly did you figure out that it crashes inside _nvngx.dll? (That said, it sure does look like things go wrong when Streamline does some initialization for DLSS.)

jp7677 commented 1 year ago

~PS: I guess the crash is gone when you set WINEDLLOVERRIDES=nvngx=? (Obviously this will prevent DLSS from working, so this is just a test to exclude things)~

jp7677 commented 1 year ago

Since I have a similar setup and also own the game, I produced the logs from a successful start. This is the interesting part, which is roughly equivalent to your logs except, well, for how it ends:

327.038:0138:013c:trace:loaddll:build_module Loaded L"C:\\Windows\\System32\\_nvngx.dll" at 000000000FD80000: native
NvAPI_Initialize: OK
NvAPI_EnumPhysicalGPUs: OK
NvAPI_GPU_GetAdapterIdFromPhysicalGpu: OK
NvAPI_DRS_CreateSession: OK
NvAPI_DRS_LoadSettings: OK
NvAPI_DRS_GetBaseProfile: OK
NvAPI_DRS_GetSetting (0x10e41df2/Unknown): Setting not found
NvAPI_QueryInterface (0xf2400ab): Unknown function ID
NvAPI_EnumPhysicalGPUs: OK
NvAPI_GPU_GetAdapterIdFromPhysicalGpu: OK
NvAPI_DRS_GetSetting (0x10e41df2/Unknown): Setting not found
NvAPI_GPU_GetArchInfo: OK
NvAPI_SYS_GetDriverAndBranchVersion: OK
NvAPI_DRS_CreateSession: OK
NvAPI_QueryInterface (0xa782ea46): Unknown function ID
NvAPI_DRS_FindApplicationByName (S:\common\Cyberpunk 2077\bin\x64\Cyberpunk2077.exe): Executable not found
NvAPI_DRS_DestroySession: OK
...
327.195:0138:013c:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlss.dll" at 0000000037210000: native
NvAPI_DRS_GetSetting (0x10afb764/Unknown): Setting not found
...
327.253:0138:013c:trace:loaddll:build_module Loaded L"C:\\windows\\system32\\nvcuda.dll" at 0000000236DF0000: builtin
327.253:0138:013c:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssg.dll" at 0000000039530000: native
...
327.543:0138:013c:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssd.dll" at 000000003A1D0000: native
327.544:0138:013c:trace:loaddll:free_modref Unloaded module L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssd.dll" : native
NvAPI_Initialize: OK
NvAPI_EnumPhysicalGPUs: OK
NvAPI_GPU_GetAdapterIdFromPhysicalGpu: OK
NvAPI_GPU_GetArchInfo: OK
327.547:0138:013c:warn:debugstr:OutputDebugStringA "[01.10.2023 07-27-22][streamline][info]commonentry.cpp:533[getFeatureRequirements] NGX feature 1 requirements - minOS 10.0.0 minHW 0x160\n"
...
327.554:0138:013c:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\sl.dlss_d.dll" at 000000000B230000: native
NvAPI_EnumPhysicalGPUs: OK
NvAPI_GPU_GetAdapterIdFromPhysicalGpu: OK
NvAPI_DRS_GetSetting (0x10e41df2/Unknown): Setting not found
NvAPI_EnumPhysicalGPUs: OK
NvAPI_GPU_GetAdapterIdFromPhysicalGpu: OK
NvAPI_DRS_GetSetting (0x10e41df2/Unknown): Setting not found
NvAPI_GPU_GetArchInfo: OK
NvAPI_SYS_GetDriverAndBranchVersion: OK
NvAPI_DRS_CreateSession: OK
NvAPI_DRS_FindApplicationByName (S:\common\Cyberpunk 2077\bin\x64\Cyberpunk2077.exe): Executable not found
NvAPI_DRS_DestroySession: OK
327.555:0138:013c:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
327.555:0138:013c:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
327.677:0138:013c:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
327.677:0138:013c:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
327.723:0138:013c:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
327.723:0138:013c:fixme:cryptasn:CryptDecodeObjectEx Unsupported decoder for lpszStructType 1.3.6.1.4.1.311.2.1.4
327.967:0138:013c:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssd.dll" at 000000003A4D0000: native
327.968:0138:013c:trace:loaddll:free_modref Unloaded module L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssd.dll" : native
327.970:0138:013c:warn:debugstr:OutputDebugStringA "[01.10.2023 07-27-23][streamline][warn]commonentry.cpp:491[getFeatureRequirements] NVSDK_NGX_D3D12_GetFeatureRequirements Not supported on this driver - may not enforce all feature requirements; update NVIDIA driver for optimal behavior\n"

The log line with NVSDK_NGX_D3D12_GetFeatureRequirements is not visible in your log file. This indicates a result from _nvngx.dll and supports your theory that it goes wrong somewhere close to this.

An interesting note from your logs: loading nvcuda.dll is directly followed by loading vulkan-1.dll. Just to be sure, do you use vanilla Proton Experimental, or do you use an alternative nvcuda implementation?

If you want to dive deeper, NGX has registry settings to enable logging (https://github.com/NVIDIA/DLSS/tree/main/utils); maybe they reveal something useful. Be careful with your Proton/Wine prefix though.

jp7677 commented 11 months ago

@piquan Did you succeed in solving this issue?

piquan commented 11 months ago

I'm so sorry I haven't replied! I had messed up my GitHub settings and didn't see your responses!

I said that it crashes within _nvngx.dll because it listed the address of the crash as 0FBBD28C, and that's where it previously loaded _nvngx.dll (starting at 0FB90000, and the next DLL loaded was after the crash address).
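
The address reasoning above can be sketched as a small script. This is only an illustrative sketch and not something used in the thread: the function names are made up, and it assumes the log contains Wine `loaddll` trace lines in the format shown earlier. Because the log does not record module sizes, it picks the module with the highest base address at or below the crash address, which is a heuristic rather than a proof.

```python
import re

# Matches Wine loaddll trace lines such as:
#   ...trace:loaddll:build_module Loaded L"C:\\Windows\\System32\\_nvngx.dll" at 000000000FB90000: native
LOADED_RE = re.compile(
    r'build_module Loaded L"(?P<path>[^"]+)" at (?P<base>[0-9A-Fa-f]+):'
)


def parse_loaded_modules(log_text):
    """Return (base_address, path) pairs sorted by base address."""
    modules = [
        (int(m.group("base"), 16), m.group("path"))
        for m in LOADED_RE.finditer(log_text)
    ]
    return sorted(modules)


def module_below(modules, address):
    """Return the module with the highest base <= address, or None.

    Heuristic only: module sizes are not in the log, and modules that
    were unloaded before the crash are not filtered out.
    """
    candidate = None
    for base, path in modules:
        if base > address:
            break
        candidate = (base, path)
    return candidate


# Lines copied from the failing log in this issue.
SAMPLE_LOG = r'''
181127.904:0144:0148:trace:loaddll:build_module Loaded L"C:\\Windows\\System32\\_nvngx.dll" at 000000000FB90000: native
181128.241:0144:0148:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\sl.dlss_d.dll" at 000000000B440000: native
181128.522:0144:0148:trace:loaddll:build_module Loaded L"C:\\windows\\system32\\nvcuda.dll" at 0000000236DF0000: builtin
181128.523:0144:0148:trace:loaddll:build_module Loaded L"S:\\common\\Cyberpunk 2077\\bin\\x64\\nvngx_dlssg.dll" at 0000000039830000: native
'''

crash_address = 0x0FBBD28C  # addr= field of the dispatch_exception line
base, path = module_below(parse_loaded_modules(SAMPLE_LOG), crash_address)
print(f"{path} +0x{crash_address - base:X}")
```

For the sample lines, the lookup lands in _nvngx.dll at offset 0x2D28C from its base address.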

I'm using vanilla Proton Experimental. If I understand right (although I haven't done any real research here), it redirects nvcuda.dll calls to the native libcuda.so; maybe there's something to that. My libcuda.so is installed by the libcuda1 package, and is version 535.104.12, the same as my Nvidia driver.

I do have the CUDA dev tools installed. That creates a dummy /usr/lib/x86_64-linux-gnu/stubs/libcuda.so (used to link programs at build-time without binding to a particular version), but that's not scanned by ld.so (it's not in /etc/ld.so.conf.d), so shouldn't matter for these purposes.
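
One way to double-check which libcuda.so entries the dynamic linker cache knows about is to filter the output of `ldconfig -p`. The sketch below is an assumption-laden illustration, not something run in this thread: the helper names are hypothetical, and because Proton may run the game inside a Steam runtime container, the host cache may not be exactly what the game sees.

```python
import subprocess


def libcuda_entries(ldconfig_output):
    """Return (name, target) pairs for libcuda entries in `ldconfig -p` output."""
    entries = []
    for line in ldconfig_output.splitlines():
        # Cache lines look like: "libcuda.so.1 (libc6,x86-64) => /path/libcuda.so.1"
        name, sep, target = line.strip().partition(" => ")
        if sep and name.startswith("libcuda.so"):
            entries.append((name, target))
    return entries


def read_ld_cache():
    """Return the text printed by `ldconfig -p` (ldconfig must be on PATH)."""
    result = subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Example usage on a real system:
#   for name, target in libcuda_entries(read_ld_cache()):
#       print(name, "->", target)
```

If no entry points into a `stubs` directory, the build-time stub is not being picked up through the cache.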

Did your log ever load vulkan-1.dll? That's the Vulkan loader, so I'd have assumed anything trying to use Vulkan would load it. I guess DXVK might not need it, if it's going through the Linux Vulkan loader instead.

I note that my run tries to load nvngx_dlssg.dll, but that's for frame generation. Since I have a 3090, my card doesn't support frame generation. Yours seems to load nvngx_dlssd.dll (ray reconstruction, I think). Maybe there's something to that. Does your run load nvngx_dlssg.dll at some point?

jp7677 commented 11 months ago

> I'm so sorry I haven't replied! I had messed up my GitHub settings and didn't see your responses!

No worries and thanks for coming back!

> I said that it crashes within _nvngx.dll because it listed the address of the crash as 0FBBD28C, and that's where it previously loaded _nvngx.dll (starting at 0FB90000, and the next DLL loaded was after the crash address).

Thanks and makes sense!

> I'm using vanilla Proton Experimental. If I understand right (although I haven't done any real research here), it redirects nvcuda.dll calls to the native libcuda.so; maybe there's something to that. My libcuda.so is installed by the libcuda1 package, and is version 535.104.12, the same as my Nvidia driver.

Actually, the nvcuda implementation in Proton's Wine is just a stub and should do nothing (see https://github.com/ValveSoftware/wine/tree/proton_8.0/dlls/nvcuda). Your log says builtin, so that looks good, assuming that nothing altered your Proton installation. It is still strange, though, to see the vulkan-1.dll lines in your logs at those places.

> I do have the CUDA dev tools installed. That creates a dummy /usr/lib/x86_64-linux-gnu/stubs/libcuda.so (used to link programs at build-time without binding to a particular version), but that's not scanned by ld.so (it's not in /etc/ld.so.conf.d), so shouldn't matter for these purposes.

I haven't played with the toolkit, so unfortunately no experience with it.

> Did your log ever load vulkan-1.dll? That's the Vulkan loader, so I'd have assumed anything trying to use Vulkan would load it. I guess DXVK might not need it, if it's going through the Linux Vulkan loader instead.

DXVK loads winevulkan.dll, and DXVK-NVAPI gets its Vulkan entry point from DXVK. I do see vulkan-1.dll loaded directly, but very early during startup, around the places where DXVK loads wineopenxr.

> I note that my run tries to load nvngx_dlssg.dll, but that's for frame generation. Since I have a 3090, my card doesn't support frame generation. Yours seems to load nvngx_dlssd.dll (ray reconstruction, I think). Maybe there's something to that. Does your run load nvngx_dlssg.dll at some point?

Yes, Streamline also tries all plugins on my system. I have the same GPU.

Steam currently attaches Proton Hotfix to CP2077; I guess that doesn't change anything for you? Another idea: do you have any Vulkan layers installed that might interfere?
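
As a rough way to answer the Vulkan layer question, the sketch below lists implicit layers by reading manifest files from a few commonly used Linux directories. It is an illustration under assumptions: the directory list is not exhaustive (distributions and environment variables can add others), the function names are hypothetical, and it only reports what is installed, not which layers are actually active for the game.

```python
import json
from pathlib import Path

# Commonly used locations for implicit layer manifests on Linux.
# Assumption: this list is a starting point, not the loader's full search path.
DEFAULT_DIRS = [
    Path("/usr/share/vulkan/implicit_layer.d"),
    Path("/etc/vulkan/implicit_layer.d"),
    Path.home() / ".local/share/vulkan/implicit_layer.d",
]


def layer_names(manifest):
    """Return layer names from a manifest with a "layer" object or "layers" list."""
    if not isinstance(manifest, dict):
        return []
    layers = manifest.get("layers") or [manifest.get("layer") or {}]
    return [layer["name"] for layer in layers if "name" in layer]


def find_implicit_layers(directories=DEFAULT_DIRS):
    """Return (layer_name, manifest_path) pairs found in the given directories."""
    found = []
    for directory in directories:
        if not directory.is_dir():
            continue
        for manifest_path in sorted(directory.glob("*.json")):
            try:
                manifest = json.loads(manifest_path.read_text())
            except (OSError, json.JSONDecodeError):
                continue  # skip unreadable or malformed manifests
            for name in layer_names(manifest):
                found.append((name, manifest_path))
    return found


# Example usage:
#   for name, path in find_implicit_layers():
#       print(name, path)
```

Overlays and capture tools often install implicit layers, so anything unexpected in this list is a candidate to disable for a test run.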

jp7677 commented 10 months ago

Feel free to reopen if you want to come back to this. I’m fairly certain that this issue is outside of this project. If you ever find the cause, though, it would be cool if you could post an explanation here.