tari-project / universe


Enable Crash Dumps for Tari /* Heap Corruption crashes missed on three machines running i9-14900KS (Intel GPU on-die) #638

Open uforiaio opened 3 weeks ago

uforiaio commented 3 weeks ago

All three of my machines running i9-14900KS CPUs faulted last night. Before I went to bed, I did notice that the 4090s and the i9-14900KS on-die Intel GPUs were now working with Tari. The ntdll.dll exception is usually heap corruption, but I ran memory diagnostics this morning on all three machines with no errors detected. All of my machines ran burn-ins for a week at max GPU/memory/CPU load, so I'm familiar with this error since I run i9-14900KS CPUs. Intel has had ongoing microcode issues with these pieces of @#@$ since they were released.

My main machine has windbg registered as the debugger, but it is running an i9-14900K, which didn't have a fault. I have now installed windbg on all of the systems. I'm not sure whether this is an issue with the mix of 4090s and on-die Intel GPUs, or a memory corruption issue with multiple GPUs in one system.

Tari is not registered to write application dumps. Three out of my four machines had an application exception last night. I only noticed the missing registration because my main machine has windbg set up. I've put windbg on all of my machines now, but here is the event log entry.

Faulting application name: Tari Universe.exe, version: 0.4.5.0, time stamp: 0x66f44133
Faulting module name: ntdll.dll, version: 10.0.22621.4249, time stamp: 0x4e293ad7
Exception code: 0xc0000374
Fault offset: 0x000000000010ca29
Faulting process id: 0x0x5D84
Faulting application start time: 0x0x1DB104AD0298B27
Faulting application path: C:\Program Files\Tari Universe\Tari Universe.exe
Faulting module path: C:\Windows\SYSTEM32\ntdll.dll
Report Id: 3e93dfd8-3a5b-46a9-83c7-2cd799ff2e44
Faulting package full name:
Faulting package-relative application ID:
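For anyone triaging similar records, here is a minimal Python sketch that pulls the key fields out of an Application Error entry and names the exception code. The function name `parse_fault` and the small `KNOWN_NTSTATUS` table are made up for illustration; the table is a hand-picked subset, not a complete list of NTSTATUS values.

```python
import re

# Hand-picked subset of NTSTATUS codes (an assumption for this sketch,
# not an exhaustive table).
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000374: "STATUS_HEAP_CORRUPTION",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def parse_fault(text):
    """Extract a few fields from the text of an Application Error event."""

    def find(pattern):
        match = re.search(pattern, text)
        return match.group(1) if match else None

    code_hex = find(r"Exception code:\s*0x([0-9a-fA-F]+)")
    offset_hex = find(r"Fault offset:\s*0x([0-9a-fA-F]+)")
    code = int(code_hex, 16) if code_hex else None
    return {
        "application": find(r"Faulting application name:\s*(.+?), version:"),
        "module": find(r"Faulting module name:\s*(.+?), version:"),
        "exception_code": code,
        "exception_name": KNOWN_NTSTATUS.get(code, "unknown"),
        "fault_offset": int(offset_hex, 16) if offset_hex else None,
    }


# The relevant part of the record from this issue.
record = (
    "Faulting application name: Tari Universe.exe, version: 0.4.5.0, "
    "time stamp: 0x66f44133 "
    "Faulting module name: ntdll.dll, version: 10.0.22621.4249, "
    "time stamp: 0x4e293ad7 "
    "Exception code: 0xc0000374 Fault offset: 0x000000000010ca29"
)
info = parse_fault(record)
print(info["application"], info["module"], info["exception_name"])
```

The code 0xc0000374 maps to STATUS_HEAP_CORRUPTION, which matches the heap corruption reading above. The fault offset is only meaningful relative to that exact ntdll.dll build.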


Here is the .reg file, renamed to .txt, as an example. I removed my username and replaced it with $username.

tari-miniidump.txt

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\Tari Universe.exe]
"DumpFolder"="C:\\Users\\$username\\AppData\\Local\\com.tari.universe\\mini-dump"
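If you want to generate the same kind of file for several machines, a rough Python sketch follows. The helper name `build_localdumps_reg` is hypothetical. `DumpType` and `DumpCount` are the documented WER LocalDumps values (1 = mini dump, 2 = full dump; the count defaults to 10), but setting them explicitly is my assumption and not part of the file above. Microsoft's documentation lists `DumpFolder` as REG_EXPAND_SZ; this sketch keeps the plain string form used in this issue.

```python
def build_localdumps_reg(exe_name, dump_folder, dump_type=1, dump_count=10):
    """Return .reg file text that enables WER LocalDumps for one executable."""
    key = (
        "HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\Windows\\"
        "Windows Error Reporting\\LocalDumps\\" + exe_name
    )
    # Inside a quoted .reg string value, backslashes and double quotes
    # are escaped with a backslash. Escape backslashes first.
    escaped = dump_folder.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'"DumpFolder"="{escaped}"',
        f'"DumpType"=dword:{dump_type:08x}',    # 1 = mini dump, 2 = full dump
        f'"DumpCount"=dword:{dump_count:08x}',  # max dumps kept in the folder
        "",
    ]
    return "\r\n".join(lines)


reg_text = build_localdumps_reg(
    "Tari Universe.exe",
    r"C:\Users\$username\AppData\Local\com.tari.universe\mini-dump",
)
print(reg_text)
```

Replace `$username` with the real account name before importing. Writing under HKEY_LOCAL_MACHINE needs an elevated prompt.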


OS & Version: Windows 11 Version 23H2 (OS Build 22631.4169)
GPUs: Nvidia RTX 4090
CPU: i9-14900KS
GPU Drivers: Nvidia and AMD (Tried newest drivers, as well as two older drivers)
Browser & Version: All browsers since they use GPU
Smartphone: n/a

uforiaio commented 3 weeks ago

The crashes were only on my three machines with i9-14900KS CPUs and 4090s. I log all temps, performance stats, etc., and saw nothing unusual. The i9-14900KS has an Intel GPU on-die. The one machine with an i9-14900K and an AMD 7900 XTX does not have the same issue.

Based on ongoing issues with microcode on the i9-14900KS, I am wondering if the CPU mining with the GPU on-die is the problem.

I've not had a recurrence since v0.4.6 dropped, but I have set up the mini-dump for Tari. If it happens again, I will be able to get more information from windbg.