Open Motophan opened 1 year ago
Since this is partial RAM mode (I assume), I would blame the NVMes first. CTX SWITCH TIMEOUT does seem like it could be the cause though...
It could also be the other way around, though: the NVMes causing the CTX SWITCH TIMEOUT.
I'm getting these freezes on my dev machine as well, with a GIGABYTE GP-GSM2NE3100TNTD
I can source a new enterprise NVMe. I am just above the rated TBW on a 980 Pro 2TB (1400 TB written on a drive rated for 1200 TBW).
Should I close this, or do you know of anything I can do to isolate the issue? Otherwise I can leave it open until I can source a good high-endurance NVMe.
Also, do you have any recommendations for drives? I was going to get a PCIe U.2 adapter and a used high-endurance Intel DC U.2 drive for about $120.
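For reference, the endurance figures quoted above work out as follows. This is only a quick arithmetic sketch; the helper name `wear_fraction` is made up for illustration, and the two numbers are the ones given in this thread (1400 TB written, 1200 TBW rated).

```python
def wear_fraction(written_tb: float, rated_tbw: float) -> float:
    """Fraction of the rated write endurance already consumed."""
    return written_tb / rated_tbw


if __name__ == "__main__":
    # Figures from the thread: 1400 TB written on a drive rated for 1200 TBW.
    used = wear_fraction(1400, 1200)
    print(f"{used:.0%} of rated endurance used")  # prints "117% of rated endurance used"
```

Exceeding the rated TBW does not by itself prove the drive is failing, so this number is a reason to suspect the drive, not a diagnosis.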
System specs: RTX 4080, Samsung 980 Pro 2TB (under TBW), 128 GB RAM, AMD 5950X, Arch Linux; desktop environment is Xfce.
nvidia-smi output
What happens: after anywhere from 20 minutes to 20 hours, the system becomes unstable. It freezes for more than a minute at a time, sometimes once every few minutes, sometimes once every few hours. Sometimes dmesg shows the Xid 109 error (which is undocumented by NVIDIA); sometimes dmesg shows no error at all, but the system still freezes.
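Since the Xid message only shows up some of the time, it may help to pull every Xid line out of a saved kernel log and compare the timestamps with when the freezes happened. Below is a minimal sketch, assuming the driver logs lines shaped like `NVRM: Xid (PCI:<address>): <code>, <details>`. The function name `find_xid_events` and the sample text are made up for illustration and are not taken from the affected machine.

```python
import re

# Assumed line shape: "NVRM: Xid (PCI:0000:0a:00): 109, <details>".
XID_RE = re.compile(r"NVRM: Xid \((?P<pci>[^)]+)\): (?P<code>\d+),?\s*(?P<rest>.*)")


def find_xid_events(log_text):
    """Return (pci_address, xid_code, details) for each Xid line in log_text."""
    events = []
    for line in log_text.splitlines():
        match = XID_RE.search(line)
        if match:
            events.append(
                (match.group("pci"), int(match.group("code")), match.group("rest").strip())
            )
    return events


# Hypothetical sample text, written by hand to exercise the parser.
SAMPLE = """\
[ 1234.567890] NVRM: Xid (PCI:0000:0a:00): 109, CTX SWITCH TIMEOUT
[ 1240.000001] some unrelated kernel message
"""

if __name__ == "__main__":
    for pci, code, details in find_xid_events(SAMPLE):
        print(pci, code, details)
```

To run it against a real log, replace `SAMPLE` with the text of a saved `dmesg` dump. Freezes that have no matching Xid line nearby would point away from the GPU and toward something else, such as the drive.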
Example plotting command