ishitatsuyuki / acominer

An experimental ETH miner powered by Vulkan
Apache License 2.0
17 stars 6 forks

Acominer with Mesa fork causes RX 480 graphics to crash unrecoverably. #9

Open happysmash27 opened 2 years ago

happysmash27 commented 2 years ago

This has happened twice now in the past two weeks, and both times the GPU fails to recover with these four messages:

[442366.409593] amdgpu:
                 failed to send message 200 ret is 0
[442368.471903] amdgpu:
                 last message was failed ret is 0
[442370.534578] amdgpu:
                 failed to send message 201 ret is 0
[442372.609531] amdgpu:
                 last message was failed ret is 0

Over and over again, about two seconds apart.
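As a quick check of the "about two seconds apart" observation, here is a minimal Python sketch that parses the four quoted lines and prints the gap between consecutive messages. The helper names `parse_dmesg` and `gaps` are made up for illustration; the only assumption about the log format is the `[seconds.microseconds]` prefix shown above, with amdgpu messages wrapped onto a continuation line.

```python
import re

# The four messages quoted above; each amdgpu message is split across two lines.
LOG = """\
[442366.409593] amdgpu:
                 failed to send message 200 ret is 0
[442368.471903] amdgpu:
                 last message was failed ret is 0
[442370.534578] amdgpu:
                 failed to send message 201 ret is 0
[442372.609531] amdgpu:
                 last message was failed ret is 0
"""

# Matches "[442366.409593] text" as well as "[ 5258.602302] text".
STAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

def parse_dmesg(text):
    """Return (timestamp, message) pairs, folding continuation lines
    (lines without a timestamp) into the preceding entry."""
    entries = []
    for line in text.splitlines():
        m = STAMP.match(line)
        if m:
            entries.append([float(m.group(1)), m.group(2).strip()])
        elif entries and line.strip():
            entries[-1][1] = (entries[-1][1] + " " + line.strip()).strip()
    return [(t, msg) for t, msg in entries]

def gaps(entries):
    """Seconds elapsed between each entry and the one before it."""
    return [b[0] - a[0] for a, b in zip(entries, entries[1:])]

entries = parse_dmesg(LOG)
for (t, msg), gap in zip(entries[1:], gaps(entries)):
    print(f"{t:.6f}  +{gap:.3f}s  {msg}")
```

The same two functions can be pointed at the longer log further down to see where the spacing changes.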

It is impossible to recover from this without a hard reset. GPU reset fails, shutdown does not commence, magic SysRq does not reboot, and even the reset button on my case does not work. I have to hold down the power button to power-cycle the machine manually. This is very problematic, as it causes downtime for my mining pool and several servers.

Apparently dmesg only keeps a limited number of lines, so the start of the error from the first time, on January 15th, is lost. The second time, however, I did manage to catch it:

[535735.177927] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=46058293, emitted seq=46058295
[535735.177942] [drm:amdgpu_job_timedout] *ERROR* Process information: process blender-2.93 pid 437 thread blender-2.:cs0 pid 2249
[535735.177947] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[535735.375188] amdgpu 0000:03:00.0: amdgpu: BACO reset
[535735.562144] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
[535735.563373] [drm] PCIE GART of 256M enabled (table at 0x000000F400500000).
[535735.563389] [drm] VRAM is lost due to GPU reset!
[535737.604174] amdgpu:
                 failed to send message 200 ret is 0
[535740.071713] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535741.728737] amdgpu:
                 last message was failed ret is 0
[535743.791821] amdgpu:
                 failed to send message 100 ret is 0
[535745.538355] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535745.539055] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535745.899508] amdgpu:
                 last message was failed ret is 0
[535745.902206] amdgpu: SMU Firmware start failed!
[535745.902209] amdgpu: Failed to load SMU ucode.
[535745.902210] amdgpu: fw load failed
[535745.902211] amdgpu: smu firmware loading failed
[535745.902214] [drm] Skip scheduling IBs!
[535745.902216] [drm] Skip scheduling IBs!
[535745.902225] [drm] Skip scheduling IBs!
[535745.902226] [drm] Skip scheduling IBs!
[535745.902242] amdgpu 0000:03:00.0: amdgpu: GPU reset(2) failed
[535745.902278] [drm] Skip scheduling IBs!
[535745.902288] [drm] Skip scheduling IBs!
[535745.902295] [drm] Skip scheduling IBs!
[535745.902299] [drm] Skip scheduling IBs!
[535745.902308] [drm] Skip scheduling IBs!
[535745.902310] [drm] Skip scheduling IBs!
[535745.902320] [drm] Skip scheduling IBs!
[535745.902331] [drm] Skip scheduling IBs!
[535745.902343] [drm] Skip scheduling IBs!
[535745.902349] [drm] Skip scheduling IBs!
[535745.902353] [drm] Skip scheduling IBs!
[535745.902357] [drm] Skip scheduling IBs!
[535745.902360] [drm] Skip scheduling IBs!
[535745.902366] [drm] Skip scheduling IBs!
[535745.902371] [drm] Skip scheduling IBs!
[535745.902375] [drm] Skip scheduling IBs!
[535745.902380] [drm] Skip scheduling IBs!
[535745.902384] [drm] Skip scheduling IBs!
[535745.902388] [drm] Skip scheduling IBs!
[535745.902391] [drm] Skip scheduling IBs!
[535745.902396] [drm] Skip scheduling IBs!
[535745.902401] [drm] Skip scheduling IBs!
[535745.902404] [drm] Skip scheduling IBs!
[535745.902408] [drm] Skip scheduling IBs!
[535745.902411] [drm] Skip scheduling IBs!
[535745.902417] [drm] Skip scheduling IBs!
[535745.902421] [drm] Skip scheduling IBs!
[535745.902426] [drm] Skip scheduling IBs!
[535745.902430] [drm] Skip scheduling IBs!
[535745.902433] [drm] Skip scheduling IBs!
[535745.902436] [drm] Skip scheduling IBs!
[535745.902440] [drm] Skip scheduling IBs!
[535745.902443] [drm] Skip scheduling IBs!
[535745.902446] [drm] Skip scheduling IBs!
[535745.902449] [drm] Skip scheduling IBs!
[535745.902453] [drm] Skip scheduling IBs!
[535745.902456] [drm] Skip scheduling IBs!
[535745.902459] [drm] Skip scheduling IBs!
[535745.902462] [drm] Skip scheduling IBs!
[535745.902466] [drm] Skip scheduling IBs!
[535745.902469] [drm] Skip scheduling IBs!
[535745.902472] [drm] Skip scheduling IBs!
[535745.902476] [drm] Skip scheduling IBs!
[535745.902480] [drm] Skip scheduling IBs!
[535745.902483] [drm] Skip scheduling IBs!
[535745.902488] [drm] Skip scheduling IBs!
[535745.902492] [drm] Skip scheduling IBs!
[535745.902495] [drm] Skip scheduling IBs!
[535745.902499] [drm] Skip scheduling IBs!
[535745.902503] [drm] Skip scheduling IBs!
[535745.902506] [drm] Skip scheduling IBs!
[535745.902738] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535745.903705] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535745.903871] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535745.909776] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535745.909788] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535745.917157] amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -22
[535747.965685] amdgpu:
                 failed to send message 201 ret is 0
[535752.092718] amdgpu:
                 last message was failed ret is 0
[535754.156165] amdgpu:
                 failed to send message 282 ret is 0
[535756.159935] [drm:amdgpu_job_timedout] *ERROR* ring sdma0 timeout, signaled seq=4008009, emitted seq=4008011
[535756.159947] [drm:amdgpu_job_timedout] *ERROR* Process information: process  pid 0 thread  pid 0
[535756.159953] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[535756.220125] amdgpu:
                 last message was failed ret is 0
[535758.287806] amdgpu:
                 failed to send message 170 ret is 0
[535758.289144] [drm:amdgpu_job_timedout] *ERROR* ring sdma1 timeout, signaled seq=634699, emitted seq=634701
[535758.289152] [drm:amdgpu_job_timedout] *ERROR* Process information: process  pid 0 thread  pid 0
[535758.289157] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[535758.289159] amdgpu 0000:03:00.0: amdgpu: Bailing on TDR for s_job:9af4b, as another already in progress
[535758.643706] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[535761.295930] amdgpu:
                 last message was failed ret is 0
[535763.362051] amdgpu:
                 failed to send message 171 ret is 0
[535765.436241] amdgpu:
                 last message was failed ret is 0
[535767.501770] amdgpu:
                 failed to send message 200 ret is 0
[535769.566274] amdgpu:
                 last message was failed ret is 0
[535771.630977] amdgpu:
                 failed to send message 201 ret is 0
[535775.758325] amdgpu:
                 last message was failed ret is 0
[535777.822903] amdgpu:
                 failed to send message 261 ret is 0
[535779.886624] amdgpu:
                 last message was failed ret is 0
[535781.950258] amdgpu:
                 failed to send message 200 ret is 0
[535784.014566] amdgpu:
                 last message was failed ret is 0
[535786.076908] amdgpu:
                 failed to send message 201 ret is 0
[535790.207115] amdgpu:
                 last message was failed ret is 0
[535792.270633] amdgpu:
                 failed to send message 261 ret is 0
[535794.334658] amdgpu:
                 last message was failed ret is 0
[535796.398011] amdgpu:
                 failed to send message 200 ret is 0
[535798.462346] amdgpu:
                 last message was failed ret is 0
[535800.529184] amdgpu:
                 failed to send message 201 ret is 0
[535804.656539] amdgpu:
                 last message was failed ret is 0
[535806.720778] amdgpu:
                 failed to send message 261 ret is 0
[535810.849788] amdgpu:
                 last message was failed ret is 0
[535812.912689] amdgpu:
                 failed to send message 261 ret is 0
[535814.976085] amdgpu:
                 last message was failed ret is 0
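Since the start of the first crash was lost from the kernel log, one option is to stream the log to a file while the miner runs. The following is only a sketch: `tee_lines` and `follow_kernel_log` are hypothetical helper names, the `kernel.log` path is arbitrary, and it assumes the util-linux `dmesg` with its `--follow` option and permission to read the kernel log.

```python
import os
import subprocess

def tee_lines(lines, path):
    """Append each line to `path`, flushing and fsyncing after every
    write so that as much as possible is on disk if the machine hangs.
    Returns the number of lines written."""
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        for line in lines:
            out.write(line if line.endswith("\n") else line + "\n")
            out.flush()
            os.fsync(out.fileno())
            count += 1
    return count

def follow_kernel_log(path="kernel.log"):
    """Stream `dmesg --follow` into `path` until interrupted.
    Assumes util-linux dmesg and sufficient privileges."""
    proc = subprocess.Popen(
        ["dmesg", "--follow"], stdout=subprocess.PIPE, text=True
    )
    try:
        return tee_lines(proc.stdout, path)
    finally:
        proc.terminate()
```

Calling `follow_kernel_log()` in a spare terminal before starting the miner would keep a copy of everything up to the hang. On systems that use systemd with a persistent journal, `journalctl -k -b -1` after the reboot may serve the same purpose without any script.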

If you know of anywhere better to send this issue, I would really appreciate that as well. I can get similar issues with some other Ethereum miners and with SteamVR, but I am not sure whether I can report this to Mesa, since acominer uses a custom fork rather than the official one.

This seems to happen more often when I am doing something else in 3D. The first time was when I launched the miner before Cities: Skylines had completely closed, which is relatively understandable since Cities: Skylines uses a crazy amount of VRAM. The second time, however, I was only zooming in in Blender, with Google Earth running far in the background. I thought I could avoid the crashes by not running anything GPU-intensive, but it appears that this is not the case.

ishitatsuyuki commented 2 years ago

The error is a generic GPU timeout, meaning that something got corrupted and the GPU didn't return results as expected.

The typical response to this is a GPU reset; however, it seems that in your case the reset has also failed.

Since you run an atypical PCIe configuration, I think maybe GPU reset isn't working at all for you? Try the following command and see if it gives the same error spam (WARNING: a GPU reset will kill all your graphical workloads, so save everything before doing this):

sudo cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

For the timeout, I honestly have no idea. It's possible that Mesa depends on PCIe atomics in some way, which do not work in your configuration, as you mentioned before. There's a small chance that this is an acominer bug, although I don't think so, given that it happens with other miners too and no other users have reported such hangs.

happysmash27 commented 2 years ago

Thank you so much for the quick and knowledgeable response!

If I reset right now when nothing is happening, resetting works acceptably, with a few glitches at first but overall success. My dmesg contains the following:

[ 5258.602302] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
[ 5258.874321] amdgpu 0000:03:00.0: amdgpu: BACO reset
[ 5259.060660] amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 5259.061938] [drm] PCIE GART of 256M enabled (table at 0x000000F400500000).
[ 5259.061955] [drm] VRAM is lost due to GPU reset!
[ 5259.125771] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5259.332365] [drm] UVD and UVD ENC initialized successfully.
[ 5259.433361] [drm] VCE initialized successfully.
[ 5259.439057] amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
[ 5259.439075] amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
[ 5259.439093] amdgpu 0000:03:00.0: amdgpu: GPU reset(1) succeeded!
[ 5259.439159] [drm] Skip scheduling IBs!
[ 5260.124776] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5260.147738] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5260.299254] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5261.125595] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5261.800112] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5262.125534] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5262.125609] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5263.125618] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5263.300595] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5264.801740] amdgpu_cs_ioctl: 1 callbacks suppressed
[ 5264.801744] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5265.125582] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5266.124915] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5266.303067] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5267.126019] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5267.803650] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5268.125449] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5268.125524] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5269.009588] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5269.143642] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5270.125148] amdgpu_cs_ioctl: 2 callbacks suppressed
[ 5270.125153] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5270.805663] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5271.125429] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5272.125532] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5272.306532] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5273.125071] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5273.807558] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5274.125146] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5275.125142] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5275.308223] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5276.125679] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5276.809169] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5277.125857] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5278.126085] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5278.310105] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5279.125354] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5279.125448] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5279.811053] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5280.125636] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5281.125751] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5281.312006] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5282.124939] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5282.812832] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5283.125471] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5284.124768] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5284.313666] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5285.124461] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5285.148289] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5285.814552] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5287.125286] amdgpu_cs_ioctl: 1 callbacks suppressed
[ 5287.125291] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5287.315461] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5288.125452] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5288.816122] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5289.125110] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5290.125597] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5290.316520] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5290.316635] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5291.125326] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5291.817888] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5293.125287] amdgpu_cs_ioctl: 1 callbacks suppressed
[ 5293.125291] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5293.318223] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5294.125071] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5294.818622] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5295.124570] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5296.125003] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5296.318969] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5297.124711] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5297.819455] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5298.124403] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5299.124951] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5299.319941] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5300.124731] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5300.820387] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5301.124320] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5302.125889] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5302.320854] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5302.320975] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5303.124802] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5303.821449] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5305.125874] amdgpu_cs_ioctl: 1 callbacks suppressed
[ 5305.125878] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5305.321793] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5306.124635] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5306.822309] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5307.124206] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5307.148012] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5308.125322] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5308.323363] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5309.125863] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5309.824585] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5311.124991] amdgpu_cs_ioctl: 1 callbacks suppressed
[ 5311.124995] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5311.324901] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5312.124817] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5312.825613] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5313.124527] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5314.125054] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5314.125105] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5314.325921] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5315.125015] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5315.826768] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5317.125248] amdgpu_cs_ioctl: 1 callbacks suppressed
[ 5317.125252] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5317.327114] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5318.125262] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5318.828042] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5319.124928] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5320.125559] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5320.328389] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5321.125513] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5321.829222] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5322.125106] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5323.125787] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5323.329650] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5324.124727] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5324.830579] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5325.124706] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5326.124244] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5326.124285] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5326.149110] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5326.330613] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!
[ 5327.124851] [drm:amdgpu_cs_ioctl] *ERROR* Failed to initialize parser -125!

There have also been one or two times when SteamVR crashed things and the manual reset worked as well. I am aware of this command and usually try to invoke it when things go wrong; it fails 80+% of the time, but it does work occasionally. I have also had automatic resets work at least once.

Would you happen to know what amdgpu_gpu_recover printing "-11" might mean? When I tried to reset the GPU during the first acominer-related crash, it always printed that when I tried to recover but I can find no results about what this actually means online.

ishitatsuyuki commented 2 years ago

Hmm, thank you, and I see that GPU reset is working fine ("Failed to initialize parser -125" just means that the applications need to be restarted because their contexts are lost).

> Would you happen to know what amdgpu_gpu_recover printing "-11" might mean? When I tried to reset the GPU during the first acominer-related crash, it always printed that when I tried to recover but I can find no results about what this actually means online.

It probably means that a GPU reset is already in progress. I don't know if -11 is actually an errno, but if it is, it would be EAGAIN, "Resource temporarily unavailable". Since the GPU reset fails, it probably never ends, leaving the system in a hung state.
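For anyone wanting to check such values themselves, here is a small Python sketch that treats a negative return value as `-errno` and looks it up with the standard `errno` and `os` modules. The function name `describe_kernel_ret` is made up for illustration, and whether the driver really means each number as an errno is an assumption, as noted above.

```python
import errno
import os

def describe_kernel_ret(ret):
    """Interpret a kernel-style return value as -errno.
    Returns (symbolic name, message); the name is "?" if the number
    is not a known errno on the platform running this script."""
    code = abs(ret)
    return errno.errorcode.get(code, "?"), os.strerror(code)

# Values that appear in this thread: the -11 from amdgpu_gpu_recover,
# "Failed to initialize parser -125", and "GPU reset end with ret = -22".
for ret in (-11, -125, -22):
    name, message = describe_kernel_ret(ret)
    print(f"{ret}: {name} ({message})")
```

Errno numbering is platform-specific, so this should be run on Linux, where 11 is EAGAIN, 125 is ECANCELED, and 22 is EINVAL.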

I honestly have no idea how this can be solved, but here's an attempt anyway: https://github.com/ishitatsuyuki/acominer/actions/runs/1733336306

ishitatsuyuki commented 2 years ago

@happysmash27 Did you have a chance to try out the experimental build (ishitatsuyuki/acominer/actions/runs/1733336306)? No need to hurry, just let me know if it didn't work.