madMAx43v3r / chia-gigahorse

chia_rs.pyd crash when farming C9 plots #217

Closed Kruger1981 closed 8 months ago

Kruger1981 commented 9 months ago

Faulting application name: chia.exe, version: 0.0.0.0, time stamp: 0x64a69e81
Faulting module name: chia_rs.pyd, version: 0.0.0.0, time stamp: 0x6446ad3d
Exception code: 0xc0000409
Fault offset: 0x000000000033a340
Faulting process id: 0x0x17CC
Faulting application start time: 0x0x1D9FC0FDF74B4F8
Faulting application path: C:\chia-gigahorse-farmer\chia.exe
Faulting module path: C:\CHIA-G~1\chia_rs\chia_rs.pyd
Report Id: 2e4249c1-e198-4e53-be71-765d0bdd6386
Faulting package full name:
Faulting package-relative application ID:
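For readers decoding the `Exception code` field: it is an NTSTATUS value. Below is a minimal sketch that looks up a few well-known codes. The table is a small illustrative subset, and the helper name `describe_exception_code` is made up for this example. Note that 0xC0000409 is also reported when a module deliberately aborts through the Windows fast-fail mechanism, so the code alone does not identify the root cause.

```python
# Illustrative subset of well-known NTSTATUS values; not exhaustive.
KNOWN_NTSTATUS = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000017: "STATUS_NO_MEMORY",
    0xC00000FD: "STATUS_STACK_OVERFLOW",
    0xC0000409: "STATUS_STACK_BUFFER_OVERRUN",
}


def describe_exception_code(text: str) -> str:
    """Decode a hex exception code as printed in the Windows event log."""
    code = int(text, 16)   # accepts an optional "0x" prefix
    severity = code >> 30  # top two bits of an NTSTATUS; 3 means "error"
    name = KNOWN_NTSTATUS.get(code, "unknown")
    return f"0x{code:08X} severity={severity} name={name}"


print(describe_exception_code("0xc0000409"))
```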

Getting this issue when farming C9 plots. Tested on Windows 10 and 11, both fresh installs.

GPU: Nvidia RTX 2080 Super 8 GB, not overclocked
CPU: Ryzen 3600, 6 cores / 12 threads, not overclocked
RAM: 32 GB @ 2666 MHz (also tested at 3000 MHz)
Network: 1 Gbps cabled network

The C9 plots are on a remote harvester connecting to the remote compute on the farmer (specs above). Tested 3 or 4 different Nvidia drivers (each version removed with DDU before installing another).

GPU VRAM usage never goes over 6.8 GB; there is always over 1 GB free.

Also ran a memory test; no issues found with the RAM.

Running GigaHorse 1.8.2 (Giga14) with Chia GUI 1.8.2.

Chia Log:

  File "C:\CHIA-G~1.\chia\full_node\full_node.py", line 445, in _handle_one_transaction
  File "C:\CHIA-G~1.\chia\full_node\full_node.py", line 2218, in add_transaction
  File "C:\CHIA-G~1.\chia\full_node\mempool_manager.py", line 284, in pre_validate_spendbundle
  File "C:\CHIA-G~1.\asyncio\base_events.py", line 821, in run_in_executor
  File "C:\CHIA-G~1.\concurrent\futures\process.py", line 715, in submit
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore

2023-10-11T05:26:48.997 full_node chia.full_node.full_node: ERROR Error in _handle_one_transaction, closing:
Traceback (most recent call last):
  File "C:\CHIA-G~1.\chia\full_node\full_node.py", line 445, in _handle_one_transaction
  File "C:\CHIA-G~1.\chia\full_node\full_node.py", line 2218, in add_transaction
  File "C:\CHIA-G~1.\chia\full_node\mempool_manager.py", line 284, in pre_validate_spendbundle
  File "C:\CHIA-G~1.\asyncio\base_events.py", line 821, in run_in_executor
  File "C:\CHIA-G~1.\concurrent\futures\process.py", line 715, in submit
concurrent.futures.process.BrokenProcessPool: A child process terminated abruptly, the process pool is not usable anymore

2023-10-11T05:26:49.168 full_node full_node_server : INFO Connection closed: 93.94.243.190, node id: b8717206b1f0b27911edd8bb93beb66720624c3ebdc47fdbdc7890d9bcf8f0a3
2023-10-11T05:26:49.168 full_node chia.full_node.full_node: INFO peer disconnected PeerInfo(_ip=IPv4Address('93.94.243.190'), _port=8444)
2023-10-11T05:26:49.168 full_node full_node_server : ERROR Exception: Failed to fetch block 4352820 from PeerInfo(_ip=IPv4Address('93.94.243.190'), _port=8444), timed out, PeerInfo(_ip=IPv4Address('93.94.243.190'), _port=8444).
Traceback (most recent call last):
  File "C:\CHIA-G~1.\chia\server\ws_connection.py", line 398, in wrapped_coroutine
  File "C:\CHIA-G~1.\chia\full_node\full_node_api.py", line 136, in new_peak
  File "C:\CHIA-G~1.\chia\full_node\full_node.py", line 724, in new_peak
  File "C:\CHIA-G~1.\chia\full_node\full_node.py", line 642, in short_sync_backtrack
ValueError: Failed to fetch block 4352820 from PeerInfo(_ip=IPv4Address('93.94.243.190'), _port=8444), timed out

Coincidence or not, the issue started right after the official Chia version 2.1.0 was released. I did not update, but I am not sure whether something on the network has changed that prevents C9 plots created with @madMAx43v3r's CUDA plotter from being farmed.

Does anyone have this or a similar issue?

Any suggestions?

After I removed the C9 plots from the harvester's HDDs (I only had a handful, spread over 3 HDDs), the issue no longer happens.

Any help, suggestions, tips, ideas, would be greatly appreciated.

madMAx43v3r commented 9 months ago

Did it work before? I still suspect an out-of-VRAM or out-of-RAM condition.

Kruger1981 commented 9 months ago

Hi Max, this was the first time I tested C9 plots. I also thought about a memory leak or something like that, but my understanding of these things is quite limited. RAM usage never goes over 8 or 9 GB out of 32 GB, and VRAM is steady at 6.8 GB, leaving just over 1 GB free. I do have another machine where I could test: Ryzen 5950X, 128 GB RAM, RTX 3090. But it seems a bit overkill to use it as a farmer only 🤣

If you have any suggestions, please let me know. Once you release the new compression levels, the requirements may go up; in that case I would move the farmer to the more capable machine and see how it goes.

Thank you for looking into this.

madMAx43v3r commented 9 months ago

VRAM fragmentation could be a problem; just 1 GB free can still be a problem. Basically, when you want to allocate, say, 2 GB of VRAM on a GPU, there needs to be a contiguous free region of 2 GB somewhere. However, this should not result in a crash, just an error message.
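The fragmentation point can be illustrated with a toy model. This is only a sketch: the 1 GB block granularity and the helper names `largest_free_run` and `can_allocate` are invented for illustration, and it does not reflect how the CUDA driver or Gigahorse actually manages memory. It shows that the total free memory can exceed a request while no single contiguous region is large enough.

```python
def largest_free_run(blocks):
    """Return the length of the longest run of free (False) blocks."""
    best = run = 0
    for used in blocks:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def can_allocate(blocks, size):
    """A request succeeds only if one contiguous free region can hold it."""
    return largest_free_run(blocks) >= size


# Toy 8-block "VRAM", 1 GB per block; True means the block is in use.
vram = [True, False, True, True, False, True, True, True]

total_free = vram.count(False)       # 2 blocks free in total
print(total_free)                    # 2
print(largest_free_run(vram))        # 1
print(can_allocate(vram, 2))         # False: 2 GB free, but not contiguous
```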

madMAx43v3r commented 9 months ago

Can you check dmesg why it killed the process?

Kruger1981 commented 9 months ago

Hi Max, regarding the VRAM usage: it starts off at under 3 GB (as it starts with C7 and C8 plots), and only when a C9 plot is "looked at" does the VRAM usage go to 6.8 GB. dmesg is for Linux and I am on Windows. For Windows I found the PsLogs tool but don't know how to use it; I will have a look later on.
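On Windows, the closest counterpart to `dmesg` for a crash like this is the Application event log, which is where "Faulting application" records such as the one at the top of this issue come from. A minimal sketch using the built-in `wevtutil` tool follows; it assumes the crash is logged as an "Application Error" event (Event ID 1000), and the helper names `build_query` and `recent_app_crashes` are made up for this example.

```python
import subprocess


def build_query(max_events: int = 5) -> list[str]:
    """Build a wevtutil command that lists recent Application Error events."""
    return [
        "wevtutil", "qe", "Application",
        "/q:*[System[(EventID=1000)]]",  # assumption: crash logged as ID 1000
        f"/c:{max_events}",              # limit the number of events returned
        "/rd:true",                      # newest events first
        "/f:text",                       # human-readable output
    ]


def recent_app_crashes(max_events: int = 5) -> str:
    """Run the query (Windows only) and return the raw text output."""
    result = subprocess.run(
        build_query(max_events), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `print(recent_app_crashes())` from a Windows Python prompt should print the most recent matching events, if any exist.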

madMAx43v3r commented 9 months ago

Yeah, it dynamically reallocates more VRAM when needed, which can be an issue if you're right at the limit.
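Because a reallocation can push usage up only briefly, a steady reading in a monitoring tool may miss the peak. One way to look for short spikes is to poll `nvidia-smi` at a short interval and keep the maximum. This is a rough sketch: the function names and the 0.2 s interval are arbitrary choices for illustration, it reads only the first GPU, and polling can still miss very short peaks.

```python
import subprocess
import time

# nvidia-smi prints one CSV line per GPU, e.g. "6963, 8192" (values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_line(line: str) -> tuple[int, int]:
    """Parse 'used, total' (MiB) from one nvidia-smi CSV line."""
    used, total = (int(part.strip()) for part in line.split(","))
    return used, total


def watch_peak(seconds: float = 60.0, interval: float = 0.2) -> int:
    """Poll the first GPU and return the highest observed usage in MiB."""
    peak = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        used, _total = parse_memory_line(out.splitlines()[0])
        peak = max(peak, used)
        time.sleep(interval)
    return peak
```

Running `watch_peak()` while the harvester looks up a C9 plot would show whether usage briefly climbs above the steady figure.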

Kruger1981 commented 9 months ago

Hmm... I see. Unfortunately the 3090 doesn't fit in this case. I will test moving the farmer onto the other PC over the weekend and see how it goes.