madMAx43v3r / chia-plotter

Apache License 2.0

chia_plot hard locked Centos Stream/Z590 chipset/i9-10850K/64GB DDR4-2400 non/ecc #303

Open polkadotbabe opened 3 years ago

polkadotbabe commented 3 years ago

Installed and enabled kdump so we'll see.

Locked with both the standard Cent 4.18 kernel and 5.12.4 from elrepo. The standard chia plotter with 12 parallel processes was fine.

Turned SE/Linux off...

GRUB_CMDLINE_LINUX="nomodeset net.ifnames=0 biosdevname=0 mitigations=off crashkernel=auto scsi_mod.use_blk_mq=1 pcie_aspm=off rhgb

madMAx43v3r commented 3 years ago

hard to say what's going on, monitor RAM usage and make sure it's not running out, also try 256 buckets
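One way to follow the suggestion to monitor RAM usage is to sample /proc/meminfo while the plotter runs. This is only a minimal sketch, assuming a Linux host; the function names and the 5-second interval are illustrative and not part of chia-plotter.

```python
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info

def available_fraction(info):
    """Fraction of total RAM the kernel reports as still available."""
    return info["MemAvailable"] / info["MemTotal"]

def watch(interval_s=5.0, samples=None, path="/proc/meminfo"):
    """Print available memory every interval_s seconds (forever if samples is None)."""
    n = 0
    while samples is None or n < samples:
        with open(path) as f:
            info = parse_meminfo(f.read())
        print(f"{time.strftime('%H:%M:%S')} MemAvailable: "
              f"{info['MemAvailable'] // 1024} MiB "
              f"({available_fraction(info):.1%} of total)")
        n += 1
        if samples is None or n < samples:
            time.sleep(interval_s)
```

Calling watch() in a second terminal while chia_plot runs shows whether available memory approaches zero before a crash.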

polkadotbabe commented 3 years ago

[Jun11 17:10] process '/chia_plot' started with executable stack
[Jun11 17:16] phase1/eval/6[1967]: segfault at 1 ip 00007efc55ec3b00 sp 00007efc55ec2e38 error 6
[ +0.000035] Code: 00 00 00 bb 47 a1 3c 8f 0b fc 00 00 000 00 47 fc 55 fc 7e 00 00 00 00 00 00 00 00 00 00 c3 3d 0d 5f fc 7e 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Thanks for responding so quickly, much appreciated :-) that's from dmesg -wH output

That time it crashed chia_plot but the system stayed up.

Restarted chia_plot with -b 256 and it hard-locked, rebooting to see the dump... ...nothing in /var/crash... hmmm....
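For anyone reading the dmesg line above: on x86 the "error" field is the page-fault error code, a bit mask (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode, bit 4: instruction fetch). The sketch below decodes it; the regular expression and function names are illustrative assumptions written for this one log line, not a general dmesg parser.

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error CODE" message.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)

def decode_error(code):
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "cause": "protection violation" if code & 1 else "page not present",
        "access": "write" if code & 2 else "read",
        "mode": "user" if code & 4 else "kernel",
        "instruction_fetch": bool(code & 16),
    }

def parse_segfault(line):
    """Return the parsed fields of a segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    result = {
        "comm": m["comm"],
        "pid": int(m["pid"]),
        "addr": int(m["addr"], 16),
        "ip": int(m["ip"], 16),
        "sp": int(m["sp"], 16),
    }
    # The kernel prints the error code in hexadecimal.
    result.update(decode_error(int(m["err"], 16)))
    return result

line = ("[Jun11 17:16] phase1/eval/6[1967]: segfault at 1 "
        "ip 00007efc55ec3b00 sp 00007efc55ec2e38 error 6")
print(parse_segfault(line))
```

For this line, error 6 decodes to a user-mode write to a page that is not mapped, at address 0x1. That describes the fault itself and does not by itself say whether the cause is software or hardware.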

digitalspaceport commented 3 years ago

Try these build instructions and let me know. Validated on Cent7

https://gist.github.com/jollyjollyjolly/d8904efda4d5997a2f0e9caf31cff1c3

polkadotbabe commented 3 years ago

Stream doesn't ship gmp-static anymore, so the build still fails at link time with:

/bin/ar: /usr/lib64/libgmp.so: File format not recognized

But your link is basically how I built on 7 to create the binary...

polkadotbabe commented 3 years ago

The bios is incorrectly identifying the memory as ddr4-2133 CL15. Here's what it actually is: Memory Speed (MHz) DDR4-3600, PC4-28800, CAS Latency 16, Memory Latency Timings 16-19-19-39

@madMAx43v3r should I set the bios to the exact chip specs, or leave it on "auto"?

Nothing in /var/crash as it wasn't rebooted with crashkernel=auto...
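To put the two memory settings in perspective, the module names encode the numbers directly: a DDR4 data rate in MT/s times the 8-byte (64-bit) bus gives the PC4 bandwidth figure, and the CAS latency in nanoseconds is CL x 2000 / data rate. A small sketch of that arithmetic; the function names are illustrative.

```python
def peak_bandwidth_mb_s(data_rate_mt_s, bus_bytes=8):
    """Peak transfer rate of one 64-bit DDR channel in MB/s."""
    return data_rate_mt_s * bus_bytes

def cas_latency_ns(cl_cycles, data_rate_mt_s):
    """CAS latency in ns: the clock runs at half the data rate, so one cycle is 2000 / MT/s ns."""
    return cl_cycles * 2000 / data_rate_mt_s

for name, rate, cl in [("detected DDR4-2133 CL15", 2133, 15),
                       ("rated DDR4-3600 CL16", 3600, 16)]:
    print(f"{name}: PC4-{peak_bandwidth_mb_s(rate)}, "
          f"CAS {cas_latency_ns(cl, rate):.2f} ns")
```

DDR4-3600 works out to 28800 MB/s, which matches the PC4-28800 label, with a CAS latency of about 8.89 ns against about 14.06 ns for the detected DDR4-2133 CL15 profile.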

polkadotbabe commented 3 years ago

Crashed again. /var/crash didn't capture anything: kdump is installed and crashkernel=auto is in grub, but kdump won't load and throws a no-memory error: kdumpctl[18715]: kdump: No memory reserved for crash kernel

Updated the bios, asrock z590 phantom gaming 4, intel i9-10850, g.skill ddr4-3600, we'll see if that fixes it.

Trying to figure out why kdump isn't working on stream, was always easy on Cent7/RHEL...

Definitely open to ideas if anyone sees this! :-)
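A quick way to narrow down the "No memory reserved for crash kernel" message is to check what the running kernel was actually booted with (/proc/cmdline) and how much memory it set aside (/sys/kernel/kexec_crash_size). The sketch below does that; the function names are illustrative. One possibility worth checking, offered as an assumption and not a diagnosis: as far as I know crashkernel=auto is a Red Hat kernel extension, so a mainline kernel such as the 5.12.4 elrepo build may not act on it, and an explicit size (for example crashkernel=256M) may be needed there.

```python
def crashkernel_arg(cmdline):
    """Return the value of the last crashkernel= token on a kernel command line, or None."""
    value = None
    for token in cmdline.split():
        if token.startswith("crashkernel="):
            value = token.split("=", 1)[1]
    return value

def reserved_crash_bytes(path="/sys/kernel/kexec_crash_size"):
    """Bytes reserved for the crash kernel, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None

def report(cmdline, reserved):
    """Summarize the crash-kernel reservation state in one sentence."""
    arg = crashkernel_arg(cmdline)
    if arg is None:
        return "crashkernel= is not on the running kernel's command line"
    if not reserved:
        return f"crashkernel={arg} was passed, but no memory is reserved"
    return f"crashkernel={arg}: {reserved // (1024 * 1024)} MiB reserved"
```

On the affected machine, report(open("/proc/cmdline").read(), reserved_crash_bytes()) distinguishes a GRUB change that never reached the booted kernel from a kernel that saw the parameter but reserved nothing.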

digitalspaceport commented 3 years ago

Can always recommend you return to rhel7 for time being. Stream is a mess and rocky isnt primetime (yet)

polkadotbabe commented 3 years ago

"Can always recommend you return to rhel7 for time being. Stream is a mess and rocky isnt primetime (yet)"

Nicely (and politely!) put ;-) Seems like a consumer-grade chip/board error that won't be solved by downgrading, although kdump would work and chia_plot would compile without all this drama, so...

polkadotbabe commented 3 years ago

FWIW, after the ASRock bios update, it's stopped crashing.

I set the bios to the exact memory specs of the chips, instead of the much slower rate it had detected. Not sure why there was a disparity, but the bios update also mentioned some memory handling upgrades, so.... ...dunno. @madMAx43v3r Seems related to a bleeding-edge, consumer-grade memory controller, with the heavy usage of chia_plot exposing that.

diegotanel commented 3 years ago

@polkadotbabe Hello.

Specs:
OS: Ubuntu 20.04
Motherboard: b460m Aorus Elite with the latest firmware
CPU: Intel I5 10400 (6c/12Th)
Memory: 32 GB
Tmp drive: 2x1 TB NVMe WD 750 black (RAID 0)
Dst drive: HDD 10 TB Seagate

Mad Max plotter hangs with no output in the log. I did a lot of tests. Can you help me with the memory settings? It is the last test I would do. Thank you

polkadotbabe commented 3 years ago

First make sure you're on the latest: 1) sudo apt-get update && sudo apt-get -y dist-upgrade

If there was a kernel upgrade, reboot into it (just reboot).

Once running leave:

2) dmesg -wH

running in a terminal, it should capture a crash.

3) Are you certain the motherboard's bios is the latest? Double check.

4) Reset to default UEFI bios settings.

5) Turn off anything not needed. Sound, serial port, etc.

6) Don't overclock anything, but you can set the CPU to "turbo performance" to leave the turbo scaling at max.

7) IMPORTANT FOR ME: Take a picture of your RAM's stickers. Compare that to the purchase receipt. In the bios, set the speed and CAS latency settings to the EXACT parameters of your actual RAM. That solved it for me. Also stay current on the builds, in the chia-plotter dir:

8) git pull origin master

9) ./make_devel.sh

10) cp build/chia_plot /to/wherever/you/use/it

Good luck!
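For step 7, the speed the firmware is actually running the memory at can also be read from Linux with `sudo dmidecode -t memory` and compared with the sticker. The sketch below parses that output; the sample text, slot names, and function names are made up for illustration. Field labels vary between dmidecode versions (older ones print "Configured Clock Speed"), and the "Speed" field may reflect the JEDEC profile instead of the XMP rating, so the sticker remains the reference.

```python
import re

LOCATOR_RE = re.compile(r"^\s*Locator:\s*(.+)$", re.M)
SPEED_RE = re.compile(r"^\s*Speed:\s*(\d+)\s*(?:MT/s|MHz)", re.M)
CONFIGURED_RE = re.compile(
    r"^\s*Configured (?:Memory|Clock) Speed:\s*(\d+)\s*(?:MT/s|MHz)", re.M)

# Illustrative sample only; real output has many more fields per device.
SAMPLE = """\
Memory Device
    Locator: DIMM_A1
    Bank Locator: BANK 0
    Speed: 3600 MT/s
    Configured Memory Speed: 2133 MT/s

Memory Device
    Locator: DIMM_B1
    Bank Locator: BANK 1
    Speed: Unknown
    Configured Memory Speed: Unknown
"""

def parse_memory_devices(text):
    """Return one dict per populated slot that reports a configured speed."""
    devices = []
    for block in re.split(r"\n\s*\n", text):
        conf = CONFIGURED_RE.search(block)
        if conf is None:
            continue  # empty slot or unrelated section
        loc = LOCATOR_RE.search(block)
        speed = SPEED_RE.search(block)
        devices.append({
            "locator": loc.group(1).strip() if loc else "?",
            "reported": int(speed.group(1)) if speed else None,
            "configured": int(conf.group(1)),
        })
    return devices

def below_rating(devices, sticker_mt_s):
    """Slots whose configured speed is lower than the speed printed on the module."""
    return [d["locator"] for d in devices if d["configured"] < sticker_mt_s]

print(below_rating(parse_memory_devices(SAMPLE), 3600))
```

Any slot listed by below_rating is running slower than the module's label, which is the mismatch step 7 asks you to correct in the bios.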

diegotanel commented 3 years ago

Thank you for your fast response. I will test your steps and report the results. Thank you so much!

diegotanel commented 3 years ago

@polkadotbabe, hello! I found the problem, and it has nothing to do with the memory. When I ran the Mad Max plotter, I appended an ampersand to run it as a background process, like this:

./build/chiaplot -p $$$ -f $$$ -n -1 -r 12 -t /mnt/ssdtemp/ -d /media/dtanel/hdd1plot/ >> $(date '+%Y-%m-%d%H%M%S').log &

With that, the process dies at random. I don't know why, but that's the problem. I appreciate your help. Thank you.

polkadotbabe commented 3 years ago

Forking to the background is unrelated; more likely you're missing some startup flags (-b 256 for one). chia_plot will tell you, so just run it with everything before the >> and see what it says....
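One detail about the redirected command: `>>` appends only standard output to the log, so anything the program prints to standard error never reaches the file. The sketch below starts a command in the background with both streams appended to one log, in its own session so it is not tied to the launching terminal. The helper names are illustrative, and whether either point explains the random exits in this thread is not established.

```python
import os
import subprocess
import time

def timestamped_log(directory):
    """Build a log path such as <directory>/2021-06-12_174500.log."""
    return os.path.join(directory, time.strftime("%Y-%m-%d_%H%M%S") + ".log")

def launch_logged(argv, log_path):
    """Start argv detached from the terminal, appending stdout and stderr to log_path."""
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # never wait on terminal input
            stdout=log,
            stderr=subprocess.STDOUT,   # send stderr to the same log
            start_new_session=True,     # new session, no controlling terminal
        )
    finally:
        log.close()  # the child holds its own copy of the file descriptor
    return proc
```

The plain-shell equivalent of the stream handling is to add `2>&1` after the `>> file` redirection; the returned object's wait() gives the exit status once the process ends.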