suaefar / ryzen-test

Tools to reproduce randomly crashing processes under load on AMD Ryzen processors on Linux
GNU General Public License v3.0

Ubuntu LTS version? #25

Open Daniel451 opened 6 years ago

Daniel451 commented 6 years ago

Does someone have experience with Ubuntu LTS distributions like 16.04 for example? How fast do segfaults occur?

Since 17.04 is discontinued, most official mirrors no longer offer it for download. I could only find some US and China mirrors, which let you download at something like 300 KB/s from Europe.

Additionally, I think it is also not the best practice to test with a discontinued OS.

So far I have tested Ubuntu 18.04 LTS (gcc7) on a pen drive and my regular Arch Linux (gcc8.1), but both fail after >= 1200s on all cores:

error: dereferencing pointer to incomplete type ‘struct ucontext’
       sc = (struct sigcontext *) (void *) &uc_->uc_mcontext;

...but this is because 'struct ucontext' was removed in glibc >= 2.26. This also applies to Ubuntu 18.04.
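A minimal sketch for checking whether a given system falls on the affected side of that glibc change. It assumes a glibc-based Linux where `os.confstr("CS_GNU_LIBC_VERSION")` is available; the helper names `parse_glibc_version` and `struct_ucontext_removed` are made up for illustration and are not part of this repository.

```python
import os
import re

# glibc release from which 'struct ucontext' is no longer available,
# according to the note above.
UCONTEXT_REMOVED_IN = (2, 26)


def parse_glibc_version(text):
    """Return (major, minor) from a string such as 'glibc 2.27', or None."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def struct_ucontext_removed(text):
    """True if the reported glibc version is 2.26 or newer."""
    version = parse_glibc_version(text)
    return version is not None and version >= UCONTEXT_REMOVED_IN


if __name__ == "__main__":
    try:
        # Available on glibc-based systems; other platforms may raise.
        reported = os.confstr("CS_GNU_LIBC_VERSION")
    except (ValueError, OSError, AttributeError):
        reported = None

    if reported is None:
        print("could not determine the glibc version")
    elif struct_ucontext_removed(reported):
        print(f"{reported}: expect the 'struct ucontext' build error")
    else:
        print(f"{reported}: 'struct ucontext' should still be available")
```

If this reports 2.26 or newer, the gcc7 build in the loop will probably hit the error quoted above unless the compiler or its sources are changed.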

Considering this, is there any suggested distribution besides 17.04?

btw: other builds, for example gcc-8.1, did not fail on Arch yet.

Daniel451 commented 6 years ago

EDIT: Forget about this. 17.04 and older do not boot with my R7 2700X / GTX 1080 / X470 mainboard, not even to a terminal.

Can I just replace gcc7 with gcc8.1 for the build loop? This has already fixed the glibc stuff.

Or any other ideas how to solve that issue?

m-r-s commented 6 years ago

Probably you can replace gcc7 with gcc8.1...

However, we do not know whether this combination of code, compiler and libraries will produce the problematic command chain that triggers the bug. No one can guarantee that anything other than the confirmed combination (stock Ubuntu 17.04 with gcc7) is suitable or reliable for testing for the unexpected behavior.

There is no solution to this problem apart from finding another reliable way to trigger the bug, which requires hours of testing and many people with defective CPUs to confirm that it actually works.

Daniel451 commented 6 years ago

True. This is kind of problematic, I guess. Without a newer kernel, for example, an R7 2700X CPU does not seem to work properly with Ubuntu 17.04 or older. So one either has to make significant changes to Ubuntu 17.04 to get it running with Ryzen 2000 series CPUs, or use a gcc configuration that differs from the one tested with this repository to get around the glibc change.

I'll test gcc8.1 now and report again. So far I did not encounter segfaults, but the 2700X + X470 mainboard is only 1 week old now...not much time to compile software yet.

suaefar commented 6 years ago

We don't have any "positive" reports from Ryzen 2XXX CPUs, yet. I can only hope (for AMD, the sake of competition, and us all) that you won't find this behavior in the second generation of Ryzen CPUs.

With the old/early Ryzen CPUs there were several ways to trigger the bug, mainly by compiling stuff in parallel. I just published this script as a reference, as it turned out to be very effective.

Just keep an eye on the output of dmesg when you are doing CPU and MEM-intensive stuff. If you see strange segfaults that do not appear on other CPUs, you have an indicator where to start.
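As a rough sketch of what keeping an eye on dmesg could look like: the script below counts kernel segfault lines per process name. It assumes the usual `name[pid]: segfault at ...` wording of the kernel message and that `dmesg` is readable by the current user; the function name `count_segfaults` is made up for illustration and is not part of this repository.

```python
import re
import subprocess
from collections import Counter

# Assumed kernel message shape:
#   "[  123.456789] cc1plus[4242]: segfault at 0 ip ... sp ... error 4 in ..."
SEGFAULT_RE = re.compile(r"(?P<name>\S+)\[(?P<pid>\d+)\]: segfault at ")


def count_segfaults(log_text):
    """Count segfault lines per process name in a kernel log dump."""
    counts = Counter()
    for line in log_text.splitlines():
        match = SEGFAULT_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


if __name__ == "__main__":
    try:
        # Reading the kernel log may require root on some systems.
        result = subprocess.run(
            ["dmesg"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not read kernel log: {exc}")
    else:
        counts = count_segfaults(result.stdout)
        if not counts:
            print("no segfaults found in the kernel log")
        for name, hits in counts.most_common():
            print(f"{name}: {hits} segfault(s)")
```

Running it periodically during a long parallel build gives a quick summary of which processes crashed. Whether those crashes point to the CPU still has to be judged by comparing against other machines, as described above.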

Daniel451 commented 6 years ago

Ok, thanks for the reply!

So far I have compiled TensorFlow, OpenCV, ROS, GCC 8.1, and some of my own code without issues.

ryzen-kill.sh has now run for nearly 5 hours with GCC 8.1, with no issues yet.

EdgarMCR commented 6 years ago

I just managed to freeze an AMD Threadripper 2950X system with this test on Ubuntu 18.04 LTS.

I do not get any error messages; the whole system just freezes, and only a hard reset will get it to reboot.

Thank you very much, I found it difficult to reproduce the failure before I found your test!

Edit: I was running this on an ASUS® PRIME X399-A motherboard with BIOS version 0807. I noticed that BIOS version 0808 was released on the 17th of October, and updating the BIOS resolved my issue.