suaefar / ryzen-test

Tools to reproduce randomly crashing processes under load on AMD Ryzen processors on Linux
GNU General Public License v3.0

Can this bug be fixed by a gcc update or a kernel patch? #21

Closed ZhengshuaiPENG closed 6 years ago

ZhengshuaiPENG commented 6 years ago

Can this bug be fixed by a gcc update? It's too much trouble to RMA my Ryzen CPU.

suaefar commented 6 years ago

Short answer: We don't know, but probably not.

AMD has been silent for over half a year. Hardly any official information is available. For the time being, RMA is the only solution to get a reliable system.

suaefar commented 6 years ago

@ZhengshuaiPENG Please ask AMD and tell us if you get any information on that.

ZhengshuaiPENG commented 6 years ago

@suaefar I contacted AMD and I am waiting for their response. I will update this issue when I get some information.

ZhengshuaiPENG commented 6 years ago

@suaefar I explained the situation to AMD, and they sent me a replacement CPU that was produced in week 33. I tested kill-ryzen on Ubuntu 17.10 (I can't boot into the 17.04 live system). I have 16 GB of G-Skill 3200 C14 RAM. The build still failed, but I haven't seen any segfault in the output. I also tried kill-ryzen 4 4, which failed as well.

ZhengshuaiPENG commented 6 years ago

I checked the log file. The errors seem to occur at the code level or something like that (Error: dereferencing pointer to incomplete type...), and there is no segfault in the logs. So does that mean this new CPU is not faulty?

suaefar commented 6 years ago

The script won't work with Ubuntu 17.10 due to a known bug/incompatibility. You can try replacing gcc version 7.1 with 7.2 in the source code. However, it is not clear whether this combination triggers the segfaults.

Oxalin commented 6 years ago

As I previously stated elsewhere: I was able to trigger the segfault (on a faulty CPU) with GCC 7.2 on ArchLinux. @ZhengshuaiPENG The error you are seeing is most likely caused by a known incompatibility between GCC 7.1 and glibc 2.26 (see my comment in issue #6).
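
To tell the two failure modes apart in a build log, something like the following rough sketch may help. It only counts lines that look like crashes versus ordinary compile errors. The patterns are assumptions based on typical gcc wording, not something taken from the ryzen-test scripts, and `classify_log` is a hypothetical helper name.

```python
import re
from collections import Counter

# Assumed patterns: typical gcc crash wording vs. ordinary compile errors.
# They are NOT taken from ryzen-test and may need adjusting for your logs.
CRASH = re.compile(r"segmentation fault|segfault|internal compiler error", re.IGNORECASE)
COMPILE_ERROR = re.compile(r"\berror:", re.IGNORECASE)


def classify_log(lines):
    """Count crash-like lines and plain compile-error lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        if CRASH.search(line):
            # Checked first, because gcc crash lines also contain "error:".
            counts["segfault"] += 1
        elif COMPILE_ERROR.search(line):
            counts["compile_error"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "foo.c:12:5: error: dereferencing pointer to incomplete type",
        "cc1: internal compiler error: Segmentation fault",
    ]
    print(classify_log(sample))
```

Only compile errors and no crash-like lines would point to the software incompatibility rather than the hardware bug, although it does not prove the CPU is good.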

ZhengshuaiPENG commented 6 years ago

@Oxalin My new CPU's batch number is 1733SUS. It seems that most of the CPUs sent out as RMA replacements for the segfault issue have the same batch number as mine, so I suppose this CPU should be fine. Since I can't boot into the 17.04 live system, I won't test it on Ubuntu. But I have Arch installed on my system, so could you tell me how to test this script on Arch? Thanks.

suaefar commented 6 years ago

Remember that this script does not calculate anything deterministic. It is likely to trigger a hardware bug under certain conditions (Ubuntu 17.04!). With Arch it may or may not have this property. To be sure, you will need to find someone who confirms that it triggers the bug with the same software configuration, which under Arch is subject to change every minute (rolling distro).