suaefar / ryzen-test

Tools to reproduce randomly crashing processes under load on AMD Ryzen processors on Linux
GNU General Public License v3.0

Clean and unmount and Fedora/RHEL support #10

Closed lzap closed 7 years ago

lzap commented 7 years ago

Hey there, this adds Fedora / RHEL support (installation of dependencies).

This also adds a cleanup procedure that runs when the script is terminated. The journal sub-process needs to be executed in a different working directory, otherwise the ramdisk can't be unmounted ("device is busy" error). To find what else can cause this:

lsof | grep /mnt/ramdisk
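The cleanup idea can be sketched roughly as below. This is a minimal illustration, not the code from the pull request: the names `RAMDISK`, `CLEANUP`, `UNMOUNT` and `cleanup` are hypothetical, and `journalctl -kf` is only an assumption about what the journal sub-process runs.

```shell
# Hypothetical settings; only the /mnt/ramdisk path comes from the thread.
RAMDISK="${RAMDISK:-/mnt/ramdisk}"
CLEANUP="${CLEANUP:-false}"     # assumed opt-in switch, off by default
UNMOUNT="${UNMOUNT:-umount}"    # overridable so the sketch can be dry-run

cleanup() {
  # Leave the ramdisk first so this shell does not keep the mount busy.
  cd / || return 1
  if [ "$CLEANUP" = "true" ]; then
    $UNMOUNT "$RAMDISK"
  else
    echo "keeping $RAMDISK (build logs preserved)"
  fi
}

# Possible wiring in the real script (left commented out on purpose):
#   trap cleanup EXIT
#   trap 'exit 130' INT
#   trap 'exit 143' TERM
#   (cd / && exec journalctl -kf) &   # assumed journal command, started outside the ramdisk
```

Running the background process from `/` is what keeps its working directory from pinning the mount; a dry run such as `CLEANUP=true UNMOUNT=echo` shows what would be unmounted without touching anything.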

Unfortunately, I can't find anything that prevents these errors on my system. I disabled SMT and address randomization for binaries, and nothing really helps. Sigh.

suaefar commented 7 years ago

Thank you for your input!

I already decided not to support other distros as I don't have time to test the patches and two attempts (see commits in master) already failed.

Regarding the cleanup, I think it is desirable to have the build logs available after stopping the script. Not all segfaults are reported in the kernel log, as some programs trap them; these errors only appear in the build log. The trick is that a reboot (which is highly recommended after some (random) processes have died) makes the ramdisk and all its contents disappear.

The script is meant to be run once, followed by a reboot. I don't know if it makes much sense to build in more features.

I RMA'd my CPU and got a new one that has yet to segfault. I now have 16h of accumulated testing without a segfault (which I never achieved with the old one).

lzap commented 7 years ago

I hereby offer to test all Fedora patches.

I made the cleanup optional; it is turned off by default, so the behavior you want is preserved.

I also fixed the USE_RAMDISK variable: there is no true/false in Bash, so this statement was always true.

Rebased. I still hope you will like it.

suaefar commented 7 years ago

Are you willing to test all future patches with Fedora? The problem is that I cannot test any changes on Fedora. I would add you as the contact/maintainer for Fedora. Any errors related to Fedora will be your department ;)

Does the script throw an error on Fedora due to not finding "apt"? (I suppose: yes.) Could you solve this without error messages and in a way that other distributions can be easily added (if someone volunteers as a maintainer)?
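One way to meet both requests (no error output, easy to extend) is to probe for a package manager with `command -v` before calling it. The sketch below is an assumption about how this could look, not the code that was merged; `first_available` and `install_command_for` are hypothetical helper names, and the package list is left out because the thread does not give it.

```shell
# Print the first command from the argument list that exists on this system.
# Prints nothing and returns non-zero if none is found.
first_available() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Map a package manager to its install command.
# Supporting another distribution means adding one more case branch.
install_command_for() {
  case "$1" in
    apt-get) echo "apt-get install -y" ;;
    dnf)     echo "dnf install -y" ;;
    yum)     echo "yum install -y" ;;
    *)       return 1 ;;
  esac
}

# Example wiring (commented out; it would need root and a package list):
#   pm=$(first_available apt-get dnf yum) || { echo "no supported package manager" >&2; exit 1; }
#   $(install_command_for "$pm") <packages>
```

Because the probe redirects its own output, a missing `apt-get` on Fedora produces no message at all; the script only reports a problem when none of the known managers is present.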

My bash (4.4.7) interprets true and false correctly:

marc@snark:~$ true && echo yes || echo no
yes
marc@snark:~$ false && echo yes || echo no
no

lzap commented 7 years ago

Yes, I can do testing if needed. I was even thinking of going further: creating a small livecd that runs the script automatically.

The script does print an error on Fedora, yeah. I thought it was not a big deal, but I can fix this if you want.

Your Bash does not interpret true or false; you are calling the /usr/bin/true program here. These programs exist in most Linux distros, but in your script you do SOMETHING=true, which results in a string containing "true". It is not important, I just want to make things clear here.

suaefar commented 7 years ago

Yes, I think it would be a good idea to avoid having too many error messages. It would be great if you could fix it.

A livecd that automatically runs the script and tells you if you are affected would be great. However, that would probably require some "grepping" in the logs to match messages that reliably indicate the bug.

Okay, but "true" and "false" work because the strings are evaluated as commands in the if condition. I would like to keep it with "true" and "false".

SOMETHING=true
if $SOMETHING; then echo yes; else echo no; fi
yes
SOMETHING=false
if $SOMETHING; then echo yes; else echo no; fi
no
SOMETHING=a
if $SOMETHING; then echo yes; else echo no; fi
a: command not found
no

lzap commented 7 years ago

Fixed. You are right, I hadn't realized this is evaluated in shell context, not in test context.
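For readers following along, the difference between the two contexts can be shown with a small sketch. The function names are hypothetical and not taken from the script; the third variant is just one common alternative, not something proposed in this thread.

```shell
# Shell context: the value is expanded and executed as a command,
# so "true" and "false" behave like booleans via their exit status.
as_command() {
  if $1; then echo yes; else echo no; fi
}

# Test context: [ "string" ] only checks that the string is non-empty,
# so both "true" and "false" count as true.
as_nonempty_test() {
  if [ "$1" ]; then echo yes; else echo no; fi
}

# Explicit string comparison: never executes the value as a command.
as_string_compare() {
  if [ "$1" = "true" ]; then echo yes; else echo no; fi
}

for value in true false; do
  echo "$value: command=$(as_command "$value")" \
       "nonempty=$(as_nonempty_test "$value")" \
       "compare=$(as_string_compare "$value")"
done
```

This prints `true: command=yes nonempty=yes compare=yes` followed by `false: command=no nonempty=yes compare=no`, which is why `if $USE_RAMDISK` works as intended while a bracketed test on the same value would always succeed.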

suaefar commented 7 years ago

Looks good! Did you test it on Fedora?

lzap commented 6 years ago

Absolutely.