jirka-h / haveged

Entropy daemon
GNU General Public License v3.0
273 stars 34 forks

Add test for /dev/random symlink #12

Closed: rawiriblundell closed this issue 5 years ago

rawiriblundell commented 5 years ago

Hi, I emailed this through to Gary a couple of years ago (around v1.9.1). I've had a quick search through the current code and it doesn't appear to have been fixed yet. Hopefully I'm wrong...

So my team had a report of a dev host using 100% of a CPU core. We identified haveged as the cause, confirmed it through testing and correlation, and ruled out the known high-watermark problem. Interestingly, strace showed haveged apparently stuck in a select loop, which is the reported behaviour of the watermark issue, but we definitely did not have the watermark issue.

We soon figured out that a developer, upon instruction from some brainiacs at Oracle, had made this little mess:

[root@redacted]/root> ls -la /dev/random
lrwxrwxrwx 1 root root 12 Jul 15 10:52 /dev/random -> /dev/urandom

Simply removing that and recreating /dev/random (mknod -m 666 /dev/random c 1 8) fixed the issue immediately. Haveged started working as expected without blowing out a CPU core. We were then able to recreate the symlink and replicate the broken behaviour.
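The mknod command above recreates /dev/random as a character device with major number 1 and minor number 8; on Linux, /dev/urandom is 1:9. A check on those numbers can be sketched in C as follows, assuming Linux with glibc (major() and minor() come from <sys/sysmacros.h>). The function name is_expected_chardev is hypothetical and not part of haveged. Because stat() follows symlinks, a /dev/random -> /dev/urandom link would report 1:9 and fail the check.

```c
#define _DEFAULT_SOURCE
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> /* major(), minor() on glibc */

/* Hypothetical helper: returns 1 if path resolves to a character device
 * with the given major/minor numbers, 0 otherwise (including when stat()
 * fails). stat() follows symlinks, so a /dev/random -> /dev/urandom link
 * is seen as 1:9 and rejected when 1:8 is expected. */
int is_expected_chardev(const char *path, unsigned int want_major,
                        unsigned int want_minor)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return 0;               /* missing or unreadable path */
    if (!S_ISCHR(st.st_mode))
        return 0;               /* not a character device at all */
    return major(st.st_rdev) == want_major &&
           minor(st.st_rdev) == want_minor;
}
```

At startup, a daemon could call is_expected_chardev("/dev/random", 1, 8) and refuse to run when it returns 0.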

My suggestion (admittedly a vague request) to Gary was to add some kind of cursory check to haveged's startup (e.g. lstat() and S_ISLNK, I'm guessing?). I think the sane behaviour is for haveged to exit with an error message, rather than chew through an entire core until someone (or a monitoring system) notices.
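The startup check suggested above could look roughly like the following C sketch. It assumes a POSIX system; path_is_symlink() and refuse_symlinked_device() are hypothetical names, and the code is not taken from the commit that fixed this issue. lstat() is used rather than stat() because lstat() reports on the link itself instead of on the file it points to.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 1 if path itself is a symbolic link,
 * 0 if it is not, and -1 if lstat() fails (errno is left set). */
int path_is_symlink(const char *path)
{
    struct stat st;

    /* lstat() examines the link itself instead of following it */
    if (lstat(path, &st) != 0)
        return -1;
    return S_ISLNK(st.st_mode) ? 1 : 0;
}

/* Hypothetical startup guard: exit with an error message instead of
 * starting a daemon that may spin on a CPU core. Returns normally only
 * when path exists and is not a symlink. */
void refuse_symlinked_device(const char *path)
{
    int rc = path_is_symlink(path);

    if (rc < 0) {
        fprintf(stderr, "cannot lstat %s: %s\n", path, strerror(errno));
        exit(EXIT_FAILURE);
    }
    if (rc == 1) {
        fprintf(stderr, "%s is a symbolic link, not a character device; "
                        "refusing to start\n", path);
        exit(EXIT_FAILURE);
    }
}
```

Calling refuse_symlinked_device("/dev/random") before entering the main loop would turn the silent busy loop into an immediate, explicit failure. This only catches a symlink at that exact path; a device node created with /dev/urandom's numbers (1:9) would still pass, which a comparison of st_rdev after stat() would catch.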

Cheers!

jirka-h commented 5 years ago

Good catch! I have added the check as suggested. See commit 2681d01c2f44e86de901b289632e36dd5ed1dba1

Thanks, Jirka