Hi,
I emailed this through to Gary a couple of years ago (back in v1.9.1). I've had a quick search through the current code and it doesn't appear to have been fixed yet. Hopefully I'm wrong...
So my team had a report of a dev host using 100% of a core. We identified haveged as the cause, confirmed it by testing and correlating, then considered the high watermark problem and ruled it out. Interestingly, strace showed haveged apparently stuck in a select loop, which is the reported symptom of the watermark issue, but we definitely did not have the watermark issue.
We soon figured out that a developer, upon instruction from some brainiacs at Oracle, had made this little mess:
Simply removing that and recreating /dev/random (mknod -m 666 /dev/random c 1 8) fixed the issue immediately. Haveged started working as expected without blowing out a CPU core. We were then able to recreate the symlink and replicate the broken behaviour.
My suggestion/vague-request to Gary was to add some kind of cursory check (e.g. lstat S_ISLNK I'm guessing?) to haveged's startup. I think the sane behaviour is for haveged to exit with an error message, rather than chew through an entire core until someone (or a monitoring system) notices.
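To make the suggestion a bit more concrete, here's a rough sketch of that kind of startup check in C. The function name `check_random_device`, the messages and the return convention are my own invention, not anything from haveged's source. The 1:8 major/minor comparison just mirrors the `mknod` command above. `lstat()` is used because it does not follow symlinks, so a symlinked `/dev/random` shows up as `S_ISLNK` instead of silently resolving to whatever it points at.

```c
#define _XOPEN_SOURCE 700      /* expose lstat() and S_ISLNK under strict C modes */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>     /* major(), minor() on Linux */

/* Hypothetical helper: returns 0 if `path` looks like the real kernel
 * random device (not a symlink, a character device, major 1 / minor 8),
 * otherwise prints a reason to stderr and returns -1. */
int check_random_device(const char *path)
{
    struct stat st;

    /* lstat() reports on the link itself rather than its target */
    if (lstat(path, &st) != 0) {
        fprintf(stderr, "cannot stat %s: %s\n", path, strerror(errno));
        return -1;
    }
    if (S_ISLNK(st.st_mode)) {
        fprintf(stderr, "%s is a symlink, not the kernel random device\n", path);
        return -1;
    }
    if (!S_ISCHR(st.st_mode)) {
        fprintf(stderr, "%s is not a character device\n", path);
        return -1;
    }
    /* /dev/random is char device 1:8 (cf. mknod ... c 1 8) */
    if (major(st.st_rdev) != 1 || minor(st.st_rdev) != 8) {
        fprintf(stderr, "%s is char device %u:%u, expected 1:8\n", path,
                (unsigned)major(st.st_rdev), (unsigned)minor(st.st_rdev));
        return -1;
    }
    return 0;
}
```

The idea would be to call something like this once at startup on the output device and `exit(EXIT_FAILURE)` if it fails, so the misconfiguration gets reported straight away instead of turning into a spinning select loop.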
Cheers!