jirka-h / haveged

Entropy daemon
GNU General Public License v3.0

haveged core dump on Raspberry Pi with Arch Linux ARM #41

Closed haraldkoch closed 4 years ago

haraldkoch commented 4 years ago

Raspberry Pi 3 (ARM8) running Arch Linux ARM. Linux kernel 5.4.45-1-ARCH. Arch uses init.d/fedora.service from this repo to start the daemon.

systemctl start haveged has started core dumping:

Jun 19 12:17:11 alarm.local systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
Jun 19 12:17:11 alarm.local systemd[1]: haveged.service: Main process exited, code=dumped, status=31/SYS
Jun 19 12:17:11 alarm.local systemd[1]: haveged.service: Failed with result 'core-dump'.
Jun 19 12:17:12 alarm.local systemd[1]: haveged.service: Scheduled restart job, restart counter is at 1.

I can make this go away by commenting out (or removing) the two SystemCallFilter lines, but I don't know which extra system calls are being invoked in a Raspberry Pi environment. Others have reported that reverting to 1.9.8 also fixes the core dumps - this was before the SystemCallFilter lines were added.

I'm happy to help debug this, but I don't know how to determine which system call is missing from the filter - if you have advice, I can run tests.

Thanks!
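
A note on reading the log above: when systemd reports code=dumped, the status field names the terminating signal. The following is a minimal sketch that extracts the number from the reported journal line and asks the shell for the signal name. On Linux for ARM and x86, signal 31 is SIGSYS, which is what a seccomp filter such as SystemCallFilter delivers when it blocks a call and no SystemCallErrorNumber is set. The variable names are illustrative only.

```shell
# Journal line copied from the report above.
line='Jun 19 12:17:11 alarm.local systemd[1]: haveged.service: Main process exited, code=dumped, status=31/SYS'

# Pull out the numeric part of "status=N/NAME".
signum=$(printf '%s\n' "$line" | sed -n 's/.*status=\([0-9][0-9]*\)\/.*/\1/p')

# Ask the shell for the name of that signal number.
kill -l "$signum"
```

A SYS result points at the system call filter rather than at a bug in haveged's own code, which matches the observation that removing the SystemCallFilter lines makes the crash go away.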

pheiduck commented 4 years ago

I noticed this happening on the latest version as well.

Broeckelmaier commented 4 years ago

I've also noticed the same behaviour on an x86_64 system.

vitrvvivs commented 4 years ago

Also happening on a Pi model B (rev. 1)

jirka-h commented 4 years ago

Thanks for the report!

It's running fine on my Fedora x86_64 system.

@Broeckelmaier - could you please share more details about the problem on your system? haveged version and /usr/lib/systemd/system/haveged.service file? I really would like to get a reproducer.

@haraldkoch - could you please try to add "@system-service" to the white-list?

Modified SystemCallFilter line: SystemCallFilter=@basic-io @file-system @io-event @network-io @signal @system-service

Please also add this line: SystemCallErrorNumber=EPERM

Documentation: https://www.freedesktop.org/software/systemd/man/systemd.exec.html

Thanks for your help to debug this!

@nbraud - Nicolas, could you please help to resolve this? The SystemCallFilter was added in pull request #26. Thanks!

Broeckelmaier commented 4 years ago

I've modified the SystemCallFilter line and added the SystemCallErrorNumber line on my Raspberry Pi 3 B with haveged 1.9.12-1 installed. So far it runs fine again. In a few hours I'll get back to the x86_64 machine and be able to test the same change there.

uname -a: Linux $HOSTNAME 5.4.45-1-ARCH #1 SMP PREEMPT Sun Jun 14 20:09:21 UTC 2020 armv7l GNU/Linux

jirka-h commented 4 years ago

Thanks a lot for testing it! I'm glad that it's working.

This is the verified systemd SystemCall setup:

SystemCallArchitectures=native
SystemCallFilter=@basic-io @file-system @io-event @network-io @signal @system-service
SystemCallFilter=arch_prctl brk ioctl mprotect sysinfo
SystemCallErrorNumber=EPERM

I have checked the output of

systemd-analyze syscall-filter @system-service

and found that @system-service is a superset of the original setup (@basic-io @file-system @io-event @network-io @signal arch_prctl brk ioctl mprotect sysinfo), except for arch_prctl.
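
The comparison described above can be repeated locally. The following is a minimal sketch, assuming systemd-analyze is installed; the helper name listed is hypothetical. It checks only names printed directly under a set, so a syscall that is reachable through a nested set (a line starting with @) is reported as not listed. uname is included in the loop because a later report in this thread shows it at the top of the crash stack trace.

```shell
# listed NAME: read a "systemd-analyze syscall-filter" listing on stdin and
# succeed if NAME appears as an entry (the first field of some line).
listed() {
    awk '{print $1}' | grep -qxF -- "$1"
}

# Check the individually whitelisted calls from the original unit, plus uname.
for sc in arch_prctl brk ioctl mprotect sysinfo uname; do
    if systemd-analyze syscall-filter @system-service | listed "$sc"; then
        echo "$sc: listed directly in @system-service"
    else
        echo "$sc: not listed directly (may still come from a nested @set)"
    fi
done
```

Replacing @system-service with one of the original sets, for example @basic-io, shows which of the old groups a given call belonged to.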

I'm now testing this setup, as recommended here: https://www.freedesktop.org/software/systemd/man/systemd.exec.html

the following lines are a relatively safe basic choice for the majority of system services:

SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@mount
SystemCallErrorNumber=EPERM

~@mount is blacklisted (note ~)

@Broeckelmaier - could you please test this simplified setup? Thanks!

I will wait for Nicolas to comment on this before committing the change. @nbraud - does the above SystemCall setup look good to you or do you think we need to restrict it more?

Thanks to everybody! Jirka

Broeckelmaier commented 4 years ago

the simplified SystemCallFilter lines work as well.

systemctl status haveged.service returns

Jun 22 12:18:29 $HOSTNAME systemd[1]: Started Entropy Daemon based on the HAVEGE algorithm.
Jun 22 12:18:30 $HOSTNAME haveged[2124]: haveged: command socket is listening at fd 3
Jun 22 12:18:38 $HOSTNAME haveged[2124]: haveged: ver: 1.9.12; arch: generic; vend: ; build: (gcc 9.3.0 CTV); collect: 128K
Jun 22 12:18:38 $HOSTNAME haveged[2124]: haveged: cpu: (VC); data: 16K (D); inst: 16K (D); idx: 11/40; sz: 14660/63044
Jun 22 12:18:38 $HOSTNAME haveged[2124]: haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 8.00258
Jun 22 12:18:38 $HOSTNAME haveged[2124]: haveged: fills: 0, generated: 0

jirka-h commented 4 years ago

Thank you, @Broeckelmaier!

Let's see what Nicolas thinks about it. (Cc: @nbraud )

haraldkoch commented 4 years ago

SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@mount
SystemCallErrorNumber=EPERM

~@mount is blacklisted (note ~)

This change works on my systems also. Thank you!

jirka-h commented 4 years ago

Great! Thanks for the update!

solsticedhiver commented 4 years ago

I am seeing this too on rpi0, and in the log:

juin 23 09:13:19 pan systemd-coredump[1153]: Process 1151 (haveged) of user 0 dumped core.              
                                         Stack trace of thread 1151:
                                         #0  0x0000000076f610bc uname (/usr/lib/ld-2.31.so + 0x1b0bc)
                                         #1  0x0000000076f5ea7c _dl_discover_osversion (/usr/lib/ld-2.31.so + 0x18a7c)

Also, I just noticed that the CPU usage changed quite a bit after the batch of updates that included haveged:

https://imgur.com/4Ud4eOt

It's haveged's fault, right?

jirka-h commented 4 years ago

@solsticedhiver

Regarding the crash: Could you please try to update the systemd service file (/usr/lib/systemd/system/haveged.service) to have

SystemCallFilter=@system-service
SystemCallFilter=~@mount
SystemCallErrorNumber=EPERM

instead of

SystemCallFilter=@basic-io @file-system @io-event @network-io @signal
SystemCallFilter=arch_prctl brk ioctl mprotect sysinfo
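
As an alternative to editing the packaged file under /usr/lib, the same change can be sketched as a systemd drop-in. This is an assumption about how one might apply it, not what the haveged release does; the function name write_dropin is hypothetical. Because repeated SystemCallFilter lines are merged, the drop-in first assigns an empty value, which resets the lists inherited from the packaged unit.

```shell
# write_dropin DIR: write override.conf with the simplified filter into DIR.
write_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/override.conf" <<'EOF'
[Service]
# An empty assignment resets the filter lists inherited from the packaged unit.
SystemCallFilter=
SystemCallFilter=@system-service
SystemCallFilter=~@mount
SystemCallErrorNumber=EPERM
EOF
}

# Typical use (as root); the directory follows systemd's drop-in convention:
#   write_dropin /etc/systemd/system/haveged.service.d
#   systemctl daemon-reload
#   systemctl restart haveged.service
```

With SystemCallErrorNumber=EPERM set, a blocked call returns an error to the process instead of killing it with SIGSYS, so a too-narrow filter shows up as a failed call rather than a core dump.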

I'm going to release a new version of haveged with systemd file updated (assuming I will get no negative comments on this change).

Regarding the CPU usage: no code change explains the higher CPU usage. Looking at the graph you shared, there is now a high "nice" load, which is the percentage of CPU time occupied by user-level processes with a positive nice value.

haveged's CPU usage rises when some process demands a lot of entropy, because haveged then has to keep refilling the entropy pool. Could that be the case here?

Thanks Jirka

solsticedhiver commented 4 years ago

I forgot to mention that after I changed the systemd .service file with the changes mentioned earlier:

SystemCallFilter=@basic-io @file-system @io-event @network-io @signal @system-service
SystemCallFilter=arch_prctl brk ioctl mprotect sysinfo
SystemCallErrorNumber=EPERM

It went back to normal.

About the CPU usage, note that there is also a high "system" CPU usage of about 40-50%, which does not seem normal.

But yes, it should be noted that I run play -n synth pinknoise vol -6dB 24/7 on that Pi, so it probably requests random data from haveged, I guess.

But I was already running that before the update of haveged to 1.9.11, so that cannot explain the change.

I am making the change you requested and will report back as soon as possible if there is any negative effect.

solsticedhiver commented 4 years ago

@haraldkoch you should not have been affected (?) because it is recommended to run rngd from rng-tools on rpi3 on archlinux-arm, as mentioned on the wiki.

It seems you can even run it on rpi0. But I am wondering about the accuracy of the wiki's "Hardware Random Number Generator" paragraph, though.

jirka-h commented 4 years ago

@solsticedhiver - thanks for the testing!

Regarding high CPU usage - could you please check htop/top output to see what is causing high CPU load?

Jirka

solsticedhiver commented 4 years ago

It's a system (aka kernel) process called (coredump), from systemd-coredump I guess.

haveged was crashing at a rate of 5 times per second and being restarted each time it crashed.

jirka-h commented 4 years ago

OK, now it makes sense. I'm sorry for the inconvenience.

I think systemd should not try to restart a daemon when it fails 5 times in a row.
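
systemd does provide start rate limiting: StartLimitIntervalSec= and StartLimitBurst= in the [Unit] section stop further starts once the burst is exceeded within the interval, and RestartSec= in [Service] spaces out restarts. The following is a sketch of a drop-in that sets them; the values and the function name restart_limit_dropin are illustrative assumptions, not what the packaged unit ships.

```shell
# restart_limit_dropin: print a drop-in that limits how often the unit restarts.
restart_limit_dropin() {
    cat <<'EOF'
[Unit]
# Allow at most 5 starts within 60 seconds, then leave the unit failed.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Wait 2 seconds between a crash and the next start attempt.
RestartSec=2
EOF
}

# Typical use (as root), assuming the drop-in directory already exists:
#   restart_limit_dropin > /etc/systemd/system/haveged.service.d/restart-limit.conf
#   systemctl daemon-reload
```

With settings like these, a crash loop stops after a few attempts instead of keeping systemd-coredump busy.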

Anyhow, I'm preparing a new release, it should be ready by tomorrow.

Thanks Jirka

jirka-h commented 4 years ago

Fixed with commit 159dcde28fa2deb3c6d5722dce9fe384f08202b7

jirka-h commented 4 years ago

Released in v1.9.13