nhorman / rng-tools

The rng-tools official repository (formerly part of the gkernel project on sourceforge)
GNU General Public License v2.0
156 stars 61 forks source link

rngd hanging when terminating #73

Closed diego-sueiro closed 4 years ago

diego-sueiro commented 4 years ago

Could reproduce the issue on versions 6.7 and 6.8. Here is the log output (journalctl -u rngd) for version 6.8 when issuing systemctl stop rngd:

Nov 11 13:14:33 localhost rngd[115]: Initializing available sources
Nov 11 13:14:33 localhost rngd[115]: Initializing entropy source hwrng
Nov 11 13:14:33 localhost rngd[115]: JITTER starts 4 threads
Nov 11 13:14:33 localhost rngd[115]: JITTER thread on cpu 0 wakes up for refill
Nov 11 13:14:33 localhost rngd[115]: JITTER thread on cpu 1 wakes up for refill
Nov 11 13:14:33 localhost rngd[115]: JITTER thread on cpu 2 wakes up for refill
Nov 11 13:14:33 localhost rngd[115]: JITTER thread on cpu 3 wakes up for refill
Nov 11 13:14:51 localhost rngd[115]: jent_read_entropy time on cpu 3 is 1.831008672600e+01 sec
Nov 11 13:14:51 localhost rngd[115]: Writing to pipe
Nov 11 13:14:51 localhost rngd[115]: DONE Writing to pipe with return 16535
Nov 11 13:14:51 localhost rngd[115]: JITTER thread on cpu 3 wakes up for refill
Nov 11 13:14:52 localhost rngd[115]: jent_read_entropy time on cpu 1 is 1.936495688200e+01 sec
Nov 11 13:14:52 localhost rngd[115]: Writing to pipe
Nov 11 13:14:52 localhost rngd[115]: DONE Writing to pipe with return 16535
Nov 11 13:14:52 localhost rngd[115]: JITTER thread on cpu 1 wakes up for refill
Nov 11 13:14:52 localhost rngd[115]: jent_read_entropy time on cpu 2 is 1.934331052800e+01 sec
Nov 11 13:14:52 localhost rngd[115]: Writing to pipe
Nov 11 13:14:52 localhost rngd[115]: DONE Writing to pipe with return 16535
Nov 11 13:14:52 localhost rngd[115]: JITTER thread on cpu 2 wakes up for refill
Nov 11 13:14:53 localhost rngd[115]: jent_read_entropy time on cpu 0 is 2.051571401800e+01 sec
Nov 11 13:14:53 localhost rngd[115]: Writing to pipe
Nov 11 13:14:53 localhost rngd[115]: DONE Writing to pipe with return 16535
Nov 11 13:14:53 localhost rngd[115]: JITTER thread on cpu 0 wakes up for refill
Nov 11 13:14:53 localhost rngd[115]: CPU Thread 0 is ready
Nov 11 13:14:53 localhost rngd[115]: CPU Thread 1 is ready
Nov 11 13:14:53 localhost rngd[115]: CPU Thread 2 is ready
Nov 11 13:14:53 localhost rngd[115]: CPU Thread 3 is ready
Nov 11 13:14:53 localhost rngd[115]: Initializing AES buffer
Nov 11 13:14:53 localhost rngd[115]: xread_jitter requests 16 bytes from pipe
Nov 11 13:14:53 localhost rngd[115]: xread_jitter gets 16 bytes
Nov 11 13:14:53 localhost rngd[115]: xread_jitter requests 128 bytes from pipe
Nov 11 13:14:53 localhost rngd[115]: xread_jitter gets 128 bytes
Nov 11 13:14:53 localhost rngd[115]: xread_jitter requests 16535 bytes from pipe
Nov 11 13:14:53 localhost rngd[115]: xread_jitter gets 16535 bytes
Nov 11 13:14:53 localhost rngd[115]: Enabling JITTER rng support
Nov 11 13:14:53 localhost rngd[115]: xread_jitter requests 4 bytes from pipe
Nov 11 13:14:53 localhost rngd[115]: xread_jitter gets 4 bytes
Nov 11 13:14:53 localhost rngd[115]: xread_jitter requests 4 bytes from pipe
Nov 11 13:14:53 localhost rngd[115]: xread_jitter gets 4 bytes
Nov 11 13:14:53 localhost rngd[115]: Initializing entropy source jitter
Nov 11 13:14:53 localhost rngd[194]: [[0;38;5;245m[[0;1;31m[[0;38;5;245mReading entropy from Hardware RNG Device[[0m
Nov 11 13:14:53 localhost systemd[1]: Started Hardware RNG Entropy Gatherer Daemon.
Nov 11 13:14:53 localhost rngd[194]: [[0;38;5;245m[[0;1;31m[[0;38;5;245mAdded 1153/4096 bits entropy[[0m
<...>
Nov 11 13:14:53 localhost rngd[194]: [[0;38;5;245m[[0;1;31m[[0;38;5;245mPool full at 4096, sleeping![[0m
Nov 11 13:28:29 localhost rngd[194]: [[0;38;5;245m[[0;1;31m[[0;38;5;245mChecking on done for thread 0[[0m
Nov 11 13:28:29 localhost systemd[1]: Stopping Hardware RNG Entropy Gatherer Daemon...
Nov 11 13:29:59 localhost systemd[1]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185mrngd.service: State 'stop-sigterm' timed out. Killing.[[0m
Nov 11 13:29:59 localhost systemd[1]: [[0;1;39m[[0;1;31m[[0;1;39mrngd.service: Killing process 194 (rngd) with signal SIGKILL.[[0m
Nov 11 13:29:59 localhost systemd[1]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185mrngd.service: Main process exited, code=killed, status=9/KILL[[0m
Nov 11 13:29:59 localhost systemd[1]: [[0;1;38;5;185m[[0;1;39m[[0;1;38;5;185mrngd.service: Failed with result 'timeout'.[[0m
Nov 11 13:29:59 localhost systemd[1]: Stopped Hardware RNG Entropy Gatherer Daemon.

System information:

Raspberry Pi 3 B+:  Linux  4.14.95

/sys/class/misc/hw_random/rng_current: bcm2835

rngd.service:
[Unit]
Description=Hardware RNG Entropy Gatherer Daemon
DefaultDependencies=no
After=systemd-udev-settle.service
Before=sysinit.target shutdown.target
Conflicts=shutdown.target
[Service]
Type=forking
ExecStart=/usr/sbin/rngd -b -d -r /dev/hwrng
[Install]
WantedBy=multi-user.target

On version 6.6 the issue is not reproducible.

nhorman commented 4 years ago

can you attach strace output of the event by any chance? After commit 62fbff0ac807e07f4f21f58c0f15bfe5e9c8b55e I was unable to reproduce the issue

nhorman commented 4 years ago

Note, you may also be hitting the issue in which the jitter library blocks while we wait to exit for a long time. It may be addressed by this test commit here: https://github.com/nhorman/rng-tools/commit/a05113e42e78ce941e5f4c4e6e8a101c50d8ab90 If you can try that, I would appreciate it, thanks

diego-sueiro commented 4 years ago

can you attach strace output of the event by any chance? After commit 62fbff0 I was unable to reproduce the issue

This is the strace output when issuing systemctl stop rngd.

poll([{fd=6, events=POLLOUT}], 1, -1)   = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
--- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
sigreturn({mask=[]})                    = -1 EINTR (Interrupted system call)
close(5)                                = 0
send(8, "<31>Nov 11 13:16:57 rngd: Checki"..., 56, MSG_NOSIGNAL) = 56
+++ killed by SIGKILL +++
nhorman commented 4 years ago

Thats not what I need, I need the strace output of rngd. You need to run strace -p pidof rngd before attempting to stop the service

diego-sueiro commented 4 years ago

Thats not what I need, I need the strace output of rngd. You need to run strace -p pidof rngd before attempting to stop the service

execve("/usr/sbin/rngd", ["/usr/sbin/rngd", "-b", "-d", "-r", "/dev/hwrng"], 0x7efd1e28 /* 5 vars */) = 0
brk(NULL)                               = 0x9e3000
uname({sysname="Linux", nodename="mbed-linux-os-6698", ...}) = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f5f000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=7906, ...}) = 0
mmap2(NULL, 7906, PROT_READ, MAP_PRIVATE, 3, 0) = 0x76f5d000
close(3)                                = 0
openat(AT_FDCWD, "/lib/libsysfs.so.2", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\364\34\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=22096, ...}) = 0
mmap2(NULL, 86372, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76f23000
mprotect(0x76f28000, 61440, PROT_NONE)  = 0
mmap2(0x76f37000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x4000) = 0x76f37000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libgcrypt.so.20", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\0b\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=667872, ...}) = 0
mmap2(NULL, 732784, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76e70000
mprotect(0x76f0e000, 61440, PROT_NONE)  = 0
mmap2(0x76f1d000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x9d000) = 0x76f1d000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libjitterentropy.so.2", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\374\5\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=9468, ...}) = 0
mmap2(NULL, 73736, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76e5d000
mprotect(0x76e5f000, 61440, PROT_NONE)  = 0
mmap2(0x76e6e000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1000) = 0x76e6e000
close(3)                                = 0
openat(AT_FDCWD, "/lib/libpthread.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\355]\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=75704, ...}) = 0
mmap2(NULL, 147656, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76e38000
mprotect(0x76e4a000, 61440, PROT_NONE)  = 0
mmap2(0x76e59000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x11000) = 0x76e59000
mmap2(0x76e5b000, 4296, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76e5b000
close(3)                                = 0
openat(AT_FDCWD, "/lib/libc.so.6", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\t\254\1\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=919644, ...}) = 0
mmap2(NULL, 988992, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76d46000
mprotect(0x76e23000, 61440, PROT_NONE)  = 0
mmap2(0x76e32000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xdc000) = 0x76e32000
mmap2(0x76e36000, 5952, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x76e36000
close(3)                                = 0
openat(AT_FDCWD, "/usr/lib/libgpg-error.so.0", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0\3500\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=79784, ...}) = 0
mmap2(NULL, 144124, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76d22000
mprotect(0x76d35000, 61440, PROT_NONE)  = 0
mmap2(0x76d44000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x12000) = 0x76d44000
close(3)                                = 0
openat(AT_FDCWD, "/lib/libcap.so.2", O_RDONLY|O_LARGEFILE|O_CLOEXEC) = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0(\0\1\0\0\0`\17\0\0004\0\0\0"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0644, st_size=13912, ...}) = 0
mmap2(NULL, 78140, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x76d0e000
mprotect(0x76d11000, 61440, PROT_NONE)  = 0
mmap2(0x76d20000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2000) = 0x76d20000
close(3)                                = 0
mmap2(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x76f5b000
set_tls(0x76f5b4d0)                     = 0
mprotect(0x76e32000, 8192, PROT_READ)   = 0
mprotect(0x76d20000, 4096, PROT_READ)   = 0
mprotect(0x76d44000, 4096, PROT_READ)   = 0
mprotect(0x76e59000, 4096, PROT_READ)   = 0
mprotect(0x76e6e000, 4096, PROT_READ)   = 0
mprotect(0x76f1d000, 4096, PROT_READ)   = 0
mprotect(0x76f37000, 4096, PROT_READ)   = 0
mprotect(0x26000, 4096, PROT_READ)      = 0
mprotect(0x76f61000, 4096, PROT_READ)   = 0
munmap(0x76f5d000, 7906)                = 0
set_tid_address(0x76f5b078)             = 2458
set_robust_list(0x76f5b080, 12)         = 0
rt_sigaction(SIGRTMIN, {sa_handler=0x76e3d98d, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x76d6fd91}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {sa_handler=0x76e3da11, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_SIGINFO, sa_restorer=0x76d6fd91}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
ugetrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
brk(NULL)                               = 0x9e3000
brk(0xa04000)                           = 0xa04000
openat(AT_FDCWD, "/proc/sys/kernel/random/poolsize", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(3, "4096\n", 1024)                 = 5
close(3)                                = 0
write(2, "\nInitializing available sources\n", 32) = 32
write(2, "\n", 1)                       = 1
openat(AT_FDCWD, "/dev/hwrng", O_RDONLY|O_NOCTTY) = 3
read(3, "\274}\3749\234w\355\360dUI\356\346f%\367", 16) = 16
read(3, "\17\220\256\243", 4)           = 4
read(3, ";\377\202\370", 4)             = 4
write(2, "Initializing entropy source hwrn"..., 34) = 34
write(2, "\n", 1)                       = 1
pipe([4, 5])                            = 0
openat(AT_FDCWD, "/sys/devices/system/cpu", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_CLOEXEC|O_DIRECTORY) = 6
fstat64(6, {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
getdents64(6, /* 16 entries */, 32768)  = 464
getdents64(6, /* 0 entries */, 32768)   = 0
close(6)                                = 0
sched_getaffinity(0, 4, [0, 1, 2, 3])   = 4
fcntl64(5, F_SETPIPE_SZ, 99210)         = 131072
write(2, "JITTER starts 4 threads\n", 24) = 24
write(2, "\n", 1)                       = 1
mmap2(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x7650d000
mprotect(0x7650e000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x76d0cf98, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[2459], tls=0x76d0d920, child_tidptr=0x76d0d4c8) = 2459
mmap2(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x75bff000
mprotect(0x75c00000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x763fef98, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[2460], tls=0x763ff920, child_tidptr=0x763ff4c8) = 2460
mmap2(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x751ff000
mprotect(0x75200000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x759fef98, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[2461], tls=0x759ff920, child_tidptr=0x759ff4c8) = 2461
mmap2(NULL, 8392704, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0) = 0x747ff000
mprotect(0x74800000, 8388608, PROT_READ|PROT_WRITE) = 0
clone(child_stack=0x74ffef98, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tid=[2462], tls=0x74fff920, child_tidptr=0x74fff4c8) = 2462
sched_yield()                           = 0
<...>
sched_yield()                           = 0
write(2, "CPU Thread 0 is ready\n", 22) = 22
write(2, "\n", 1)                       = 1
write(2, "CPU Thread 1 is ready\n", 22) = 22
write(2, "\n", 1)                       = 1
write(2, "CPU Thread 2 is ready\n", 22) = 22
write(2, "\n", 1)                       = 1
write(2, "CPU Thread 3 is ready\n", 22) = 22
write(2, "\n", 1)                       = 1
fcntl64(4, F_GETFL)                     = 0 (flags O_RDONLY)
fcntl64(4, F_SETFL, O_RDONLY|O_CREAT|O_APPEND|O_NONBLOCK|__O_SYNC|O_NOFOLLOW|__O_TMPFILE|FASYNC|0x7e80001c) = 0
write(2, "Initializing AES buffer\n", 24) = 24
write(2, "\n", 1)                       = 1
write(2, "xread_jitter requests 16 bytes f"..., 41) = 41
write(2, "\n", 1)                       = 1
read(4, "\27\24\28k\23\204\324\261-\2044\236\260\3557", 16) = 16
write(2, "xread_jitter gets 16 bytes\n", 27) = 27
write(2, "\n", 1)                       = 1
write(2, "xread_jitter requests 128 bytes "..., 42) = 42
write(2, "\n", 1)                       = 1
read(4, "\245\207S<]\254\342s\7\23\260\201\0062\336(\243Y\227\35\264u\347\30\373\203\252\205U\366\213n"..., 128) = 128
write(2, "xread_jitter gets 128 bytes\n", 28) = 28
write(2, "\n", 1)                       = 1
access("/etc/gcrypt/fips_enabled", F_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/sys/crypto/fips_enabled", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/gcrypt/hwf.deny", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/proc/self/auxv", O_RDONLY) = 6
fstat64(6, {st_mode=S_IFREG|0400, st_size=0, ...}) = 0
read(6, "!\0\0\0\0\0\354~\20\0\0\0\326\260?\0\6\0\0\0\0\20\0\0\21\0\0\0d\0\0\0"..., 1024) = 160
read(6, "", 1024)                       = 0
close(6)                                = 0
openat(AT_FDCWD, "/proc/cpuinfo", O_RDONLY) = 6
fstat64(6, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
read(6, "processor\t: 0\nmodel name\t: ARMv7"..., 1024) = 1024
close(6)                                = 0
write(2, "xread_jitter requests 16535 byte"..., 44) = 44
write(2, "\n", 1)                       = 1
read(4, "\244\351\263nw\277\250\373R\223Z~/<Blvg\314\304'/\253\213\243<8\257\366K\351\265"..., 16535) = 16535
write(2, "xread_jitter gets 16535 bytes\n", 30) = 30
write(2, "\n", 1)                       = 1
write(2, "Enabling JITTER rng support\n", 28) = 28
write(2, "\n", 1)                       = 1
write(2, "xread_jitter requests 4 bytes fr"..., 40) = 40
write(2, "\n", 1)                       = 1
read(4, "\2655\251\206", 4)             = 4
write(2, "xread_jitter gets 4 bytes\n", 26) = 26
write(2, "\n", 1)                       = 1
write(2, "xread_jitter requests 4 bytes fr"..., 40) = 40
write(2, "\n", 1)                       = 1
read(4, "S5\210\201", 4)                = 4
write(2, "xread_jitter gets 4 bytes\n", 26) = 26
write(2, "\n", 1)                       = 1
write(2, "Initializing entropy source jitt"..., 35) = 35
write(2, "\n", 1)                       = 1
openat(AT_FDCWD, "/dev/random", O_RDWR) = 6
openat(AT_FDCWD, "/proc/sys/kernel/random/write_wakeup_threshold", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
fstat64(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
write(7, "3072\n", 5)                   = 5
close(7)                                = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x76f5b078) = 2484
exit_group(0)                           = ?
+++ exited with 0 +++
nhorman commented 4 years ago

yeah, I should have asked you to add the -f flag to strace to watch the threads close down, but the lack of calls to pthread_join at the end suggests that the problem is that rngd is stuck in jitterentropy, waiting for generation to complete during exit. Please try building the branch at: https://github.com/nhorman/rng-tools/tree/early-jitter-exit

I expect that will fix your problem. I haven't merged it yet for lack of testing, but if you can confirm it fixes your problem, I'll merge it asap

diego-sueiro commented 4 years ago

yeah, I should have asked you to add the -f flag to strace to watch the threads close down, but the lack of calls to pthread_join at the end suggests that the problem is that rngd is stuck in jitterentropy, waiting for generation to complete during exit. Please try building the branch at: https://github.com/nhorman/rng-tools/tree/early-jitter-exit

I expect that will fix your problem. I haven't merged it yet for lack of testing, but if you can confirm it fixes your problem, I'll merge it asap

I'm still seeing this issue. When starting rngd with strace -f the systemd start times out.

nhorman commented 4 years ago

with the branch I referred you to? Then please attach the output of strace -f on rngd when the issue occurs

diego-sueiro commented 4 years ago

with the branch I referred you to? Then please attach the output of strace -f on rngd when the issue occurs

The branch you asked me to test.

This is the strace -f log when stopping rngd.service:

2987  poll([{fd=6, events=POLLOUT}], 1, -1) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
2987  --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=1, si_uid=0} ---
2987  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=1, si_uid=0} ---
2987  sigreturn({mask=[]})              = -1 EINTR (Interrupted system call)
2987  close(5)                          = 0
2987  send(8, "<31>Nov 11 13:57:58 rngd: Checki"..., 56, MSG_NOSIGNAL) = 56
2987  sched_yield()                     = 0
<...>
2987  +++ killed by SIGKILL +++
nhorman commented 4 years ago

I don't understand whats going on here. I can't reproduce this on my raspberry pi, and looking at your strace above, I see no activity from the child threads that the jitterentropy source creates, which sugests they aren't running at all (which is the source of the previous hang). Can you run the following commands in a terminal and post the output: rngd -v; rngd -f -d

Note, that after you start rngd in the foreground, you can attempt to stop it in the same way that systemd does by executing kill -SIGHUP pidof rndg from another shell

Also, one other thing I noticed, your service file is a forking type which start rngd to run in the background. While thats not a problem per-se, is there any reason you don't just list this as a simple unit, which starts rngd in the foreground (-f)

diego-sueiro commented 4 years ago

I don't understand whats going on here. I can't reproduce this on my raspberry pi, and looking at your strace above, I see no activity from the child threads that the jitterentropy source creates, which sugests they aren't running at all (which is the source of the previous hang). Can you run the following commands in a terminal and post the output: rngd -v; rngd -f -d

Note, that after you start rngd in the foreground, you can attempt to stop it in the same way that systemd does by executing kill -SIGHUP pidof rndg from another shell

Launching it as foreground works, but if I start it as a daemon it hangs.

Also, one other thing I noticed, your service file is a forking type which start rngd to run in the background. While thats not a problem per-se, is there any reason you don't just list this as a simple unit, which starts rngd in the foreground (-f)

The reason why I use the type forking and launch it as a daemon is because during the system startup I have other process that use the /dev/random device.

nhorman commented 4 years ago

Launching it as foreground works, but if I start it as a daemon it hangs.

If thats the case then I need you to run a few extra tests: 1) run rngd -v and paste the output here 2) run 'rngd -f -d', then from another terminal window run kill -SIGHUP pidof rngd. Capture the rngd terminal output and paste it here 3) run 'rngd -d', then from another terminal window run kill -SIGHUP `pidof rngd'. then run journalctl and capture the resultant ouput and paste it here.

Note all tests should use the branch of rngd that I noted earlier

The reason why I use the type forking and launch it as a daemon is because during the system >startup I have other process that use the /dev/random device.

Thats not how forking units work. The unit type only controls how systemd tracks and handles the units process. There is no reason that as a non-forking process, other processes can't use /dev/random in parallel. If this works as you assert as a foreground process, you should be able to work around this issue by altering your unit file such that the service Type=simple, and setting ExecStart=/sbin/rngd -f

diego-sueiro commented 4 years ago

Launching it as foreground works, but if I start it as a daemon it hangs.

If thats the case then I need you to run a few extra tests:

  1. run rngd -v and paste the output here

rngd 6.7

  1. run 'rngd -f -d', then from another terminal window run kill -SIGHUP pidof rngd. Capture the rngd terminal output and paste it here
Initializing available sources
Initializing entropy source hwrng
JITTER starts 4 threads
cpu 0 registers its setjmp handler
JITTER thread on cpu 0 wakes up for refill
cpu 1 registers its setjmp handler
JITTER thread on cpu 1 wakes up for refill
cpu 2 registers its setjmp handler
JITTER thread on cpu 2 wakes up for refill
cpu 3 registers its setjmp handler
JITTER thread on cpu 3 wakes up for refill
jent_read_entropy time on cpu 3 is 1.743779968100e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 3 wakes up for refill
jent_read_entropy time on cpu 2 is 1.817187181700e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 2 wakes up for refill
jent_read_entropy time on cpu 0 is 1.823145542300e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 0 wakes up for refill
CPU Thread 0 is ready
jent_read_entropy time on cpu 1 is 1.965711770200e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 1 wakes up for refill
CPU Thread 1 is ready
CPU Thread 2 is ready
CPU Thread 3 is ready
Initializing AES buffer
xread_jitter requests 16 bytes from pipe
xread_jitter gets 16 bytes
xread_jitter requests 128 bytes from pipe
xread_jitter gets 128 bytes
xread_jitter requests 16535 bytes from pipe
xread_jitter gets 16535 bytes
Enabling JITTER rng support
xread_jitter requests 4 bytes from pipe
xread_jitter gets 4 bytes
xread_jitter requests 4 bytes from pipe
xread_jitter gets 4 bytes
Initializing entropy source jitter
Reading entropy from Hardware RNG Device
Added 3477/4096 bits entropy
Added 3535/4096 bits entropy
Added 3587/4096 bits entropy
Added 3635/4096 bits entropy
Added 3678/4096 bits entropy
Added 3717/4096 bits entropy
Added 3753/4096 bits entropy
Added 3785/4096 bits entropy
Added 3814/4096 bits entropy
Added 3840/4096 bits entropy
Added 3864/4096 bits entropy
Added 3886/4096 bits entropy
Added 3905/4096 bits entropy
Added 3923/4096 bits entropy
Added 3939/4096 bits entropy
Added 3954/4096 bits entropy
Added 3967/4096 bits entropy
Added 3979/4096 bits entropy
Added 3990/4096 bits entropy
Added 4000/4096 bits entropy
Added 4009/4096 bits entropy
Added 4017/4096 bits entropy
Added 4024/4096 bits entropy
Added 4031/4096 bits entropy
Added 4037/4096 bits entropy
Pool full at 4096, sleeping!
Hangup
  1. run 'rngd -d', then from another terminal window run kill -SIGHUP `pidof rngd'. then run journalctl and capture the resultant output and paste it here.

no output from journalctl

Initializing available sources
Initializing entropy source hwrng
JITTER starts 4 threads
cpu 0 registers its setjmp handler
JITTER thread on cpu 0 wakes up for refill
cpu 1 registers its setjmp handler
JITTER thread on cpu 1 wakes up for refill
cpu 2 registers its setjmp handler
JITTER thread on cpu 2 wakes up for refill
cpu 3 registers its setjmp handler
JITTER thread on cpu 3 wakes up for refill
jent_read_entropy time on cpu 1 is 1.762344390000e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 1 wakes up for refill
jent_read_entropy time on cpu 0 is 1.852487124400e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 0 wakes up for refill
CPU Thread 0 is ready
CPU Thread 1 is ready
jent_read_entropy time on cpu 3 is 1.865987702400e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 3 wakes up for refill
jent_read_entropy time on cpu 2 is 1.801052188200e+01 sec
Writing to pipe
DONE Writing to pipe with return 16535
JITTER thread on cpu 2 wakes up for refill
CPU Thread 2 is ready
CPU Thread 3 is ready
Initializing AES buffer
xread_jitter requests 16 bytes from pipe
xread_jitter gets 16 bytes
xread_jitter requests 128 bytes from pipe
xread_jitter gets 128 bytes
xread_jitter requests 16535 bytes from pipe
xread_jitter gets 16535 bytes
Enabling JITTER rng support
xread_jitter requests 4 bytes from pipe
xread_jitter gets 4 bytes
xread_jitter requests 4 bytes from pipe
xread_jitter gets 4 bytes
Initializing entropy source jitter

Note all tests should use the branch of rngd that I noted earlier

The reason why I use the type forking and launch it as a daemon is because during the system >startup I have other process that use the /dev/random device.

Thats not how forking units work. The unit type only controls how systemd tracks and handles the units process. There is no reason that as a non-forking process, other processes can't use /dev/random in parallel. If this works as you assert as a foreground process, you should be able to work around this issue by altering your unit file such that the service Type=simple, and setting ExecStart=/sbin/rngd -f

I had this problem almost a year ago, and as far as I remember I had problems with connman/wpa_supplicant starting/working because the /dev/random didn't have enough entropy.

Can you reproduce the issue by launching it as daemon and check if it terminates with SIGHUP?

nhorman commented 4 years ago

I just found the SIGHUP issue when running as a daemon, it appears to be a hold over from before I rewrote rng-tools years ago (when sysvinit used a different signal to shutdown processes). I'm going to update the early-jitter-exit branch shortly and have you retest

Regarding you're previous experience with forking vs non-forking services: While you can use either, I think you've confused cause and coincidence. a year ago rngd on arm didn't have access to the jitter entropy source, and so likely blocked alot longer generating random data. Moving to a forking unit likely just changed some timing that you were able to generate some extra entropy. Either way, you can use forking if you like, but simple units are generally easier to manage.

nhorman commented 4 years ago

Ok, I updated the early-jitter-exit branch, but I just realized something. When you run a process as a daemon (as rngd does when -b is specified), the SIGHUP signal is automatically ignored (as it has to be according to http://www.masterraghu.com/subjects/np/introduction/unix_network_programming_v1.3/ch13lev1sec4.html). Because you have written a systemd unit file to run rngd as a daemon, we're automatically ignoring the signal used systemd to shut the process down. What I think you need to do in addition to the above changes is modify your unit file such that the [service] section contains a KillSignal=SIGINT directive. This will make systemd send a SIGINT rather than a SIGHUP, which rngd can catch to gracefully shut down

diego-sueiro commented 4 years ago

Ok, I updated the early-jitter-exit branch, but I just realized something. When you run a process as a daemon (as rngd does when -b is specified), the SIGHUP signal is automatically ignored (as it has to be according to http://www.masterraghu.com/subjects/np/introduction/unix_network_programming_v1.3/ch13lev1sec4.html). Because you have written a systemd unit file to run rngd as a daemon, we're automatically ignoring the signal used systemd to shut the process down. What I think you need to do in addition to the above changes is modify your unit file such that the [service] section contains a KillSignal=SIGINT directive. This will make systemd send a SIGINT rather than a SIGHUP, which rngd can catch to gracefully shut down

I thought systemd sends the SIGTERM by default, and from the logs I sent earlier I have: rngd.service: State 'stop-sigterm' timed out..

I'm still seeing the issue when launching as daemon (I'm not even using systemd at this point). After the process forked to the background it is not terminating. But if I send the SIGINT/SIGHUP/SIGTERM before it forks then it terminates.

nhorman commented 4 years ago

doh! you're right, I'm not sure why I was thinking SIGHUP. I think I was thinking of the reconfig directive. Though that doesn't make sense, SIGTERM is clearly being caught in the code, and on my system I can see it getting handled, so something else must be going on here

nhorman commented 4 years ago

Oh! I'm an idiot! I see the problem. rngd daemonizes too late in main, and does so after its initialized all the entropy sources, which means jitter has already created all its threads. When we daemonize, we call fork twice, causing all the threads there to exit early. so when we catch a shutdown signal, we signal all the child threads with SIGUSR1, then wait for them to complete, but they won't ever, because they've already exited, leading to your hang. The solution is to daemonize prior to any initialization.

I've pushed the fix to the early-jitter-exit branch (force push, as I squashed some changes, sorry, you'll need to delete your local branch and repull). This should fix your issue though. Please give it a test and confirm that it solves the problem for you. I'll merge it as soon as I hear back

diego-sueiro commented 4 years ago

@nhorman,

Don't blame yourself too hard. That is fine. I'm glad that you helped me to track down this bug and fix it quickly.

I tested terminating rngd both in foreground and background mode and everything is working as expected. And even better, is that in daemon mode it goes to background faster which improved our boot time by ~20 seconds.

nhorman commented 4 years ago

awesome, thanks, I'll merge this today than.

diego-sueiro commented 4 years ago

That is great. Thanks.

nhorman commented 4 years ago

fixed in https://github.com/nhorman/rng-tools/pull/74