jirka-h / haveged

Entropy daemon
GNU General Public License v3.0

is haveged still useful/relevant? #57

Closed klausenbusk closed 2 years ago

klausenbusk commented 3 years ago

Hi

Sorry for the harsh question, but with the jitter entropy added in v5.4 (merge commit, commit, LKML) and removal of the /dev/random blocking pool in v5.6 (commit, LWN), is haveged still useful/relevant? If yes, under what circumstances?

- Kristian

jirka-h commented 3 years ago

Hi Kristian,

thanks a lot for pointing me to the recent kernel development. 

After reading the LKML/LWN articles, I completely agree that the haveged service is now obsolete (starting from kernel 5.6). I have verified it experimentally on Fedora 32, running kernel 5.10:

$ time timeout 30 pv /dev/random > /dev/null
1.18GiB 0:00:29 [42.2MiB/s]
real    0m30.012s
user    0m0.022s
sys     0m29.934s

1) There is no difference in throughput with and without the haveged service running.
2) The haveged service is not triggered at all, as verified with strace. No entropy is sent to the kernel.
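
For readers without pv, the throughput measurement above can be approximated with a short Python sketch. This is only an illustration: the function name, the byte limit and the chunk size are assumptions chosen here, not part of haveged or pv.

```python
import time


def measure_read_throughput(path="/dev/random",
                            max_bytes=64 * 1024 * 1024,
                            chunk_size=1024 * 1024):
    """Read up to max_bytes from path and return (bytes_read, bytes_per_second).

    Hypothetical helper: it mimics `pv /dev/random > /dev/null` but stops
    after a fixed number of bytes instead of relying on `timeout`.
    """
    total = 0
    start = time.perf_counter()
    # buffering=0 gives raw reads, so each read() maps to one system call.
    with open(path, "rb", buffering=0) as src:
        while total < max_bytes:
            chunk = src.read(min(chunk_size, max_bytes - total))
            if not chunk:  # end of file on a regular file
                break
            total += len(chunk)
    # Guard against a zero interval on very small inputs.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, total / elapsed


if __name__ == "__main__":
    n, rate = measure_read_throughput()
    print(f"{n / 2**20:.1f} MiB read at {rate / 2**20:.1f} MiB/s")
```

Running it once with the haveged service active and once with it stopped gives the same kind of comparison as the two pv runs.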

I'm happy that these changes made it into the mainline kernel. It's nice to see that the main idea behind HAVEGED has stood the test of time! (It was already published in 2003: https://www.irisa.fr/caps/projects/hipsor/publications/havege-tomacs.pdf)

I'm also glad that the HAVEGE algorithm is being further explored and examined - see the "CPU Jitter Random Number Generator" page at https://www.chronox.de/jent.html

I will keep maintaining HAVEGED - most Linux installations are still running on the older kernel versions. HAVEGED can also be used as the userspace RNG to generate random numbers. See man -S8 haveged for examples or try running haveged -n 0 | pv > /dev/null

Last but not least, HAVEGED can be used as an RNG library.

Thanks a lot Jirka

Further references:
https://lore.kernel.org/lkml/alpine.DEB.2.21.1909290010500.2636@nanos.tec.linutronix.de/T/
https://lwn.net/Articles/808575/
https://www.phoronix.com/scan.php?page=news_item&px=Linux-5.6-Random-Rework
https://en.wikipedia.org/wiki//dev/random
https://www.irisa.fr/caps/projects/hipsor/publications/havege-tomacs.pdf
https://www.chronox.de/jent.html
https://www.chronox.de/jent/doc/CPU-Jitter-NPTRNG.pdf
https://github.com/sandy-harris/maxwell

eworm-de commented 3 years ago

Relevant info has been added in 297bdf1fc52fc6f59d0495f911d4e594b4d29190, also cef1d425b5431847b8c9ab5b00c3e6b82a32b4f2 adds a condition on kernel version in service file and makes it a no-op for linux >= 5.6.
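
The idea behind that version gate can be sketched in Python. This is a hedged illustration only: the commit implements the condition in the service file, and that mechanism is not reproduced here. The function names below are assumptions made for this example.

```python
import re


def kernel_at_least(release: str, major: int, minor: int) -> bool:
    """Return True if a `uname -r` style string is >= major.minor."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    # Tuple comparison handles e.g. 5.10 > 5.6 correctly (10 > 6 as ints).
    return (int(match.group(1)), int(match.group(2))) >= (major, minor)


def haveged_daemon_needed(release: str) -> bool:
    """Hypothetical helper: mirror the 'no-op for linux >= 5.6' rule."""
    return not kernel_at_least(release, 5, 6)


if __name__ == "__main__":
    import platform
    print(haveged_daemon_needed(platform.release()))
```

Comparing the numeric components as integers matters: a plain string comparison would sort "5.10" before "5.6".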

frostschutz commented 2 years ago

So now with haveged deemed "obsolete" and getting dropped all over the place because it doesn't do anything, how does your average Joe Linux hobbyist admin deal with crng init taking ages? That's still happening in recent kernels... and it was the prime reason I installed haveged pretty much everywhere, only it's a no-op now.

Bare metal (archlinux kernel 5.15):

dmesg | grep random
[    0.287693] random: get_random_u64 called from __kmem_cache_create+0x2a/0x540 with crng_init=0
[    6.158634] random: fast init done
[  100.703821] random: crng init done

In a KVM virtual machine (archlinux kernel 5.10):

[    0.063991] random: get_random_u64 called from __kmem_cache_create+0x2a/0x4c0 with crng_init=0
[    1.558234] random: fast init done
[    3.958456] random: cryptsetup: uninitialized urandom read (4 bytes read)
[   36.354001] random: cryptsetup: uninitialized urandom read (32 bytes read)
[   41.265375] random: cryptsetup: uninitialized urandom read (64 bytes read)
[   41.265477] random: cryptsetup: uninitialized urandom read (64 bytes read)
[   41.265481] random: cryptsetup: uninitialized urandom read (64 bytes read)
[   50.726303] random: crng init done
[   50.726321] random: 5 urandom warning(s) missed due to ratelimiting

Same virtual machine with manually tickling the random device early in initramfs:

dmesg | grep random
[    0.046587] random: get_random_u64 called from __kmem_cache_create+0x2a/0x4c0 with crng_init=0
[    1.377856] random: fast init done
[    2.696946] random: crng init done

So massaging the random device early still has an influence, and I'd love to continue using haveged for the job, but for that to work it would actually have to do something, like unconditionally feed some randomness on startup, either as a once-off or periodically... as far as random sources go, it's the more the merrier, isn't it?
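
To compare boots like the three above without reading the logs by eye, the relevant timestamps can be pulled out of `dmesg` output. The following is a minimal sketch that assumes the message formats shown in this thread ("crng init done", "uninitialized urandom read"); other kernel versions may word these messages differently, and the function names are made up for this example.

```python
import re
from typing import Optional

# Matches lines such as "[  100.703821] random: crng init done".
_CRNG_DONE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+random: crng init done",
                        re.MULTILINE)
_UNINIT_READ = re.compile(r"random: \S+ uninitialized urandom read")


def crng_init_seconds(dmesg_text: str) -> Optional[float]:
    """Return the timestamp of 'crng init done', or None if it is absent."""
    match = _CRNG_DONE.search(dmesg_text)
    return float(match.group(1)) if match else None


def count_uninitialized_reads(dmesg_text: str) -> int:
    """Count logged reads that happened before the crng was initialized."""
    return len(_UNINIT_READ.findall(dmesg_text))


if __name__ == "__main__":
    import subprocess
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    print(crng_init_seconds(log), count_uninitialized_reads(log))
```

Note that the count only covers warnings that were actually logged; as the KVM log above shows, the kernel rate-limits them.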

jirka-h commented 2 years ago

Hi Andreas,

I have updated haveged to feed entropy to the kernel on start and then every 60 seconds. Please give it a try and let me know if it works for you. See commits b0d1b0e82602401c51404b941133b993b8aa65e9 and c35c6f44aa01d0f6ddf2752e04b5ef763f4c61a2
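
For illustration, feeding entropy to the kernel from userspace goes through the RNDADDENTROPY ioctl on the random device. The Python sketch below is not haveged's code (haveged is written in C). It assumes the `struct rand_pool_info` layout from `linux/random.h` and the ioctl number as encoded on x86 and ARM (some architectures encode it differently), and the call requires CAP_SYS_ADMIN. The function names are hypothetical.

```python
import fcntl
import struct

# _IOW('R', 0x03, int[2]); this value is an assumption valid for x86/ARM.
RNDADDENTROPY = 0x40085203


def build_rand_pool_info(data: bytes, entropy_bits: int) -> bytes:
    """Pack struct rand_pool_info { int entropy_count; int buf_size; buf[]; }."""
    if not 0 <= entropy_bits <= 8 * len(data):
        raise ValueError("entropy_bits must be between 0 and 8 * len(data)")
    # entropy_count is in bits, buf_size is in bytes.
    return struct.pack("ii", entropy_bits, len(data)) + data


def add_entropy(data: bytes, entropy_bits: int, device: str = "/dev/random") -> None:
    """Credit `data` to the kernel pool; needs root (CAP_SYS_ADMIN)."""
    # A mutable buffer is used because fcntl.ioctl() limits immutable
    # byte strings to 1024 bytes.
    payload = bytearray(build_rand_pool_info(data, entropy_bits))
    with open(device, "wb", buffering=0) as dev:
        fcntl.ioctl(dev, RNDADDENTROPY, payload)
```

A periodic feeder would call `add_entropy` once at startup and then sleep 60 seconds between further calls. Where the bytes come from, and how many bits to credit for them, is the hard part and is what the HAVEGE algorithm provides.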

I have tested it on x86_64 running kernel 5.15 (./haveged --Foreground) and it works fine there.

Thanks a lot Jirka

eworm-de commented 2 years ago

Now does it make sense to revert cef1d425b5431847b8c9ab5b00c3e6b82a32b4f2?

frostschutz commented 2 years ago

It seems to work fine for me.

Without haveged:

[    0.074495] random: get_random_u64 called from __kmem_cache_create+0x2a/0x4c0 with crng_init=0
[    1.353786] random: fast init done
[    3.428981] random: cryptsetup: uninitialized urandom read (4 bytes read)
[   27.874776] random: cryptsetup: uninitialized urandom read (32 bytes read)
[   32.867232] random: cryptsetup: uninitialized urandom read (64 bytes read)
[   32.867418] random: cryptsetup: uninitialized urandom read (64 bytes read)
[   32.867425] random: cryptsetup: uninitialized urandom read (64 bytes read)
[   42.252014] random: crng init done
[   42.252027] random: 5 urandom warning(s) missed due to ratelimiting

With haveged:

[    0.066190] random: get_random_u64 called from __kmem_cache_create+0x2a/0x4c0 with crng_init=0
[    1.476520] random: crng init done

(random fast init message is mysteriously missing. haveged itself spawns around the [ 0.8xxx] ~ [ 0.9xxx] mark.)

Using a simple initcpio hook (archlinux specific):

/etc/initcpio/install/haveged

#!/bin/bash

build() {
    add_binary "haveged"
    add_runscript
}

help() {
    cat <<HELPEOF
Haveged for early randomness and fast crng initialization.
HELPEOF
}

/etc/initcpio/hooks/haveged

#!/usr/bin/ash

run_earlyhook() {
    haveged
}

run_cleanuphook() {
    killall haveged
}

So nothing fancy, it simply starts haveged early in the initramfs and kills it late in the initramfs. haveged works its magic in the meantime. haveged as a service is then spawned again later on by the real init system.

eworm-de commented 2 years ago

> (random fast init message is mysteriously missing. haveged itself spawns around the [ 0.8xxx] ~ [ 0.9xxx] mark.)

It is not mysterious. As enough entropy is available, fast init is skipped completely in favor of the full crng init.

> Using a simple initcpio hook (archlinux specific):

> [snipped initcpio hooks]

> So nothing fancy, it simply starts haveged early in the initramfs and kills it late in the initramfs. haveged works its magic in the meantime.

I could add something like this to the package... But I wonder whether it makes sense to add a new switch, --once or --early. It could inject entropy once, then terminate itself.

> haveged as a service is then spawned again later on by the real init system.

With the switch from above, it would be possible to keep the current service haveged.service as is, including the condition on kernel version. Adding a new service haveged-once.service that uses the new switch would allow adding a service to a systemd-enabled initramfs image.

klausenbusk commented 2 years ago

I'm a bit puzzled why the jitter entropy in the kernel (merge commit, commit, LKML) isn't working. Is there anything exotic about your setup @frostschutz?

Edit: Is it blocking boot @frostschutz?

jirka-h commented 2 years ago

> it would be possible to keep the current service haveged.service as is, including the condition on kernel version.

:+1: I like it!

I have added a new switch --once and I have kept haveged.service unchanged. Could you please give it a try? If it works fine, I will release a new version.

Thanks a lot! Jirka

jirka-h commented 2 years ago

commit 98ead65f953a3431d53c5837eedd008100ce9ed7

frostschutz commented 2 years ago

> Is there anything exotic about your setup @frostschutz?

My desktop is a standard arch linux install with a custom encryption hook since I have more than just the one LUKS device. But in the end it still runs a standard cryptsetup open ... and waits for me to enter passphrase. And that works fine, except crng init simply never happens until there is actually activity from my end, so the crng init after 100 seconds on bare metal is because I wasn't typing anything.

My virtual server is a little exotic in that it's encrypted and uses cryptsetup very early to check or change the passphrase. So it actually wants to use the random device very early, hence the warnings about cryptsetup reading random before crng is fully initialized.

It is not blocking in either case, so maybe this is for cosmetics only, it just doesn't give me a good feeling to see such warning messages or late initializations.

The kernel is very conservative about randomness/entropy (for good reasons, I'm sure) but even if the kernel 'fixes' it, I still want to keep using haveged... I also have other things in place like an early random seed (the systemd random seed service runs long after initramfs is done so maybe a little late) but I did not use those in the above tests.

Basically, however good the kernel's random implementation is or will be, I still feel that userspace should mess with it just a little regardless, and for that purpose I'd love to continue using haveged, both in initramfs and as a service that just keeps running indefinitely.

So this is my personal feeling but I'd still love the kernel 5.6 condition to go away, after all I installed the thing and enabled the service because I want it to run and do something. ;-) I'm sure there will be people who don't need it but they simply won't install or activate it either?

I can patch the service file locally or just install my own, but I can't patch haveged itself, so @jirka-h, thanks a lot for doing that, especially over the holidays. I really appreciate it.

eworm-de commented 2 years ago

@frostschutz, do you have initramfs with or without systemd?

frostschutz commented 2 years ago

I don't use systemd in initramfs yet. Traditional busybox-based initcpio for me. Otherwise, the hook I posted above also would not work.

eworm-de commented 2 years ago

Ah, got it. Missed it was your post. 🙈

eworm-de commented 2 years ago

> Could you please give it a try? If it works fine, I will release a new version.

Would be nice to have haveged-once.service included in the release...

eworm-de commented 2 years ago

Ok, this is my log now... Does not look successful, though haveged has been started. Did it feed anything at all?

Dez 31 16:41:38 archlinux kernel: random: get_random_u64 called from __kmem_cache_create+0x2a/0x540 with crng_init=0
Dez 31 16:41:38 archlinux systemd[1]: Initializing machine ID from random generator.
Dez 31 16:41:38 archlinux haveged[148]: haveged starting up
Dez 31 16:41:38 archlinux haveged[148]: haveged: command socket is listening at fd 3
Dez 31 16:41:38 archlinux haveged[160]: haveged: ver: 1.9.16; arch: x86; vend: GenuineIntel; build: (gcc 11.1.0 ITV); collect: 128K
Dez 31 16:41:38 archlinux haveged[160]: haveged: cpu: (L4 VC); data: 32K (L4 V); inst: 32K (L4 V); idx: 23/40; sz: 31288/55167
Dez 31 16:41:38 archlinux haveged[160]: haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 7.99948
Dez 31 16:41:38 archlinux haveged[160]: haveged: fills: 0, generated: 0
Dez 31 16:41:38 archlinux haveged[160]: haveged: Stopping due to signal 15
Dez 31 16:41:38 archlinux systemd[1]: haveged-once.service: Deactivated successfully.
Dez 31 16:41:39 archlinux kernel: random: fast init done
[...]

Used this for haveged-once.service (derived from contrib/Fedora/haveged.service):

[Unit]
Description=Entropy Daemon based on the HAVEGE algorithm
Documentation=man:haveged(8) http://www.issihosts.com/haveged/
DefaultDependencies=no

[Service]
Type=oneshot
ExecStart=@SBIN_DIR@/haveged -w 1024 -v 1 --once
SuccessExitStatus=137 143

SecureBits=noroot-locked
CapabilityBoundingSet=CAP_SYS_ADMIN CAP_SYS_CHROOT
# We can *not* set PrivateTmp=true as it can cause an ordering cycle.
PrivateTmp=false
PrivateDevices=true
# We can *not* set PrivateNetwork=true to allow command mode (chroot when included in initramfs)
#PrivateNetwork=true
ProtectSystem=full
ProtectHome=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
RestrictNamespaces=true
RestrictRealtime=true

LockPersonality=true
MemoryDenyWriteExecute=true
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@mount
SystemCallErrorNumber=EPERM

Possibly we could drop even more settings for initramfs... Not sure.

eworm-de commented 2 years ago

BTW, systemd reports:

systemd[1]: /usr/lib/systemd/system/haveged.service:32: Failed to parse system call, ignoring: newuname

... so I dropped it from my service.

eworm-de commented 2 years ago

@frostschutz, can you please test this package? haveged-1.9.15-2

Should contain everything needed...

jirka-h commented 2 years ago

Thanks a lot for the testing and packaging!

I have made a couple of changes:

  1. Improved logs:
    ROOT$./haveged -w 1024 -v 1 --Foreground --once
    haveged: command socket is listening at fd 3
    haveged starting up
    haveged: ver: 1.9.16; arch: x86; vend: GenuineIntel; build: (gcc 11.2.1 ITV); collect: 128K
    haveged: cpu: (L4 VC); data: 32K (L4 V); inst: 32K (L4 V); idx: 24/40; sz: 32010/53875
    haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 7.99914
    haveged: fills: 0, generated: 0 
    haveged: Entropy refilled once (2048 bytes), exiting.
    tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 7.99914
    fills: 1, generated: 512 K bytes, RNDADDENTROPY: 2 K bytes
  2. I have added contrib/Fedora/haveged-once.service to the Git repo (1f6a41a112dc3a52792f8d981f0812c7bed0d5db). I took your version, but I have added --Foreground.

Could you please test the latest version?

Thanks a lot! Jirka

eworm-de commented 2 years ago

Thanks for the changes!

Away from keyboard till next year. 😳😆 I will test tomorrow.

frostschutz commented 2 years ago

> @frostschutz, can you please test this package? haveged-1.9.15-2

Tested it and the (non-systemd) hook works for me, haven't tested any of the systemd stuff though.

eworm-de commented 2 years ago

> Tested it and the (non-systemd) hook works for me, haven't tested any of the systemd stuff though.

Thanks a lot! (I will test the other part.)

eworm-de commented 2 years ago

This is with current git master (1f6a41a112dc3a52792f8d981f0812c7bed0d5db):

Jan 01 22:30:38 archlinux kernel: random: get_random_u64 called from __kmem_cache_create+0x2a/0x540 with crng_init=0
Jan 01 22:30:38 archlinux systemd[1]: Initializing machine ID from random generator.
Jan 01 22:30:38 archlinux haveged[147]: haveged: command socket is listening at fd 3
Jan 01 22:30:38 archlinux haveged[147]: haveged: ver: 1.9.16; arch: x86; vend: GenuineIntel; build: (gcc 11.1.0 ITV); collect: 128K
Jan 01 22:30:38 archlinux haveged[147]: haveged: cpu: (L4 VC); data: 32K (L4 V); inst: 32K (L4 V); idx: 23/40; sz: 31288/55167
Jan 01 22:30:38 archlinux haveged[147]: haveged: tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 8.00103
Jan 01 22:30:38 archlinux haveged[147]: haveged: fills: 0, generated: 0
Jan 01 22:30:38 archlinux haveged[147]: haveged: Entropy refilled once (2048 bytes), exiting.
Jan 01 22:30:38 archlinux haveged[147]: tot tests(BA8): A:1/1 B:1/1 continuous tests(B):  last entropy estimate 8.00103
Jan 01 22:30:38 archlinux haveged[147]: fills: 1, generated: 512 K bytes, RNDADDENTROPY: 2 K bytes
Jan 01 22:30:38 archlinux haveged[147]: haveged starting up
Jan 01 22:30:38 archlinux systemd[1]: haveged-once.service: Main process exited, code=exited, status=1/FAILURE
Jan 01 22:30:38 archlinux systemd[1]: haveged-once.service: Failed with result 'exit-code'.
Jan 01 22:30:38 archlinux kernel: random: crng init done

It does work, but haveged returns with exit code indicating error.

I think I did not notice before because I dropped --Foreground from my service file. Perhaps --once should not fork at all.

eworm-de commented 2 years ago

Oh, and the man page is missing --once.

jirka-h commented 2 years ago

Thanks for the testing - good catch!

I have fixed the exit status when using --once and updated the man page. See 9e4a1f53cfbb2a7aa1f534861e6f03587e5d6f16

Could you please verify the fix?

eworm-de commented 2 years ago

Looks good now, thanks!

eworm-de commented 2 years ago

I think this could be used in haveged-dracut.module now, which would allow dropping haveged-switch-root.service.

klausenbusk commented 2 years ago

> My desktop is a standard arch linux install with a custom encryption hook since I have more than just the one LUKS device. But in the end it still runs a standard cryptsetup open ... and waits for me to enter passphrase. And that works fine, except crng init simply never happens until there is actually activity from my end, so the crng init after 100 seconds on bare metal is because I wasn't typing anything.

The kernel's jitter entropy is only running if needed, so the crng initializing very late in this case is expected:

/*
 * Wait for the urandom pool to be seeded and thus guaranteed to supply
 * cryptographically secure random numbers. This applies to: the /dev/urandom
 * device, the get_random_bytes function, and the get_random_{u32,u64,int,long}
 * family of functions. Using any of these functions without first calling
 * this function forfeits the guarantee of security.
 *
 * Returns: 0 if the urandom pool has been seeded.
 *          -ERESTARTSYS if the function was interrupted by a signal.
 */
int wait_for_random_bytes(void)

https://github.com/torvalds/linux/blob/859431ac11aef9b4cd7ffa75e94a92a6a41c8623/drivers/char/random.c#L1605-L1614

> My virtual server is a little exotic in that it's encrypted and uses cryptsetup very early to check or change the passphrase. So it actually wants to use the random device very early, hence the warnings about cryptsetup reading random before crng is fully initialized.

That is expected, as cryptsetup on Arch uses /dev/urandom by default, which won't trigger the kernel's jitter entropy. If you switch to /dev/random (--use-random), it will trigger the kernel's jitter entropy and block until the crng is fully initialized (basically the only difference between random and urandom these days: commit, LWN), and I think it is a saner choice for your use case.

> It is not blocking in either case, so maybe this is for cosmetics only, it just doesn't give me a good feeling to see such warning messages or late initializations.

It sounds like mostly a cosmetic thing to me :) The warning caused by cryptsetup should indeed be fixed (e.g. by using /dev/random), or you could trigger the kernel's jitter entropy from an initramfs script (e.g. head -c16 /dev/random > /dev/null).
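
The same "read a few bytes from /dev/random" trick can be written in Python where a shell one-liner is not convenient. This is a small sketch, not something cryptsetup or haveged ships; the function name and the `device` parameter are assumptions for illustration. As described above, on kernels since 5.6 the read is expected to block only until the crng is initialized.

```python
def read_random_bytes(nbytes: int = 16, device: str = "/dev/random") -> bytes:
    """Read exactly nbytes from device, like `head -c16 /dev/random`.

    Hypothetical helper: the returned data is usually discarded, since the
    point is the side effect of waiting for the kernel RNG to initialize.
    """
    collected = bytearray()
    # Unbuffered reads, so no more than nbytes is consumed from the device.
    with open(device, "rb", buffering=0) as dev:
        while len(collected) < nbytes:
            chunk = dev.read(nbytes - len(collected))
            if not chunk:
                raise EOFError(f"{device} ended after {len(collected)} bytes")
            collected += chunk
    return bytes(collected)


if __name__ == "__main__":
    read_random_bytes(16)  # returns once the crng is ready
```

The loop is there because a read from a character device may return fewer bytes than requested.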

> Basically, however good the kernel's random implementation is or will be, I still feel that userspace should mess with it just a little regardless, and for that purpose I'd love to continue using haveged, both in initramfs and as a service that just keeps running indefinitely.

> So this is my personal feeling but I'd still love the kernel 5.6 condition to go away, after all I installed the thing and enabled the service because I want it to run and do something. ;-) I'm sure there will be people who don't need it but they simply won't install or activate it either?

Makes sense. I just don't like users installing haveged unnecessarily because they read some 10-year-old guide, so IMO haveged still isn't needed on Linux >= 5.6 for the average user.

jirka-h commented 2 years ago

> Makes sense. I just don't like users installing haveged unnecessarily because they read some 10-year-old guide, so IMO haveged still isn't needed on Linux >= 5.6 for the average user.

:+1: I completely agree!

jirka-h commented 2 years ago

Thanks for the testing! I have finalized the changes and released v1.9.16

https://github.com/jirka-h/haveged/releases/tag/v1.9.16

frostschutz commented 2 years ago

> The kernel's jitter entropy is only running if needed

> uses /dev/urandom by default, which won't trigger the kernel's jitter entropy.

> basically the only difference between random and urandom these days

That's very unfortunate.

It means throwing nearly 10 years' worth of re-education out the window ("Just use /dev/urandom!").

I really don't want to use /dev/random anymore. Documentation states it blocks and has indeterminate delays, which is unacceptable when early booting (you want it to boot, not hang indefinitely). It also states that /dev/urandom is preferred and sufficient in all use cases (with the exception of early booting which just means the kernel leaves you hanging there, go figure).

If the random jitter works, then it seems to me like the kernel should be using it unconditionally, early, and for both random and urandom reads equally. High time to put the final nail in the coffin for the /dev/random legacy device. It's unfathomable to me why reading /dev/random over /dev/urandom should help at all. This kind of hoop jumping should be unnecessary.

I want a random device that "just works", no blocking, no shouting cryptic warnings at me, leaving me to figure things out on my own, which I'm ill equipped to do anyway, since I'm no cryptographer or kernel developer. I shouldn't even have to know or care about these implementation details. Which might just change again from one kernel version to the next.

But that's something for the kernel devs to figure out.

> I just don't like users installing haveged unnecessarily because they read some 10-year-old guide

There are plenty of guides out there that lead straight to data loss. haveged is a service that uses next to no resources, so the worst case should be... still no harm done. If so, why bother so much?

If other ways to tweak the random device are still valid and necessary and even strongly recommended (like the random seed), then why not haveged too?

haveged has provided where the kernel has left us hanging, for years. Otherwise it would not have been developed or become remotely popular. Changing this mindset will take some time...

jirka-h commented 2 years ago

Hello Andreas,

I fully agree that /dev/random needs to be enhanced. Stephan Müller is trying to achieve this - he has proposed a new modern design, which addresses many problems of the current implementation. However, getting the changes to the mainline kernel is not easy - please check this email thread:

https://lkml.org/lkml/2021/11/21/143

I'm afraid that it will take a long time to improve the situation.

Jirka

jsyrjala commented 2 years ago

So after all the above discussion is haveged still useful currently?

jirka-h commented 2 years ago

Hi Juha,

yes, it's still useful. It can provide entropy early in the boot when /dev/random is not fully utilized.

On a fully booted system, it can still be used as an additional entropy source. It will insert entropy into the kernel every 60 seconds, thus diversifying your entropy sources.

I hope this helps Jirka

Dmole commented 2 years ago

Note that kernel 5.10 still needs haveged:

uname -r && cd /proc/sys/kernel/random/ && cat poolsize entropy_avail
5.10.102
4096
557

and /dev/urandom / GRND_INSECURE are about to be disabled so haveged is more relevant than ever.
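
The check above reads two files under /proc/sys/kernel/random. A small Python sketch of the same reading follows; the function name and the `base` parameter are assumptions for illustration, and what counts as a "low" value depends on the kernel version, so the sketch only reports the numbers.

```python
from pathlib import Path


def read_entropy_status(base: str = "/proc/sys/kernel/random") -> dict:
    """Return poolsize, entropy_avail and their ratio from the given directory.

    Hypothetical helper: `base` is a parameter only so the function can be
    pointed at a directory of test fixtures instead of /proc.
    """
    directory = Path(base)
    poolsize = int((directory / "poolsize").read_text().strip())
    entropy_avail = int((directory / "entropy_avail").read_text().strip())
    return {
        "poolsize": poolsize,
        "entropy_avail": entropy_avail,
        "fill_ratio": entropy_avail / poolsize if poolsize else 0.0,
    }


if __name__ == "__main__":
    print(read_entropy_status())
```

With the values quoted above (poolsize 4096, entropy_avail 557), the fill ratio comes out at roughly 0.14.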

jirka-h commented 2 years ago

Thanks for that link! I think it indeed makes haveged more relevant when you need randomness very early in the boot and kernel's RNG takes a long time to initialize (on most platforms, kernel RNG initializes pretty fast, though).

I'm running kernel 5.16 and I can confirm that on a fully booted system, /dev/random and /dev/urandom behave the same way (in fact, both devices are the same on a fully booted system with this kernel). Both devices are nonblocking and provide random data at a rate of around 200MiB/s on my laptop [1]. I'm getting similar results with kernel 5.14. This is with haveged disabled.

What haveged does in this case is provide an additional entropy source. It does not affect the speed, but it makes the kernel RNG more trustworthy.

[1]

pv /dev/random > /dev/null 
[ 242MiB/s]

jsyrjala commented 2 years ago

Currently the first thing that README.md mentions is that haveged is obsolete with recent kernels. Maybe that could be improved, since haveged still has its uses, even with recent kernels.

https://github.com/jirka-h/haveged/blob/master/README.md

jirka-h commented 2 years ago

Good point, Juha!

I have updated https://github.com/jirka-h/haveged/blob/master/README.md; see this commit: https://github.com/jirka-h/haveged/commit/bfff89f0a8568fe1ce974261c0e706be141e175d