qdrvm / kagome

Kagome - C++20 implementation of Polkadot Host
https://kagome.readthedocs.io
Apache License 2.0

[Bug]: false-negatives on landlock support: linux kernels 5.13-6.8 #2262

Open Lederstrumpf opened 2 weeks ago

Lederstrumpf commented 2 weeks ago

Bug Summary

Starting in validator mode fails on systems with 5.13 <= Linux kernel < 6.7 because Landlock support is reported as missing, even if Landlock is correctly enabled.

Bug Description

On startup of a validator, Kagome verifies that Secure Validator Mode can be enabled. One of the requirements is Landlock support in the Linux kernel: https://landlock.io/. Landlock was introduced in Linux kernel 5.13. Although 5 versions of the Landlock ABI now exist, the Parity Client only requires the original release (V1). At least while its reference kernel version remains < 5.19 (which introduced V2 of the Landlock ABI), it does not use any features beyond V1, the primary rationale being indeterminism: https://github.com/paritytech/polkadot-sdk/blob/f5e7eaf610b50c6a6e3f65649908100ce8bea5b0/polkadot/node/core/pvf/common/src/worker/security/landlock.rs#L37-L73.

Starting up a kagome validator node on a system with 5.13 <= linux kernel version < 6.7 throws the following error:

24.11.05 01:28:07.085513  kagome           Warning   CheckSecureMode  Secure mode incomplete, cannot enable landlock for PVF worker: landlock_create_ruleset failed: Argument list too long
24.11.05 01:28:07.087291  kagome           Error     Application  Secure mode is not supported completely. You can disable it using --insecure-validator-i-know-what-i-do.

If the kernel is upgraded to >= 6.8 (Landlock V4), the node can boot in Secure Validator Mode just fine.

Kagome obviously does not have to adhere to every design decision made in the Parity Client. Still, the indeterminism argument from the Parity Client's rationale also applies to Kagome nodes, both internally (within the set of Kagome nodes) and globally (Kagome nodes within the set of all nodes). Without further investigation or feedback, it is unclear to me whether Kagome (A) intends to support all Landlock versions, as the Parity Client does, or (B) intentionally feature-gates at Landlock V4 (kernel >= 6.8).

If A, then Kagome's implementation of Landlock detection is faulty for kernels < 6.8.

If B, then the documentation should be updated to reflect this. Otherwise, node operators might needlessly invest time in debugging their (working) Landlock-enabled kernel when all they need is a more recent kernel. Also, if B is the case, the Kagome implementation should actually make use of Landlock functionality beyond V1; otherwise the gating is pointless.

Steps to Reproduce

Start a validator node (--validator) on a system with 5.13 <= linux kernel version < 6.7 (uname -r) and landlock enabled (check for instance with dmesg | grep landlock || journalctl -kb -g landlock). Watch it crash and burn on the Secure Validator Mode check (unless yolo-ing into --insecure-validator-i-know-what-i-do).
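Beyond the dmesg check, the Landlock ABI version that the running kernel exposes can be probed directly. Below is a minimal sketch, not taken from Kagome: the helper name landlock_abi_version is hypothetical, and the fallback syscall number 444 is an assumption that holds for x86_64 and the other architectures using the unified syscall table. It calls landlock_create_ruleset with a null attribute pointer, size 0 and the LANDLOCK_CREATE_RULESET_VERSION flag, which is the documented way to query the ABI version.

```cpp
#include <cstddef>
#include <optional>
#include <sys/syscall.h>
#include <unistd.h>

// Fallbacks so the sketch also builds against older kernel headers.
// 444 is an assumption: it is the number on x86_64 and on architectures
// that use the unified syscall table.
#ifndef __NR_landlock_create_ruleset
#define __NR_landlock_create_ruleset 444
#endif
#ifndef LANDLOCK_CREATE_RULESET_VERSION
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
#endif

// Hypothetical helper: returns the Landlock ABI version reported by the
// running kernel, or std::nullopt if Landlock is unavailable (for example
// ENOSYS on kernels < 5.13, EOPNOTSUPP if Landlock is disabled at boot).
inline std::optional<long> landlock_abi_version() {
  long abi = ::syscall(__NR_landlock_create_ruleset,
                       static_cast<const void *>(nullptr),
                       static_cast<std::size_t>(0),
                       LANDLOCK_CREATE_RULESET_VERSION);
  if (abi < 0) {
    return std::nullopt;
  }
  return abi;
}
```

On a host affected by this issue, the helper would be expected to return a value between 1 and 3, which shows that Landlock itself is working even though the Secure Validator Mode check fails.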

Effects of the Bug

The validator cannot be started in Secure Validator Mode on a system whose kernel only supports a Landlock ABI version below V4.

Expected Behavior

Validator can be started in Secure Validator Mode on a system supporting at least Landlock V1 ABI (kernel >= 5.13, with Landlock enabled in kernel).

System Information

Lederstrumpf commented 2 weeks ago

https://github.com/qdrvm/kagome/blob/master/core/parachain/pvf/kagome_pvf_worker.cpp#L176-L186 adds some Landlock flags conditionally: they are included in the landlock_create_ruleset call only if they are available.

Warning CheckSecureMode Secure mode incomplete, cannot enable landlock for PVF worker: landlock_create_ruleset failed: Argument list too long

Running an strace on the landlock_create_ruleset call:

[pid 4104530] landlock_create_ruleset({handled_access_fs=LANDLOCK_ACCESS_FS_EXECUTE|LANDLOCK_ACCESS_FS_WRITE_FILE|LANDLOCK_ACCESS_FS_READ_FILE|LANDLOCK_ACCESS_FS_READ_DIR|LANDLOCK_ACCESS_FS_REMOVE_DIR|LANDLOCK_ACCESS_FS_REMOVE_FILE|LANDLOCK_ACCESS_FS_MAKE_CHAR|LANDLOCK_ACCESS_FS_MAKE_DIR|LANDLOCK_ACCESS_FS_MAKE_REG|LANDLOCK_ACCESS_FS_MAKE_SOCK|LANDLOCK_ACCESS_FS_MAKE_FIFO|LANDLOCK_ACCESS_FS_MAKE_BLOCK|LANDLOCK_ACCESS_FS_MAKE_SYM|LANDLOCK_ACCESS_FS_REFER|LANDLOCK_ACCESS_FS_TRUNCATE, handled_access_net=LANDLOCK_ACCESS_NET_BIND_TCP|LANDLOCK_ACCESS_NET_CONNECT_TCP}, 16, 0) = -1 E2BIG (Argument list too long)

So the issue is that the V2-V4 feature flags from https://github.com/qdrvm/kagome/blob/master/core/parachain/pvf/kagome_pvf_worker.cpp#L176-L186 are included even if the running kernel does not support them. In the strace above, the 16-byte struct carries a non-zero handled_access_net field, which kernels without Landlock network support do not know about, hence E2BIG. This is actually quite natural, since the inclusion of these flags is determined at compile time, and the compiler has no notion of the feature support of the target host's kernel.
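One possible way around this, sketched here under assumptions and not as Kagome's actual fix, is to choose the flags at runtime from the ABI version the kernel reports (obtainable via landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION)). The names RulesetAttr and ruleset_for_abi are hypothetical. The bit values mirror those in <linux/landlock.h> and are redefined only so that the sketch builds against older headers.

```cpp
#include <cassert>
#include <cstdint>

// Bit values as defined in <linux/landlock.h>.
constexpr std::uint64_t kFsV1Mask = (1ULL << 13) - 1;   // EXECUTE .. MAKE_SYM (ABI V1)
constexpr std::uint64_t kFsRefer = 1ULL << 13;          // ABI V2
constexpr std::uint64_t kFsTruncate = 1ULL << 14;       // ABI V3
constexpr std::uint64_t kNetBindTcp = 1ULL << 0;        // ABI V4
constexpr std::uint64_t kNetConnectTcp = 1ULL << 1;     // ABI V4

// Hypothetical stand-in with the same layout as the first two fields of
// struct landlock_ruleset_attr.
struct RulesetAttr {
  std::uint64_t handled_access_fs;
  std::uint64_t handled_access_net;
};

// Hypothetical helper: builds the attribute for a given ABI version so that
// only rights the running kernel knows about are requested.
inline RulesetAttr ruleset_for_abi(long abi) {
  assert(abi >= 1 && "Landlock must be available before building a ruleset");
  RulesetAttr attr{kFsV1Mask, 0};
  if (abi >= 2) {
    attr.handled_access_fs |= kFsRefer;
  }
  if (abi >= 3) {
    attr.handled_access_fs |= kFsTruncate;
  }
  if (abi >= 4) {
    attr.handled_access_net = kNetBindTcp | kNetConnectTcp;
  }
  return attr;
}
```

With handled_access_net left at zero on ABI < 4, the kernel accepts the larger struct because the trailing bytes it does not know about are all zero. Whether Kagome should degrade like this, or instead pin itself to V1 as the Parity Client does, is the design question raised above.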