google / zerocopy

https://discord.gg/MAvWH2R6zk
Apache License 2.0
1.52k stars 98 forks source link

Restrict support for `#[derive(IntoBytes)]` on unions, work to guarantee forwards-compatible soundness #1792

Open joshlf opened 1 week ago

joshlf commented 1 week ago

Do you need IntoBytes support on unions? Let us know at #1802.

Progress

Details

Support for #[derive(IntoBytes)] on unions was added in https://github.com/google/zerocopy/commit/9c19cbe1cf5c2621ae9a63179664db3470146c10 (reviewed at fxrev.dev/639087). This support is sound on the assumption that if every field of a union is IntoBytes and there is no extra padding before or after any field, then no bit-valid instance of that union can have uninitialized bytes.

However, this assumes too much. It is currently up in the air whether this actually holds. It's not clear from reading the history why we were okay adding this implementation in the first place, but it now seems like it may have been premature.

It may eventually become the case that this assumption is guaranteed by Rust, so this may eventually become a non-issue. However, for the time being, it's a violation of our soundness policy, which promises to be sound on all future Rust compilers.

Short-term mitigation

Unfortunately, existing users rely on #[derive(IntoBytes)] support on unions, so removing this support will break users. This is likely not a problem on today's Rust, so forcing users to migrate to something else might be too drastic of a solution. Instead, I propose that we discourage further use by gating #[derive(IntoBytes)] on unions behind a --cfg. The reason for a --cfg instead of a Cargo feature is that Cargo features are footguns in cases like this - it's easy for crate A to enable the feature and then for crate B, which depends on A, to accidentally rely on that feature despite not enabling that feature itself. For something with soundness and stability implications, that's risky. By contrast, a --cfg requires the top-level crate to enable it for all downstream crates.

Long-term mitigation

There are two options for long-term mitigations.

Union validity

We can try to get Rust to promise what we need in order for #[derive(IntoBytes)] on unions to be guaranteed sound on all future compilers. In particular, this entails restricting union bit validity so that, if the byte at a given offset is initialized in every valid value of every field, then it must be initialized in the union as well.

Union safety

Alternatively, we can take a weaker approach that only requires a safety rather than bit validity constraint: https://github.com/google/zerocopy/issues/1792#issuecomment-2389405203

joshlf commented 1 week ago

Summarizing a discussion with @jswrenn

See also: #260

The situation may be more permissive than I originally thought. In particular, while it might be the case at some point in the future that any union with all uninitialized bytes is considered bit-valid (ie, it is sound to produce such a union value), it is currently the case that creating such a value requires unsafe code. Safe code is only permitted to instantiate or modify a union value using its actual fields.

It is already the case that zerocopy's traits imply both bit validity and safety requirements. For example, FromBytes attests that a type has a particular bit validity, but it also permits users to construct arbitrary instances of a type, which implies that that type must not have certain safety invariants which are inconsistent with constructing a value from arbitrary bytes. We can say that IntoBytes is similar - it adds a new safety invariant to types that they must only be constructed from their fields.

In order for this to continue to be sound, we need the following forwards-compatibility guarantees from Rust: