Re-evaluate ABI compatibility rules in light of CFI

RalfJung commented 7 months ago

The goal of CFI (control-flow integrity) is to make it harder to exploit bugs. It does that, in my rough understanding, by adding extra checks to each indirect jump checking that the destination we are jumping to can plausibly have happened in a "proper" execution of the program. I am focusing on calls of function pointers here, though CFI also needs to consider other indirect jumps.

In C, basically all mismatches between caller and callee signatures on such function calls are UB. Therefore, CFI can do whatever it wants with such calls without causing any issues for correct C programs.

In Rust, we have recently defined the behavior of some cases where caller and callee have a different type. This means if CFI rejects such calls that we consider ABI-compatible, then entirely well-defined Rust programs might be affected. That seems undesirable.

I had a long chat with @rcvalle about this earlier this week, and from my understanding, the main ABI compatibility rule that CFI people really dislike is us declaring all pointers mutually compatible (if their metadata matches), and similar for declaring all function pointers mutually compatible.

(There was also some discussion about the case of different ways of writing "the same" integer type, such as u64 vs usize on a 64-bit-system, or char vs u32. But allowing such mismatches seems to have negligible impact on CFI's ability to detect exploits, so this is much less relevant than the pointers. This becomes more relevant when considering C bindings, where "fully integer-type aware CFI" requires distinguishing the plethora of integer types that C has, even though Rust makes no different between them. Currently I think there is no plan to try to expect the wider ecosystem to follow such a discipline, but this may come up in the future.)

So... should we restrict what we consider ABI-compatible? Of course it would be ridiculous to introduce extra UB for this purpose, but we could declare certain forms of ABI mismatches as "erroneous behavior" (term stolen from recent C++ discussions): these mismatches are unambiguously errors, but the error may or may not be detected. If they do not get detected, the program is still well-behaved. If they get detected, that might be a panic or abort (details TBD). This is basically how Rust treats integer overflow.

I don't know what such restricted ABI compatibility rules should look like. Having ABI mismatches to begin with should already be extremely rare, so it seems hard to evaluate what a good trade-off might be here. @rcvalle, if you have examples of such violations you found that the current ABI rules would allow, that would be quite useful. :) This issue is tracking CFI violations in the standard library, but two of them seem to be violations due to the compiler somehow not tracking the required metadata (i.e., unrelated to what we do or do not consider ABI compatible), and the other two I was not able to find in the current standard library.

The only other idea I have for exploring this is to implement some form of restricted check in Miri, and see if anyone complains. I am not sure what good candidate rules for such a check would be though.

Some concrete questions that are part of this:

Should we really make all pointer types mutually ABI compatible? See https://github.com/rust-lang/rust/issues/128728
Should we make repr($int) enums ABI-compatible with $int? See https://github.com/rust-lang/rust/pull/128600

Lokathor commented 7 months ago

Pointer mismatches seem the easiest thing to mess up. Particularly *mut u8 and *mut i8, because those are the two c_char types, and some libs just treat c_char as always being u8.

Does a CFI check "failing" cause UB in the program? or does it just make the program panic/abort/die/whatever?

saethlin commented 7 months ago

Does a CFI check "failing" cause UB in the program? or does it just make the program panic/abort/die/whatever?

these mismatches are unambiguously errors, but the error may or may not be detected. If they do not get detected, the program is still well-behaved. This is basically how Rust treats integer overflow.

Lokathor commented 7 months ago

That's a proposal for how things could work in the future. I'm asking what existing CFI systems do if the Rust project makes no changes.

EDIT: Also, that proposal doesn't quite say what non-UB happens when an error is detected. panic? C code doesn't have a panic system, after all.

chorman0773 commented 7 months ago

At the very least, two pointers should be compatible if the pointees are compatible, or if one (or both) are integers, the other pointee would be compatible with the signed-variant.

Also, Pointers to Struct/Pointers to Union are, by my reading of the C Standard, compatible. Our rules should take that into account with #[repr(C)] structs and unions. That being said, I think I buy that CFI is a good reason to undo the rules as adopted - especially since they've been stabilized, so we should strongly weigh against reversing them.

Diggsey commented 7 months ago

Generally, when a function is declared with one signature and called via a function pointer with a different signature, the two signatures must be ABI-compatible or else calling the function via that function pointer is Undefined Behavior.

This statement still seems to hold whether or not CFI is in use, so not sure anything needs to be backed out... ABI compatibility is still sufficient to avoid UB, but with CFI there are extra conditions in order to avoid an error, beyond ABI compatibility.

It seems awkward to talk about UB if we amend the ABI compatibility rules, since then we'd have to say something like "it's UB to call a function that was declared in an ABI-incompatible-except-for-the-CFI-rules way".

comex commented 7 months ago

Also, Pointers to Struct/Pointers to Union are, by my reading of the C Standard, compatible. Our rules should take that into account with #[repr(C)] structs and unions. That being said, I think I buy that CFI is a good reason to undo the rules as adopted - especially since they've been stabilized, so we should strongly weigh against reversing them.

Huh, that's odd wording, isn't it? The C spec says that "All pointers to structure types shall have the same representation and alignment requirements as each other.", with a footnote (albeit from an earlier clause) that says "The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions." But this is explicitly not saying that the types are "compatible", which is the spec's normal standard for when type punning is allowed. You know the standard better than I do, so maybe you understand what's going on here?

In any case, in practice, CFI systems do differentiate function types based on which specific structs their arguments refer to. Which makes sense: in C code in practice, punning between void (*)(struct foo *) and void (*)(struct bar *) is quite uncommon*, and differentiating them provides greatly increased granularity, which CFI craves.

Regardless, I'd say that CFI systems don't consider themselves strictly bound by the C standard any more than CHERI does. The C standard is originally decades old, so it's no surprise that they didn't envision that tightening certain requirements could provide a valuable security improvement. Of course the standard has been updated since then, but with a conservative attitude towards backwards compatibility that generally doesn't allow such tightening.

Which makes me wonder how Rust will deal with similar situations in the future. Rust's formal spec for ABI is brand new, so it can aim to be compatible with things like CFI, CHERI, and pointer authentication out of the gate. But they're still breaking changes in practice, compared to the previous informal understanding and compiler behavior. And this is a one-time opportunity. Maybe 10 years from now there will be a new security technique, or performance technique, or just a new kind of platform, that requires some other kind of breaking change. If and when that happens, Rust could offer a compatibility mode that is fully compatible with the existing spec, and IMO it should do so if at all possible. But there will also be a demand for a stricter mode that accepts the breakage in order to achieve whatever improvement the new thing has to offer. I'd like to see that as not a catastrophe, but an opportunity.

* as opposed to punning between void (*)(struct foo *) and void (*)(void *), which is more common in practice, but definitely UB and also banned by CFI

chorman0773 commented 7 months ago

But this is explicitly not saying that the types are "compatible", which is the spec's normal standard for when type punning is allowed. You know the standard better than I do, so maybe you understand what's going on here?

I read the statement as functionally equivalent to compatibility, particularly given the footnote. It may not be a valid argument for strict aliasing, but the footnote makes it clear that it is correct for function calls. I can't read the section any other way than them being compatible.

as opposed to punning between void (*)(struct foo *) and void (*)(void *), which is more common in practice, but definitely UB and also banned by CFI

Yeah, these won't be compatible (though they can be casted to an from each other freely, you just need to cast back before the call).

void* however is compatible with char*, signed char*, and unsigned char*.

Diggsey commented 7 months ago

I read the statement as functionally equivalent to compatibility, particularly given the footnote. It may not be a valid argument for strict aliasing, but the footnote makes it clear that it is correct for function calls. I can't read the section any other way than them being compatible.

But that footnote is referenced from an entirely separate point (6.2.5.9) talking about the range of values that can be stored in the type. Surely all it's saying is that you could swap one type with the other and it would be functionally the same (ie. accept the same set of values), not that you can mix and match - using type A in some declarations and type B in others.

Any other reading would imply that alignment and size equivalence are enough to ensure compatibility between those types, which seems to be clearly not intended.

in C code in practice, punning between void ()(struct foo ) and void ()(struct bar ) is quite uncommon*

But this is clearly well-defined in C if foo and bar are compatible, so CFI is already adding extra constraints beyond what is defined by the language.

CAD97 commented 7 months ago

One potentially interesting variant (but probably not one CFI folks would be particularly enthusiastic about) would be to do CFI checks for safe function pointers but not for unsafe function pointers and/or with reference arguments' pointees but not pointer arguments'. But on the other hand, cross-language extern function linking is where CFI checks are most beneficial, specifically because those are where type mismatches are most likely to sneak in.

I think the most practical choice would be to define a subcategory of ABI-compatibility for CFI-compatibility, say pointers are CFI-compatible if their pointees are ABI-compatible, and limit behavior to some subset of safe misbehavior when a CFI mismatch occurs, depending on compilation flags. (Once again the between talking about compatibility of the argument types or the function call itself makes description difficult.)

It's potentially desirable to stick a "conditionally supported" style disclaimer on any transmute-adjacent functionality. As another example, I know rust-gpu has additional restrictions, though I forget the specifics, and any target with typed memory style restrictions isn't particularly happy about Rust's conparatively free license to transmute/reinterpret_cast.

One of the purposes of unsafe is playing fast and loose with memory as "just" bytes, and most software integrity mitigations are about putting pressure in the other direction and restricting the type of shenanigans the program is allowed to get up to.

comex commented 7 months ago

But this is clearly well-defined in C if foo and bar are compatible, so CFI is already adding extra constraints beyond what is defined by the language.

Two structs with different tags are never compatible types (setting aside the question of whether they're functionally equivalent to compatible).

One potentially interesting variant (but probably not one CFI folks would be particularly enthusiastic about) would be to do CFI checks for safe function pointers but not for unsafe function pointers and/or with reference arguments' pointees but not pointer arguments'. But on the other hand, cross-language extern function linking is where CFI checks are most beneficial, specifically because those are where type mismatches are most likely to sneak in.

That's not really the purpose of CFI, though. To be sure, catching cases where the program has legitimately cast a function to the wrong type is useful. But the main goal is to address cases where an attacker has already corrupted memory and overwritten a function pointer with their own (potentially arbitrary) value. With CFI, rather than the attacker being able to pick any code address, they have to pick the entry point of some function that has the correct type (potentially with more limitations, e.g. the function must be one that some code takes a pointer to). This doesn't completely prevent damage, but it does significantly limit the attacker's options and make it harder for them to pivot to useful gadgets.

Memory corruption of course requires some unsafe code somewhere in the process, but that code is often completely unrelated to the code that owns or uses the memory being corrupted. Thus, safe and unsafe calls are equally important for CFI.

RalfJung commented 7 months ago

Does a CFI check "failing" cause UB in the program? or does it just make the program panic/abort/die/whatever?

It's just an abort, already in line with "erroneous behavior".

EDIT: Also, that proposal doesn't quite say what non-UB happens when an error is detected. panic? C code doesn't have a panic system, after all.

Yes that is deliberately left to-be-decided.

At the very least, two pointers should be compatible if the pointees are compatible, or if one (or both) are integers, the other pointee would be compatible with the signed-variant.

Calling out integers specifically here is somewhat odd. It feels like if we go down that route, there should be a more general notion of "types with compatible memory layouts", and that is the constraint placed on pointees. I don't know if that's actually practical for CFI though.

Regardless, I'd say that CFI systems don't consider themselves strictly bound by the C standard any more than CHERI does.

Sure, there's some risk that CFI systems will just reject Rust programs that we declare officially correct, and start filing PRs to make crate authors follow their rules. That's a situation I'd like to avoid; ideally we should be part of that discussion.

rust-lang / unsafe-code-guidelines

Re-evaluate ABI compatibility rules in light of CFI #489