analyze: null pointer support

spernsteiner commented 1 year ago

We should add support to c2rust-analyze for tracking (possibly) null pointers.

First, we should add a new pointer permission: PermissionSet::NON_NULL. A pointer with this permission is known to never be null at run time. NON_NULL is set (or not) when the pointer is created, and it flows forward along dataflow edges. The results of Rvalue::Ref and Rvalue::AddressOf should have the NON_NULL permission set; ptr::null(), ptr::null_mut(), and casting the constant 0 to a pointer type do not set the permission; and casting other integers to pointer types is (still) unsupported and therefore panics.

The permission is NON_NULL rather than NULL because we allow dropping permissions in any assignment. This means removing a permission from a pointer's PermissionSet must allow the pointer to take on more values, not restrict it to fewer values.

The reason to continue rejecting nonzero constant integer-to-pointer casts like 3 as *const u8 is to avoid losing information and potentially changing program behavior. The result of 3 as *const u8 can't be a value of type &u8 because it's not a valid pointer, and it can't be None because that would lose the fact that the value is 3 and not 0. We don't want 0 as *const u8 == 3 as *const u8 to be rewritten into None == None and wrongly evaluate to true.

Second, we'll need to update the type_desc and rewrite::ty modules to recognize the lack of NON_NULL and use Option<_> when needed.

Third, rewrite::expr will need to insert Some(_), None, and _.unwrap() as needed. Casts from NON_NULL to nullable pointers will need to wrap the pointer expression in Some(_). Operations that produce null pointers should be replaced with None. And dereferences of nullable pointers should call _.unwrap() first.

Some operations may need special handling. I think offset should require a non-null pointer (meaning rewrite::expr should unwrap() the input if it's nullable, like it does for deref), since it's UB to call it on invalid pointers. This means offset can unconditionally return NON_NULL regardless of its input. And is_null can be converted to is_some() if the pointer is nullable or replaced with literal true if it's NON_NULL. (Converting if p.is_null() { ... } to if let Some(p) = p { ... } would be nice to have, but isn't essential and would probably be a bit tricky.

It might be safe for us to define malloc and related functions to return NON_NULL, which would let us eliminate a lot of Options, because the Rust translations of those functions (Box::new/Vec::new) abort on OOM instead of returning null.

kkysen commented 1 year ago

It might be safe for us to define malloc and related functions to return NON_NULL, which would let us eliminate a lot of Options, because the Rust translations of those functions (Box::new/Vec::new) abort on OOM instead of returning null.

I'm not sure about this. C malloc can pretty clearly return NULL, and there's a lot of code that's going to check for this.

I think there's two cases here:

If the C code checks for NULL and never had UB, then we should use Box::try_new (unstable but we use other unstable things).
If the C code doesn't check for NULL and just uses the pointer, then it has UB, so we can reinterpret that as panic on OOM. In that case, we're free to use Box::new and panic in the UB/OOM case, which is allowed due to the UB.

kkysen commented 1 year ago

Also, similar to offset, I think we can also use UB in other cases. Whenever a pointer is dereferenced or a function/method called on/with it that requires it to be valid (e.x. offset), then it must be NON_NULL. Then "casts" from non-NON_NULL to NON_NULL will be .unwrap()s.

spernsteiner commented 1 year ago

It would be legal under our definition of correctness (in which we either preserve behavior exactly or introduce a panic) to assert after every malloc call that the result is non-null. This would be more in line with idiomatic Rust code, and I think intelligently handling OOM is rare even in C. So I think inserting panics here is justifiable and would help our analysis produce better results (fewer unnecessary Options).

If the C code doesn't check for NULL and just uses the pointer

Whenever a pointer is dereferenced or a function/method called on/with it that requires it to be valid (e.x. offset), then it must be NON_NULL.

I think these are covered by the discussion on the NON_NULL PR (#818)

immunant / c2rust

analyze: null pointer support #816