analyze: proposal for simplified analysis and rewriting of allocations

spernsteiner commented 3 weeks ago

Correct rewriting of malloc/free to Box<T> requires tracking ownership throughout the program. This proposal describes an approach that produces correct rewrites with minimal new analysis work, at the cost of introducing run-time panics when the C code does not follow Rust ownership discipline.

The main idea is to rewrite owning pointers not to Box<T> but to Option<Box<T>>. The Option is Some if ownership still resides at this location, and it is None if ownership has been transferred to some other location. Accessing a location after ownership has been transferred away will cause a run-time panic due to unwrapping None.

Definitions

A "pointer location" is a local variable or memory location of pointer type. MIR temporaries representing the results of intermediate expressions are included in this definition.

A "heap object" is a region of memory allocated by malloc or a similar function. A new object is "live" until it is freed by passing a pointer to the object to free or a similar function.

"Ownership" is a relation between objects and pointer locations. At all points during execution, each live object is owned by one or more pointer locations. (Single ownership of an object corresponds to Box<T>; multiple ownership corresponds to Rc<T>.) If a pointer location owns an object, then the value in that location is a pointer to the owned object. In some situations, copying a value from a source pointer location to a destination may also transfer or copy ownership from the source to the destination.

Rewrites

This section describes the new rewrites we would perform under this proposal. The next section describes analysis changes that enable these rewrites.

Type rewrites: Many pointer locations function as "borrowed references" and never hold ownership of an object in any execution of the program. These are handled by existing rewrites. For locations that may hold ownership, we add a new rewrite that replaces *const T/*mut T with Option<Box<T>>.

This approach to type rewrites means that for any pointer location that might hold ownership at some points in some executions, we assume that all assignments to that pointer location also grant ownership. We call such a pointer location an "owning pointer location". Note that an owning pointer location is not guaranteed to hold ownership throughout its lifetime; ownership may be transferred away during an assignment from the pointer location in question. This is why we must rewrite to Option<Box<T>> rather than Box<T>.

Expression rewrites: In every (static) assignment operation q = p on pointers, the LHS and RHS can each be either an owning pointer location or a non-owning pointer location.

Owning to owning: Ownership is transferred from p to q. The assignment is rewritten to q = Some(p.take().unwrap()), which panics if p does not currently hold ownership (that is, if p is None) and otherwise removes ownership from p and grants it to q.

An alternative rewrite for this case would be q = p.take(). This similarly transfers ownership if p currently holds it, but doesn't panic if p does not hold ownership. Instead, the lack of ownership is propagated to q. This seems likely to delay panics in a way that will make issues harder to debug.
Owning to non-owning: A borrowed reference is derived from p and assigned to q. The assignment is rewritten to q = p.as_deref().unwrap() or similar. This panics if p does not hold ownership, and otherwise produces a value of type &T pointing to the object within the Option<Box<T>>.
Non-owning to owning: This is an error for the rewriter. There is no reasonable safe way to convert &T/&mut T to Box<T>. The analysis described below is designed such that this case should never occur.
Non-owning to non-owning: This is handled by the existing rewrite rules.

Similar rewrites apply to pseudo-assignments. For example, f(p) may be rewritten to f(Some(p.take().unwrap())) if the argument of f is an owning pointer location.

We must also rewrite several libc functions:

malloc(size): malloc returns void*, but we can recover the type T being allocated using the existing pointee type analysis. If size is a multiple n * size_of::<T>(), then we rewrite the malloc call either to Box::new(T::default()) (if n == 1 and the return value lacks the OFFSET permission) or to Box::new([T::default(); n]) as Box<[_]>. In cases where T doesn't implement Default and/or Copy, we can generate alternative rewrites such as Box::new(T { field1: 0, field2: None, ... }) or iter::repeat_with(T::default).take(n).collect::<Box<[_]>>().
free(p): We rewrite this to drop(p.unwrap()), which will deallocate the object pointed to by p and will panic if p does not currently hold ownership of any object.

As with owning-to-owning casts, this could be made not to panic by instead rewriting to drop(p). However, removing this panic might hinder debugging. When free(p) is called and p does not hold ownership, either the pointer that does hold ownership of the object was (or will be) freed elsewhere, or it will go out of scope without an explicit free call. The first represents a double free in the C code. The second represents a bug in our analysis: we rewrote the program in a way that transfers ownership away from p, but the ownership should have been left in place.
realloc(p, size): We first infer the type T and count n as in the malloc case, then rewrite into something like the following:
```
q = Some(p.unwrap().into_iter() .  chain(iter::repeat_with(T::default)) .take(n) .collect::<Box<[_]>>())
```
This moves all values in p to a new object (allocated as with malloc(size)), either truncating or padding with T::default() to length n, and stores the new Box<[T]> into q. The old object is deallocated as with free(p). Similar to free, attempting to use a pointer location that doesn't currently hold ownership will result in a panic.

Analysis

For distinguishing owned and non-owned pointer locations, we can use the existing FREE permission. FREE is set on the arguments of the libc free() function and is propagated backwards. This means all pointer locations from which a pointer might flow to free will be considered owning, and all others will be non-owning.

Because Box<T> cannot be used for pointers to the stack (unlike &/&mut), it may be useful for debugging to add a simple analysis to identify pointer locations that may contain pointers to the stack. This can be implemented as a forward-propagated HEAP permission to identify pointers that must point to the heap (or be null). This can be implemented like NON_NULL: set HEAP initially for all pointer locations, and remove it for pointers that result from &local or a similar expression.

In the future, we may want to implement a linearity analysis to detect cases where a single object is potentially owned by multiple pointer locations (and thus should be rewritten using Rc<T>). For this proposal, we don't use such an analysis and instead introduce panics in cases of multiple ownership.

Interaction with `NON_NULL`

In this proposal, we rewrite all potentially-owning pointer locations to Option<Box<T>>. This means a pointer location that is both owning and nullable would have the type Option<Option<Box<T>>>, which is confusing. For clarity, we may wish to use a different type, such as Result<Box<T>, OwnershipLost> (where OwnershipLost is a zero-sized error type), for owning locations. Then a nullable owning pointer location would have type Option<Result<Box<T>, OwnershipLost>>, where None means the pointer location was assigned null, Some(Ok(p)) means it was assigned a valid pointer and still has ownership of the object, and Some(Err(OwnershipLost)) means it was previously assigned a valid pointer but ownership was since transferred away.

ahomescu commented 2 weeks ago

Non-owning to owning: This is an error for the rewriter. There is no reasonable safe way to convert &T/&mut T to Box<T>

How about Copy or Clone where those are supported? It's not always sound, but it could work in some cases.

ahomescu commented 2 weeks ago

Some(p.take().unwrap()) could be p.take().or_else(|| panic!("Got None")) or something like that (not that it made it shorter).

ahomescu commented 2 weeks ago

we rewrite the malloc call either to Box::new(T::default())

We could also use Box::new_uninit, if we can live with it being nightly and unsafe.

ahomescu commented 2 weeks ago

Then a nullable owning pointer location would have type Option<Result<Box<T>, OwnershipLost>>

I'm generally in favor of using standard types, but this is a bit clunky. We could just use our own 3-state type e.g.

pub enum Owner<T> {
    Valid(T),
    Null,
    LostOwnership,
}

spernsteiner commented 2 weeks ago

How about Copy or Clone [to produce Box from &T] where those are supported?

In some cases this would change behavior (due to the change in aliasing), and I think it would be hard to tell whether a given instance of this rewrite changes behavior or not.

We could also use Box::new_uninit, if we can live with it being nightly and unsafe.

The fact that it's unsafe would be a problem for our safety story. Currently, if all goes well with the analysis and rewriting, then the result has no unsafe code at all. This saves us from needing more complicated analyses, like figuring out whether all fields are actually initialized following Box::new_uninit.

I'm generally in favor of using standard types, but this is a bit clunky. We could just use our own 3-state type

A downside of the 3-state type is that it's not compositional. But I agree it's clunky to write out the type in full. I could see having an alias like type Owned<T> = Result<Box<T>, OwnershipLost>, or maybe making it a custom type (Result doesn't have a method like Option::take).

spernsteiner commented 5 days ago

While implementing this, I ran into an issue involving FREE and OFFSET permissions. Example code:

let p = malloc(...) as *mut i32;
*p.offset(1) = 1;
let q = p;
free(q as *mut c_void);

p is inferred to have the FREE permission (because it flows into free) and the OFFSET permission (because it flows into p.offset(1), so its rewritten type is Box<[i32]>. q is inferred to have FREE, but not OFFSET, so its rewritten type is Box<i32>. The rewriter thus attempts to insert a cast from Box<[i32]> to Box<i32> at the q = p assignment, but that cast isn't currently supported.

Option 1: add support for the cast. For example, q = Box::new(mem::take(&mut p[0])); drop(p);. One downside of this is that it will panic if p.len() == 0 (which is likely uncommon). Another downside is that it will change the allocation behavior of the program in a way that the user may find surprising.

Option 2: unify PointerIds along the path from malloc to free. The type at drop() time generally must be the same type that was produced by Box::new, so all variables along the path from Box::new to drop must have the same type. We could enforce this by giving all of those pointers the same PointerId, so all will have identical permissions and flags. Specifically, we could unify the PointerIds of the LHS and RHS of any assignment where the output has FREE. A downside of this approach is that it would require some refactoring, as currently we handle unification before running any dataflow analysis, but running dataflow is required to propagate FREE.

Option 3: modify dataflow constraints. Assignments are normally allowed to discard any subset of permissions (the LHS permissions can be any subset of the RHS permissions). We could instead forbid assignments from discarding OFFSET while retaining FREE. That is, LHS must be a subset of RHS, but if LHS has FREE and RHS has OFFSET, then LHS must also have OFFSET. This prevents the case that requires the unsupported cast, where the RHS has OFFSET | FREE but the LHS has only FREE. A downside of this approach is that it adds a somewhat subtle special case to the dataflow analysis.

I've implemented option 3 for now, but it might be cleaner to switch to option 2.

immunant / c2rust