immunant / c2rust

Migrate C code to Rust
https://c2rust.com/
Other
3.91k stars 229 forks source link

(`c2rust-analyze`) Fix dataflow constraints in ptr casts #913

Open kkysen opened 1 year ago

kkysen commented 1 year ago

Dataflow constraints aren't yet added for ptr casts:

https://github.com/immunant/c2rust/blob/11bf3514929133cd5ee44fe4eb5956bb0e4c07e8/c2rust-analyze/src/dataflow/type_check.rs#L115-L124

Only one part of them is currently added unsizing casts, added originally in #883, but not nearly complete.

As explained by @spernsteiner in https://github.com/immunant/c2rust/pull/883#discussion_r1177145522:

The visit_statement case for handling pl = rv works as follows:

let pl_lty = acx.type_of(pl);
let rv_lty = acx.type_of_rvalue(rv);
do_assign(pl_lty, rv_lty);`

where the do_assign call establishes dataflow constraints to ensure that it's valid to assign a value of type rv_lty into a location of type pl_lty. As usual, this means rv_lty must be a subtype of pl_lty. Our subtyping rules are as follows:

  • First, the unannotated types (pl_lty.ty and rv_lty.ty) must be identical.
  • If the outermost type is a pointer or reference, then there are outermost PointerIds pl_lty.label and rv_lty.label. We require that perms(pl_lty.label) ⊆ perms(rv_lty.label). That is, you can discard permissions on the outermost pointer type during an assignment, but can't add them.
  • For all other PointerIds in pl_lty, there must be a corresponding PointerId at the same location within rv_lty (and vice versa), and the permissions on those corresponding PointerIds must be identical.

For simple cases, like when rv is a local variable, this logic for StatementKind::Assign works fine: rv_lty is exactly the type of the local, and we add some constraints between that and pl_lty (which is usually another local). But in cases where there's an entry in rvalue_tys, there is no direct relationship between rv_lty and rv. It's the job of visit_rvalue to establish such a connection in these cases.

For example, suppose we have an AggregateKind::Array assignment like (ys: [V; n]) = [(x1: T1), (x2: T2)]: [U; n]. Here [U; n] is rv_lty, taken from rvalue_tys. visit_statement will establish the subtyping constraint [U; n] <: [V; n]. And visit_rvalue will make "pseudo-assignments", roughly as if the programmer had written tmp[0] = x1; tmp[1] = x2; ys = tmp; rather than ys = [x1, x2];, which establish the constraints T1 <: U and T2 <: U.

In the CastKind::Unsize case for (y: &[V]) = (x: &[T; n]) as &[U] we are doing a sort of "modified pseudo-assignment", as in let tmp: &[U] = x: &[T; n]; y: &[V] = tmp: &[U];. The pseudo-assignment tmp = x would normally fail, because our subtyping rules don't allow for &[T; n] <: &[U] (the first condition is violated because the unlabeled types are not equal), but as a special case, here we allow the pointee type to change from [_; n] to [_] because that's the purpose of an Unsize cast.

We should have similar pseudo-assignments in all the visit_rvalue cases where we expect an rvalue_tys entry to be present. I see the CastKind::Pointer case doesn't have one, which I think is a bug - on y = (x: *mut u8) as *const u8, we would fail to propagate permissions from y to x due to a missing edge between rv_lty = *const u8 and op_lty = *mut u8. I think the Unsize case should also be establishing constraints between any PointerIds within the pointee types; of the three subtyping rules given above, this case is enforcing the first (in modified form) and the second, but not the third.