immunant / c2rust

Migrate C code to Rust
https://c2rust.com/
Other
4k stars 240 forks source link

(`c2rust-analyze`) Handle inline `const` refs, including string literals #886

Closed kkysen closed 1 year ago

kkysen commented 1 year ago

Fixes #837.

This only handles const refs that are inline and thus local to a function, as determined by being ConstantKind::Val. Thus, this handles string literals like "" (inline_str) and b"" (inline_bstr), which is the main thing we're trying to support.

This PR also adds tests for other similar string usages, though they're not handled and turned off for now. The outline cases, where the const ref/string literal is used through a named constant, aren't handled, as they are globally accessible and that makes them more complex. Since supporting them isn't necessary for string literals, which is the main usage in lighttpd-rust.

kkysen commented 1 year ago

I'm not sure if/how this PR should add exact READ | OFFSET_ADD permissions for string literals (and READ permissions for scalar literal refs).

kkysen commented 1 year ago

@spernsteiner, does this look like it's on the right track?

spernsteiner commented 1 year ago

I'm not sure if/how this PR should add exact READ | OFFSET_ADD permissions for string literals

I think the best approach for this is to assign PointerIds to these literals (as you've implemented here), set up the permissions/flags for those PointerIds in the initial Assignment prior to running dataflow and borrowck, and assert at the end that the permissions/flags weren't modified. This will catch cases where the program tries to mutate a string literal (or other constant) and will stop us from trying to generate a bogus rewrite, like trying to turn "foo" into &mut str.

kkysen commented 1 year ago

assert at the end that the permissions/flags weren't modified

If I do it this way, would it make more sense to keep const_tys separate from rvalue_tys?

aneksteind commented 1 year ago

If I do it this way, would it make more sense to keep const_tys separate from rvalue_tys?

@kkysen why not go about it this way, i.e. why is const_tys used instead?

kkysen commented 1 year ago

I was going to change it back to using rvalue_tys, but if I need to iterate through the constant ones at the end to check the permissions weren't modified, then it's a lot simpler if there's a separate map.

aneksteind commented 1 year ago

I think it would be cleaner to keep the rvalue_tys dict alone and instead have logic that checks permissions and matches against the RValue ty and checks there. If we want to check similar things in the future for different rvalue types we would end up adding more maps to the context to maintain the pattern and I don't think that'd be the best design

kkysen commented 1 year ago

There's not really a way to tell an LTy came from a constant, though, right? I'm not sure how to retrieve that information back.

kkysen commented 1 year ago

@spernsteiner, the post-typeck check is erroring, saying the permissions have expanded from READ | OFFSET_ADD to READ | UNIQUE | OFFSET_ADD. Is this correct?

kkysen commented 1 year ago

@spernsteiner, also, do you think any FileCheck tests are needed for this, or is the post-typeck check that the permissions are as expected more than enough?

spernsteiner commented 1 year ago

I agree it would be a little cleaner to use rvalue_tys for controlling type_of results instead of adding a second map to do lookups in. I think the strategy there would be to insert into rvalue_tys and also add an entry to a list a PointerId permission adjustments/assertions.

the post-typeck check is erroring, saying the permissions have expanded from READ | OFFSET_ADD to READ | UNIQUE | OFFSET_ADD. Is this correct?

It's harmless, since UNIQUE is only relevant when WRITE is also present, but I don't think it should be picking up the UNIQUE permission here. The inference for UNIQUE works by setting UNIQUE on all PointerIds and then removing it in places where it causes a borrowck error - possibly the code that sets UNIQUE is running after the new code for setting the fixed permissions on these constants?

do you think any FileCheck tests are needed for this, or is the post-typeck check that the permissions are as expected more than enough?

The new test that's currently on this branch and the assertions seem good enough to me.

kkysen commented 1 year ago

I agree it would be a little cleaner to use rvalue_tys for controlling type_of results instead of adding a second map to do lookups in. I think the strategy there would be to insert into rvalue_tys and also add an entry to a list a PointerId permission adjustments/assertions.

I moved the LTys to rvalue_tys now (in 8665bc2d8cb5c78fc840da1b3bdeabf96f68af9f), and changed the const_ref_tys map to the const_ref_locs Vec<Location>, which just indexes into the rvalue_tys map.

@spernsteiner, @aneksteind, does this look good now?

the post-typeck check is erroring, saying the permissions have expanded from READ | OFFSET_ADD to READ | UNIQUE | OFFSET_ADD. Is this correct?

It's harmless, since UNIQUE is only relevant when WRITE is also present, but I don't think it should be picking up the UNIQUE permission here. The inference for UNIQUE works by setting UNIQUE on all PointerIds and then removing it in places where it causes a borrowck error - possibly the code that sets UNIQUE is running after the new code for setting the fixed permissions on these constants?

I now unset UNIQUE before doing the check (in 91bfa851d9ad5f816ab87b37257fcf3b823519ea).

kkysen commented 1 year ago

I'm not sure why PhantomLifetime is necessary. Without knowing for sure it seems like an unnecessary complication.

It was needed due to how lifetimes and impl Trait work where I can't do multiple impl Trait + 'a + 'bs and have it work, but I can do impl Trait + PhantomLifetime<'a> + 'b and that works. It's similar to PhantomData and type parameters, but with lifetimes.

kkysen commented 1 year ago

I also think the vector of locations for const refs is similarly unnecessary in that the permissions can be checked in another visit of the body after the permissions have been propagated (maybe in the borrowck phase or somewhere in between) in combination with adding constraints during the dataflow visitation.

Do you mean re-iterating through BasicBlocks, Statements, Rvalues, etc. to find the Constants again? Why is that better than doing it once and storing the result?

aneksteind commented 1 year ago

Do you mean re-iterating through BasicBlocks, Statements, Rvalues, etc. to find the Constants again? Why is that better than doing it once and storing the result?

I believe re-iterating whether through another visitor or a for-loop is better than adding an extra attribute to the context because it doesn't have an invariant of location being present in the index operations and it wouldn't be keeping extra state. I think the check could happen in borrowck and in that case wouldn't be going through another pass, but that doesn't feel ideal because the pass isn't meant to perform propagation checks per se

kkysen commented 1 year ago

an invariant of location being present in the index operations

What do you mean by this? That const_ref_locs's Locations are present in rvalue_tys? That seems like a pretty simple invariant. Or I might be misunderstanding.