rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.35k stars 12.72k forks source link

Off-by-one spans in MIR borrowck errors #46885

Closed davidtwco closed 6 years ago

davidtwco commented 6 years ago

This is a minor thing, but you can see in the below snippet that just below line 18, the borrowed value only lives until here is a character to the right of where it is in the Ast equivalent.

error[E0597]: `x` does not live long enough (Ast)
  --> $DIR/issue-46471.rs:15:6
   |
15 |     &x
   |      ^ does not live long enough
...
18 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...

error[E0597]: `x` does not live long enough (Mir)
  --> $DIR/issue-46471.rs:15:5
   |
15 |     &x
   |     ^^ does not live long enough
...
18 | }
   |  - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...

error: aborting due to 2 previous errors

I've seen it in the following tests and there are probably a handful of others:

davidtwco commented 6 years ago

I'll tackle this one.

davidtwco commented 6 years ago

Here's what I've figured out so far:

I've been tracing the span in the context of the issue-46472 test. The errors in the examples given originate from the report_borrowed_value_does_not_live_long_enough function, specifically the drop_span argument:

https://github.com/rust-lang/rust/blob/51b47dc4a1af3260738aa7c5d4e31e8d77c0c0b6/src/librustc_mir/borrow_check/error_reporting.rs#L323-L329

This function is called (at least in the example given) by the check_access_for_conflict function:

https://github.com/rust-lang/rust/blob/51b47dc4a1af3260738aa7c5d4e31e8d77c0c0b6/src/librustc_mir/borrow_check/mod.rs#L837-L845

The span is passed to that by its caller, which is the access_place function:

https://github.com/rust-lang/rust/blob/51b47dc4a1af3260738aa7c5d4e31e8d77c0c0b6/src/librustc_mir/borrow_check/mod.rs#L722-L723

The span is passed to that function by its caller, which is the visit_statement_entry function:

https://github.com/rust-lang/rust/blob/51b47dc4a1af3260738aa7c5d4e31e8d77c0c0b6/src/librustc_mir/borrow_check/mod.rs#L414-L422

We can see from the above snippet that the span is part of a Statement, specifically it is a StatementKind::StorageDrop statement. Going further, that function is given the statement by the process_basic_block function:

https://github.com/rust-lang/rust/blob/51b47dc4a1af3260738aa7c5d4e31e8d77c0c0b6/src/librustc_mir/dataflow/mod.rs#L340-L349

By adding debug! statements anywhere in rustc_mir::build that mentions StorageDrop, we can find that it is created in the build_scope_drops function:

https://github.com/rust-lang/rust/blob/f536143ab68c9db352e78269b3ec72b358b18548/src/librustc_mir/build/scope.rs#L906-L914

At the top of this function, we see that the span originates from the scope argument.

https://github.com/rust-lang/rust/blob/f536143ab68c9db352e78269b3ec72b358b18548/src/librustc_mir/build/scope.rs#L835-L846

The caller of this function is pop_scope, where we see the scope being taken from the Builder field self.scopes:

https://github.com/rust-lang/rust/blob/f536143ab68c9db352e78269b3ec72b358b18548/src/librustc_mir/build/scope.rs#L382-L394

If we print the value of the scope variable, we get the following:

DEBUG:<unknown>: DTW 1: Scope { visibility_scope: scope[0], region_scope: Node(ItemLocalId(4)), region_scope_span: /mnt/c/Users/David/Projects/personal/rust/src/test/ui/issue-46472.rs:13:29: 17:2, needs_cleanup: false, drops: [DropData { span: /mnt/c/Users/David/Projects/personal/rust/src/test/ui/issue-46472.rs:17:2: 17:2, location: _3, kind: Storage }, DropData { span: /mnt/c/Users/David/Projects/personal/rust/src/test/ui/issue-46472.rs:17:2: 17:2, location: _2, kind: Storage }, DropData { span: /mnt/c/Users/David/Projects/personal/rust/src/test/ui/issue-46472.rs:17:2: 17:2, location: _1, kind: Storage }], cached_exits: {}, cached_generator_drop: None, cached_unwind: CachedBlock { unwind: None, generator_drop: None } }

We can see the drops field contains the span that we're looking for. That's as far as I've gotten at the moment.

nikomatsakis commented 6 years ago

So @davidtwco did some more investigation and narrowed the problem to the span produced by this line:

https://github.com/rust-lang/rust/blob/0b90e4e8cd068910f604f3e1fb5d03cc01f1658f/src/libsyntax/parse/parser.rs#L4430

On gitter, they wrote:

@nikomatsakis So, we have prev_span in the previous snippet. prev_span is only replaced by the new previous span in two places. These spans come from the TokenCursor::next function, which when poping over the frames from the stack, if it detects a closing brace, will create a new span that covers just this individual token.

In the closing delimiter case, it does this by taking the span from the frame (which in this case seems to be the whole function), and then finds the endpoint (which it considers to be the space after the brace) - it constructs the new span by taking the endpoint (space after function) as hi and then removes what it considers the length of the DelimToken::Brace (which is one) from that (giving us the brace symbol) and sets that as lo. ie. "} "

I think that the length of the DelimToken::Brace being one makes sense, it is one character. I think that for the logic we found with the lo.to(..) to make sense, we'd need to dig a bit deeper and find out where the stack of frames is being created, since the issue would be the closing delim frame ending after the character. We'd then end up with the lo being the character behind the brace (column 0) and the hi value being the brace (column 1) - and then lo.to(..) makes sense.

Now, it is possible that the rationale was that a span starting at column 0 makes no sense (which would happen if we modified the frame), and neither does a span over a character that doesn't actually span any BytePoss (which would happen if we modified the DelimToken length) - therefore the best option is to have the span extend out from the character by one like it does. In which case, changing to lo.until(..) is the best option.

I'm not sure, whatever you think is the best option. I've convinced myself both ways back and forth in writing this. I think that we should go with the lo.until(..) solution, unless there is precedent for spans at column 0 or with the lo and hi being the same BytePos over one character, I don't think that solution makes sense. If you consider the empty space after the brace that we're seeing a newline character, then the way the span is currently does make sense. Or not, I'm confusing myself.

It seems to me that the root problem is actually the span for end brace being "} ", but I thought I would cc @estebank, @nrc, and @petrochenkov to see if any of you has an opinion.

davidtwco commented 6 years ago

After running all the ui tests with the change to lo.until(..), it's pretty clear that this isn't a suitable solution:

---- [ui] ui/suggestions/str-array-assignment.rs stdout ----
        diff of stderr:

2         --> $DIR/str-array-assignment.rs:13:11
3          |
4       13 |   let t = if true { s[..2] } else { s };
+          |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected str, found &str
-          |           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ expected str, found &str
6          |
7          = note: expected type `str`
8                     found type `&str`

There are lots of test results like this. We'll need to dig a bit deeper and find out what is specifically happening in the cases we're seeing an issue with.

It seems like in some cases, the span is " }" - I believe this is the intended situation. In the issue-46472 example, if I add some logging to the close_tt function where the span is created:

https://github.com/rust-lang/rust/blob/2e0ad5af309c016986e6c81da181762e833c802f/src/libsyntax/tokenstream.rs#L67-L75

Then we find that the variables have the following values:

span=`Span { lo: BytePos(554), hi: BytePos(708), ctxt: #0 }` 
delim.len()=`1`
delim=`Brace`

Given this, I think the issue is that the span variable has hi: BytePos(708) rather than hi: BytePos(707). This function is called by the TokenStream::next function that was mentioned in the previous quoted message:

https://github.com/rust-lang/rust/blob/2e0ad5af309c016986e6c81da181762e833c802f/src/libsyntax/parse/parser.rs#L305-L337

Here we can see that self.frame.span is what is being passed into close_tt and that self.frame is from self.stack.pop.

davidtwco commented 6 years ago

So, a little more progress. In the advance_token function below, start_bytepos is BytePos(707) (the position of the } in the issue-46472 example.

https://github.com/rust-lang/rust/blob/07f51fb8684e1ea0a3859234d360814093d891bd/src/libsyntax/parse/lexer/mod.rs#L309-L328

The call to next_token_inner returns the CloseDelim token we expect and advances self.pos to BytePos(708). This means that the span ends up being Span { lo: BytePos(707), hi: BytePos(708) } - ie. "} ".

If we modify the function to check if self.ch (the next character, also updated by next_token_inner) is whitespace, and if it is make the span Span { lo: BytePos(707), hi: BytePos(707) } then it fixes the issue. This feels like an ugly solution.

error[E0597]: borrowed value does not live long enough (Mir)
  --> /mnt/c/Users/David/Projects/personal/rust/src/test/ui/issue-46472.rs:14:10
   |
14 |     &mut 4
   |          ^ temporary value does not live long enough
...
17 | }
   | - temporary value only lives until here
   |

However, in doing this, we end up modifying the borrowck error to:

error[E0597]: borrowed value does not live long enough (Ast)
  --> /mnt/c/Users/David/Projects/personal/rust/src/test/ui/issue-46472.rs:14:10
   |
14 |       &mut 4
   |            ^ temporary value does not live long enough
15 |       //~^ ERROR borrowed value does not live long enough (Ast) [E0597]
16 |       //~| ERROR borrowed value does not live long enough (Mir) [E0597]
   |  ______________________________________________________________________-
17 | | }
   | |_ temporary value only lives until here
   |

This seems to suggest that this is dealt with somewhere in the borrowck code (or elsewhere) to move the BytePos back a position. Given that, I've looked through the borrowck code and found this:

https://github.com/rust-lang/rust/blob/78f24d86b84882a02c15f27768e831d0342a3f5d/src/librustc_borrowck/borrowck/mod.rs#L1320-L1327

Notice the call to end_point on the span. The rest of the function is similar to the following code in the Mir borrow checker:

https://github.com/rust-lang/rust/blob/51b47dc4a1af3260738aa7c5d4e31e8d77c0c0b6/src/librustc_mir/build/scope.rs#L698-L699

Except, there's no call to end_point! I've submitted #47420 that fixes this issue - I really wish I'd noticed that difference earlier.

estebank commented 6 years ago

@davidtwco your write up on the sleuthing you've had to do is amazingly detailed! Thank you so much for taking the time to figuring this out!