Adjust message range on unexpected token error

Kha commented 9 months ago

This PR proposes a quite simple and general solution to #1971: when the parser hits an unexpected token, it will include all preceding whitespace in the error message range as the expected token could be inserted at any of these places to fix the error. I'll let the test file speak for itself:

Implementation notes: In order to implement this strategy, I first had to teach the parser to report ranges at all - so far we only reported single positions and left it to the editor to including what it thinks is the next token, which is not ideal and completely breaks when we want to include preceding whitespace like this. For the purposes of this PR I only changed the token error messages to report ranges, which they do via a new helper function mkUnexpectedTokenError. True to its name, I also used this refactoring opportunity to actually report what the unexpected token is; see changed test outputs below. Because of this preparatory work, I recommend looking at the changes commit by commit.

Kha commented 9 months ago

!bench

Kha commented 9 months ago

!bench

leanprover-bot commented 9 months ago

Here are the benchmark results for commit 3def0dbbb9499c16b0d6f08daf733ab4acb49ae6. There were significant changes against commit bb738796ae13e1fff6dec921b1a26dae3a040498:

  Benchmark   Metric             Change
  ===============================================
- parser      branches             6.6% (610.4 σ)
- parser      instructions         7.1% (885.0 σ)
+ stdlib      tactic execution    -1.6% (-16.8 σ)

Kha commented 9 months ago

Interesting, that's a little more overhead than I would have assumed

leodemoura commented 9 months ago

Interesting, that's a little more overhead than I would have assumed

I agree. Could you please investigate?

Kha commented 9 months ago

!bench

Kha commented 9 months ago

!bench

leanprover-bot commented 9 months ago

Here are the benchmark results for commit 379106fc6d70be4fe3e45f895ed3700f62b3abe3. There were significant changes against commit d4be21b5598f6eb3c0ffa82a8827d3f9e8472439:

  Benchmark                  Metric                  Change
  ====================================================================
- parser                     branches                  4.5% (1946.6 σ)
- parser                     instructions              5.0% (1421.5 σ)
+ stdlib                     attribute application   -14.7%  (-20.9 σ)
+ stdlib                     tactic execution        -15.0% (-140.2 σ)
+ stdlib                     task-clock              -13.5% (-394.6 σ)
+ stdlib                     type checking           -14.7%  (-50.3 σ)
+ stdlib                     wall-clock              -14.1%  (-33.0 σ)
+ tests/bench/ interpreted   task-clock               -9.4%  (-18.7 σ)
+ tests/bench/ interpreted   wall-clock              -13.6%  (-18.3 σ)

Kha commented 9 months ago

!bench

leanprover-bot commented 9 months ago

Here are the benchmark results for commit 379106fc6d70be4fe3e45f895ed3700f62b3abe3. There were significant changes against commit d4be21b5598f6eb3c0ffa82a8827d3f9e8472439:

  Benchmark   Metric         Change
  ============================================
- parser      branches         4.5% (1947.0 σ)
- parser      instructions     5.0% (1421.8 σ)

Kha commented 9 months ago

!bench

leanprover-bot commented 9 months ago

Here are the benchmark results for commit 6e44c129893ca7e598f48b52d574f4770363b5ed. There were significant changes against commit d4be21b5598f6eb3c0ffa82a8827d3f9e8472439:

  Benchmark   Metric         Change
  ============================================
- parser      branches         3.2% (1452.2 σ)
- parser      instructions     3.8% (1157.7 σ)

Kha commented 9 months ago

I don't think I will get closer than that without addressing the underlying problem that we seem to backtrack entirely too often from token mismatches, apparently from code like "tk" ... <|> ...

Kha commented 9 months ago

I believe this is ready to merge but as it is a subjective change, it would be good to hear about some feedback: can the original problem be considered solved with this? Is the space before def f4 consistent or "weird"? Please give it a try e.g. via the PR release when it is released.

leanprover-community-mathlib4-bot commented 9 months ago

✅ Mathlib branch lean-pr-testing-2393 has successfully built against this PR. (2023-09-05 09:42:31)

kbuzzard commented 9 months ago

My instinct is that this change makes things less confusing for beginners.

Kha commented 8 months ago

Let's give it a try in practice

leanprover / lean4

Adjust message range on unexpected token error #2393