let-def / lrgrep

Menhir polishing toolbox, for experienced druids
25 stars 2 forks source link

Is Menhir's `really_top` really needed? #10

Closed SquidDev closed 11 months ago

SquidDev commented 11 months ago

I was taking a look at updating my code to use the latest version of lrgrep (I'm still using a version from February, before #8/#9 were merged). However, as of 0cd455b0e7005ff45e6f429b369f0e675bb4b59e I noticed that lrgrep requires the Menhir parser to export a val really_top : 'a env -> element definition.

While this function is pretty easy to add, there's a bit of me which wonders whether it's needed/useful in the first place. The only time we need to read the very top of the stack in the OCaml test suite is for test_0314.ml:

https://github.com/let-def/lrgrep/blob/a6fec512918c94baac7bd85dabc24a496f6b190e/ocaml/testsuite/test_0314.ml#L1

https://github.com/let-def/lrgrep/blob/a6fec512918c94baac7bd85dabc24a496f6b190e/ocaml/parse_errors.mlyl#L81-L84

However, in this case, the capture isn't actually very useful - the start/end position of the top of the stack will the first character in the file, when probably we want to be capturing the position of the initial LET (or let_bindings(_)). I wonder if startloc should be capturing the position of the first token in the production in this case (and similarly for endloc) instead?

Apologies if this is already on your radar (or if I'm talking nonsense)! I've not dug too much into the code yet, just felt it was worth asking first.

let-def commented 11 months ago

Thank your really much for looking at that. I think you are right, the capture of the start location is off-by-one. I don't know how to fix that yet: when the derivation discovers that the end of the reduction was reached, it is already too late. I have to take a closer look at that.

The good news is that we should be able to get rid of really_top in the end (... it is still possible to write patterns that will match an empty input and will fail to capture the initial location, but that's a fringe case).

let-def commented 11 months ago

Current master should fix this problem via two commits:

The initial position should now be given as an argument, just in case. I am not 100% satisfied with the current encoding, I might change the compilation scheme, but the interface should stay the same (unless you have a clever idea to avoid having to pass the initial position :)).

let-def commented 11 months ago

I feel better with 06f01cf78457fccd4a42e9482def93c6b61bb7ee which specializes the representation to distinguish an empty match from matching the initial state. The initial position is still passed as an argument.

Thank you very much for this thoughtful bug report, it helped a lot :).

Note: as noted here, fixing the off-by-one revealed a limitation of the current data-flow analysis that was not able to eliminate redundant stores. The current solution is a workaround, but this is a potential low-hanging fruit for reducing the automaton.