Closed SquidDev closed 11 months ago
Thank your really much for looking at that. I think you are right, the capture of the start location is off-by-one. I don't know how to fix that yet: when the derivation discovers that the end of the reduction was reached, it is already too late. I have to take a closer look at that.
The good news is that we should be able to get rid of really_top
in the end (... it is still possible to write patterns that will match an empty input and will fail to capture the initial location, but that's a fringe case).
Current master should fix this problem via two commits:
The initial position should now be given as an argument, just in case. I am not 100% satisfied with the current encoding, I might change the compilation scheme, but the interface should stay the same (unless you have a clever idea to avoid having to pass the initial position :)).
I feel better with 06f01cf78457fccd4a42e9482def93c6b61bb7ee which specializes the representation to distinguish an empty match from matching the initial state. The initial position is still passed as an argument.
Thank you very much for this thoughtful bug report, it helped a lot :).
Note: as noted here, fixing the off-by-one revealed a limitation of the current data-flow analysis that was not able to eliminate redundant stores. The current solution is a workaround, but this is a potential low-hanging fruit for reducing the automaton.
I was taking a look at updating my code to use the latest version of lrgrep (I'm still using a version from February, before #8/#9 were merged). However, as of 0cd455b0e7005ff45e6f429b369f0e675bb4b59e I noticed that lrgrep requires the Menhir parser to export a
val really_top : 'a env -> element
definition.While this function is pretty easy to add, there's a bit of me which wonders whether it's needed/useful in the first place. The only time we need to read the very top of the stack in the OCaml test suite is for
test_0314.ml
:https://github.com/let-def/lrgrep/blob/a6fec512918c94baac7bd85dabc24a496f6b190e/ocaml/testsuite/test_0314.ml#L1
https://github.com/let-def/lrgrep/blob/a6fec512918c94baac7bd85dabc24a496f6b190e/ocaml/parse_errors.mlyl#L81-L84
However, in this case, the capture isn't actually very useful - the start/end position of the top of the stack will the first character in the file, when probably we want to be capturing the position of the initial
LET
(orlet_bindings(_)
). I wonder ifstartloc
should be capturing the position of the first token in the production in this case (and similarly forendloc
) instead?Apologies if this is already on your radar (or if I'm talking nonsense)! I've not dug too much into the code yet, just felt it was worth asking first.