yhirose / go-peg

Yet another PEG (Parsing Expression Grammars) parser generator for Go
MIT License

Returned positions at syntax errors #12

Open refaktor opened 2 years ago

refaktor commented 2 years ago

Hi, thank you for writing this peg-parser. I am using it in my programming language (https://github.com/refaktor/rye).

I have one problem with the line and column returned on a syntax error. If needed, I can create a minimal Go example outside of the language that demonstrates just this behaviour, but I decided to ask first whether it's a known behaviour. The language has many PEG rules, but the problem can be shown with just blocks and words. Here are the related rules:

    BLOCK       <-  "{" SPACES SERIES* "}"
    SERIES      <-  (WORD / BLOCK) SPACES
    WORD        <-  LETTER LETTERORNUM*

The behaviour is this:

    ; below is a block of words -- parses OK
    { abc cde fgh }
    ; if I make syntax error at first word
    { ab#c def ijk } ; I get correct position of the first word 1:3
    { abc d#ef ijk } ; I get correct position of the second word 1:7
    { abc def# ijk } ; I get the positions of the token, not character, again 1:7

If I have sub-blocks, this becomes a problem, as I always get the position of the first sub-block:

    { abc { def# ijk } } ; I get 1:7
    { abc { def ij#k } } ; I get 1:7
    { abc { def ijk lmn { op#r } } } ; I still get 1:7, also in multiline context

Is there any way to get a more exact position of the error? Or could I try to implement that myself? I would really need it.

Thank you for your work so far!

yhirose commented 2 years ago

@refaktor, thank you for the report. In fact, I stopped developing the Go version of the PEG library, and it's been over 2 years since I made the last significant change. go-peg now lacks many features that cpp-peglib has, including much better built-in error position handling and customizable error reporting. Here is the output from the cpp-peglib playground with your grammar. I think the following is what you were expecting, right?

[Image: cpp-peglib playground output for the grammar above]

To produce the same results, go-peg would need the same changes that were made in cpp-peglib, and that requires a lot of time and energy. I wonder if you could use cpp-peglib with cgo in your project. (The cpp-peglib playground does something similar: I wrapped the cpp-peglib parser in a WASM module, which runs on the web site.)

Sorry that I can't give you a real solution with go-peg. Hope this comment can help you. :)

refaktor commented 2 years ago

@yhirose thank you for a very quick reply. I understand that your development has moved to C++. I don't like the idea of adding a cgo dependency right into the core of the language. In that case I would rather try to find some other parser or, if I can't, write one by hand. My first thought is that I could try to improve go-peg to work as I want.

Do you think that just adding better position reporting would require a major restructuring of the code, or only updates in the right places?

yhirose commented 2 years ago

@refaktor, adding better position reporting logic may not be very hard to do. Here are two commits for this feature in cpp-peglib.

https://github.com/yhirose/cpp-peglib/commit/50aaba73a3d36430fdb0768cb79ae3e983689e53
https://github.com/yhirose/cpp-peglib/commit/befdd2707575c1d0fc82e2b4a41bdfb99df424de

But I am not really sure how much work would be involved in applying the changes to go-peg, since the codebase of go-peg is now very different from cpp-peglib.

One caution is that go-peg lacks many improvements (performance and error recovery) and bug fixes (including some critical problems such as infinite loops) made in cpp-peglib. So if you are making a production-level tool, I still recommend using cgo + cpp-peglib instead of go-peg, or another parser combinator library, or writing your own parser, which is always the best choice. :)
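
For readers weighing the hand-written option, here is a minimal sketch of one common way to get character-level error positions in a PEG-style parser: remember the farthest offset at which any match attempt failed, and report that offset as line:column. It is not taken from go-peg or cpp-peglib and makes no claim about what the commits above do. The `parser` type, its methods, and `ErrorPosition` are hypothetical names, and the definitions of SPACES, LETTER, and LETTERORNUM are assumptions, since the issue does not show them.

```go
package main

import "fmt"

// parser is a hypothetical hand-written recognizer for the BLOCK / SERIES /
// WORD rules from the issue. It is not go-peg's API.
type parser struct {
	src      string
	farthest int // largest byte offset at which a match attempt failed
}

// fail records the offset of a failed match attempt.
func (p *parser) fail(pos int) (int, bool) {
	if pos > p.farthest {
		p.farthest = pos
	}
	return pos, false
}

func isLetter(c byte) bool { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') }
func isDigit(c byte) bool  { return c >= '0' && c <= '9' }

// spaces: SPACES <- [ \t\n]*  (assumed definition)
func (p *parser) spaces(pos int) int {
	for pos < len(p.src) && (p.src[pos] == ' ' || p.src[pos] == '\t' || p.src[pos] == '\n') {
		pos++
	}
	return pos
}

// word: WORD <- LETTER LETTERORNUM*  (ASCII letters and digits assumed)
func (p *parser) word(pos int) (int, bool) {
	if pos >= len(p.src) || !isLetter(p.src[pos]) {
		return p.fail(pos)
	}
	pos++
	for pos < len(p.src) && (isLetter(p.src[pos]) || isDigit(p.src[pos])) {
		pos++
	}
	return pos, true
}

// series: SERIES <- (WORD / BLOCK) SPACES
func (p *parser) series(pos int) (int, bool) {
	next, ok := p.word(pos)
	if !ok {
		next, ok = p.block(pos)
	}
	if !ok {
		return pos, false
	}
	return p.spaces(next), true
}

// block: BLOCK <- "{" SPACES SERIES* "}"
func (p *parser) block(pos int) (int, bool) {
	if pos >= len(p.src) || p.src[pos] != '{' {
		return p.fail(pos)
	}
	pos = p.spaces(pos + 1)
	for {
		next, ok := p.series(pos)
		if !ok {
			break
		}
		pos = next
	}
	if pos >= len(p.src) || p.src[pos] != '}' {
		return p.fail(pos)
	}
	return pos + 1, true
}

// ErrorPosition parses src as a single BLOCK. It returns ok=true when the
// whole input matches; otherwise it returns the 1-based line and column of
// the farthest failure.
func ErrorPosition(src string) (line, col int, ok bool) {
	p := &parser{src: src}
	end, matched := p.block(0)
	if matched && end == len(src) {
		return 0, 0, true
	}
	if matched && end > p.farthest {
		p.farthest = end // trailing input after a complete block
	}
	line, col = 1, 1
	for i := 0; i < p.farthest && i < len(src); i++ {
		if src[i] == '\n' {
			line, col = line+1, 1
		} else {
			col++
		}
	}
	return line, col, false
}

func main() {
	for _, s := range []string{"{ abc def# ijk }", "{ abc { def# ijk } }"} {
		line, col, _ := ErrorPosition(s)
		fmt.Printf("%d:%d\n", line, col) // prints 1:10, then 1:12
	}
}
```

Because the farthest offset is kept even when an inner rule backtracks, a failure inside a nested block is still reported at the offending character rather than at the start of the enclosing sub-block.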