mbutterick / brag

Racket DSL for generating parsers from BNF grammars [moved to https://git.matthewbutterick.com/mbutterick/brag]
https://git.matthewbutterick.com/mbutterick/brag
MIT License

A choice pattern seems to throw off the parser if followed later by a zero-or-more quantified pattern #18

Closed: bkovitz closed this issue 5 years ago

bkovitz commented 5 years ago

In the grammar below, once a THING whose arg matches the case marked by the comment has been parsed, the parser reports an error on the next THING.

#lang brag

start : elem*

elem : THING "(" arg ("," arg)* ")"

arg : IDENTIFIER
    | IDENTIFIER ":" IDENTIFIER   ; Matching this case seems to throw
                                  ; everything off, but only if the
                                  ; ("," arg)* clause is included in elem.

I can't figure out how to attach a file in GitHub, but you can reproduce the error by pasting the text above into a file called bug.brag and the text below into a file called bug.rkt and running the latter.

#lang debug racket

(require brag/support br-parser-tools/lex
         (only-in "bug.brag" parse))

(define (tokenize ip)
  (port-count-lines! ip)
  (define my-lexer
    (lexer-src-pos
      [(char-set "(),:") lexeme]  ; single-character tokens
      ["thing" 'THING]
      [(:+ alphabetic) (token 'IDENTIFIER (string->symbol lexeme))]
      [whitespace (return-without-pos (my-lexer ip))]))
  (λ () #R (my-lexer ip)))

(define (p str)
  (parse (tokenize (open-input-string str))))

(define t0 "thing(a)")                      ; This works.
(define t1 "thing(a) thing(b)")             ; This works.
(define t2 "thing(a : Integer)")            ; This works.
(define t3 "thing(a) thing(b : Integer)")   ; This works.
(define t4 "thing(a : Integer) thing(b)")   ; This doesn't work. The parser
                                            ; says that the second "thing" is
                                            ; an error.
(p t0)
(p t1)
(p t2)
(p t3)
(p t4)

bkovitz commented 5 years ago

Here's a workaround:

#lang brag

start : elem*

elem : THING "(" arg ")"
     | THING "(" arg ("," arg)+ ")"   ; In lieu of * (i.e. zero-or-more)

arg : IDENTIFIER
    | IDENTIFIER ":" IDENTIFIER
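
As a rough sanity check that this workaround accepts the same inputs as the original rule, both versions of elem can be modeled as regular expressions over one-character tokens. This is only an illustrative sketch in plain Racket, not brag itself: the single-letter encoding (T for THING, I for IDENTIFIER) and the names arg-rx, original, workaround, and samples are assumptions made for the example.

```racket
#lang racket

;; Assumed encoding: one character per token.
;; T = THING, I = IDENTIFIER; punctuation tokens stand for themselves.
(define arg-rx "(?:I|I:I)")

;; start : elem*   with   elem : THING "(" arg ("," arg)* ")"
(define original
  (pregexp (string-append "^(?:T\\(" arg-rx "(?:," arg-rx ")*\\))*$")))

;; start : elem*   with   elem : THING "(" arg ")"
;;                             | THING "(" arg ("," arg)+ ")"
(define workaround
  (pregexp (string-append "^(?:T\\(" arg-rx "\\)"
                          "|T\\(" arg-rx "(?:," arg-rx ")+\\))*$")))

;; Token-level versions of t0..t4 from the report, plus one multi-arg case.
(define samples
  '("T(I)" "T(I)T(I)" "T(I:I)" "T(I)T(I:I)" "T(I:I)T(I)" "T(I,I:I)T(I)"))

;; Print whether each sample is accepted by each formulation.
(for ([s (in-list samples)])
  (printf "~a  original: ~a  workaround: ~a\n"
          s (regexp-match? original s) (regexp-match? workaround s)))
```

Both regular expressions accept every sample, including the token-level form of t4, which suggests the failure comes from how the parser handles the * quantifier rather than from the grammar's language.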

mbutterick commented 5 years ago

Does this grammar fix the problem for you?

#lang brag

start : elem*

elem : THING "(" arg ("," arg)* ")"

arg : IDENTIFIER [":" IDENTIFIER]

mbutterick commented 5 years ago

(If so, that doesn’t rule out a bug, but I am interested in collecting information about the parser’s behavior.)

bkovitz commented 5 years ago

Nope, same error.

mbutterick commented 5 years ago

That’s strange, because it does fix the parse error for me. In any case, I can reproduce the original error (and have simplified it further). Also, I have reproduced it in the original ragg package that brag is based on, so it may take a little excavation to sort out.

mbutterick commented 5 years ago

Making a note of the minimal error case:

#lang br

@module/lang[parser]{
#lang brag
foo : ( (X | X Y) A* )*
}

(require 'parser)

(parse (list "X" "Y" "X"))
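
For reference, the token sequence X Y X is in the language of this grammar, so a correct parser should accept it. A minimal sketch of that check in plain Racket, treating each token as one character and the rule as a regular expression; the names foo-rx and in-language? are made up for this illustration and are not part of brag:

```racket
#lang racket

;; foo : ( (X | X Y) A* )*   written as a regexp over one-character tokens.
(define foo-rx #px"^(?:(?:X|XY)A*)*$")

;; Concatenate the token strings and test membership in the language.
(define (in-language? tokens)
  (regexp-match? foo-rx (string-append* tokens)))

;; The failing input from the minimal case: X Y, then a second X.
(in-language? (list "X" "Y" "X"))
```

The regexp matcher has to back out of the shorter alternative X and retry with X Y before the second iteration can succeed, which is the same choice the parser needs to get right here.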

bkovitz commented 5 years ago

I just tried your alternate grammar with arg : IDENTIFIER [":" IDENTIFIER] again and it does work. Sorry, I must have done something wrong when I tried it the first time.

mbutterick commented 5 years ago

I changed the way the * quantifier works, which I believe fixes the problem.