inhabitedtype / angstrom

Parser combinators built for speed and memory efficiency
Other
637 stars 74 forks source link

Unexpected behavior of Alternative (<|>) #226

Open chfont opened 1 year ago

chfont commented 1 year ago

Hello,

While working on parser, I've run into a scenario where <|> results in an error when I run a <|> b, but gives a result when running b <|> a, if b recognizes the string given.

Below is a small example to reproduce this:

type ast = 
    | Var of string 
    | Plus of ast * ast
    | App of ast * ast

open Angstrom;;
let white_space (c:char) = match c with 
    | ' ' | '\t' | '\n' -> true 
    | _ -> false
let alpha (c:char) = match c with
    | 'a'..'z' -> true 
    | 'A'..'Z' -> true 
    | _ -> false
let var_parser = (Angstrom.take_while1 alpha) >>| (function f -> Var f)
let space = Angstrom.skip_while white_space
let expr_parser = Angstrom.fix (function exprp -> 
    let atom = var_parser in
    let plus_parser = Angstrom.fix (function plusp ->
        ((atom <* space) <* (Angstrom.char '+' <* space) >>= (function a -> 
            plusp >>| (function e -> Plus(a,e))
        )) <|> atom
    ) in
    let arith = plus_parser <|> atom in
    let app = (arith <* space) >>= (function e1 -> 
        arith >>| (function e2 -> App (e1,e2))
    ) in
    arith <|> app
)
let parse str =  Angstrom.parse_string ~consume:All expr_parser str  

With the above, parse "x x" fails, but when flipping arith <|> app to app <|> arith, parse "x x" succeeds, rather than both working as expected.

LogicalOverflow commented 1 year ago

The end_of_input error you get in the arith <|> app case isn't caused by the expr_parser not succeeding, but by succeeding without consuming the whole input with ~consume:All. (This is the case, because the arith parser succeeds by consuming only the first xin the input, resulting in <|> succeeding. I.e. the order matters, because both parser succeed.)