tokay-lang / tokay

Tokay is a programming language designed for ad-hoc parsing, inspired by awk.
https://tokay.dev
MIT License

Bug with `Repeat<P>`: Match in `Pos<P>` sequence fails without severity #123

Closed: phorward closed this issue 9 months ago

phorward commented 9 months ago

This is somewhat complicated to describe, and only happens with #105, which will be merged soon.

# cargo run -- fufu.tok -- "#"      # doesn't work
# cargo run -- fufu.tok -- "#lala"  # works!

Test : @{ '#' Char<^\n>* }
# Test : @{ '#' }  # works, because it's not a sequence

Test+ print("Never :-(")

Test+ (which becomes Pos<Test>, which in turn becomes Repeat<Test, min:1, max:void, blur:true>) is only matched when additional characters are provided. This is because the result of the '#' match becomes void, as simple matches have severity 0. With the input "#", Char<^\n>* matches nothing, so the whole sequence yields void. Therefore, Repeat<P> fails at the position marked:

Repeat : @<
    P,          # Parselet
    min: 1,     # minimum occurrence
    max: void,  # maximum occurrence, void for unlimited
    blur: true  # result blurring; empty list becomes void, one-item list becomes item
> {
    res = ()

    loop {
        P {
            res.push($1)  # <<<--- pushes void, which adds nothing to res
            if max && res.len == max break
        }

        if res.len < min reject   # <<<--- condition is true: res.len is still 0, as void was not pushed
        break
    }

    if blur {
        if res.len == 0 {
            accept void
        }
        else if res.len == 1 {
            accept res[0]
        }
    }

    res
}

The problem at this point is that input is consumed, but Repeat relies on the length of the res list to decide whether P matched at all. An additional counter variable could solve the problem, but is this the right solution? Repeat must be implemented as efficiently as possible.