vektah / goparsify

golang parser-combinator library
MIT License
73 stars 16 forks source link

Predictor within `Any` breaks if one parser is a substring of another #2

Closed duncanmorris closed 6 years ago

duncanmorris commented 6 years ago

The predictor feature of the Any function breaks if one parser is a substring of another. I've got code to produce a minimal version of the issue.

The parsing always works if the most specific (longest) parser is matched. If the least specific (shortest) parser matches, the predictor returns this template the next time it is called which means Any fails to match the more specific parser. We call Any with the most specific parsers first leading to the least specific parser, but the predictor means the order isn't always enforced.

We have solved this by creating a function called NonPredictiveAny that removes the predictive parts of the function. This comes at the expense of not being able to return the longest error (since Error.pos isn't exported). I thought you might have a better solution and/or might want to add an equivalent of NonPredictiveAny for others who have the same issue.

Code to reproduce.

package main

import (
    "os"
    "strings"

    parsify "github.com/vektah/goparsify"
)

func makeTemplateParser() parsify.Parser {
    var template parsify.Parser

    parsify.EnableLogging(os.Stdout)

    a := parsify.Exact(`a`).Map(func(n *parsify.Result) {
        n.Result = `a`
    })
    ab := parsify.Exact(`ab`).Map(func(n *parsify.Result) {
        n.Result = `ab`
    })

    aab := parsify.Any(ab, a)

    template = parsify.Many(aab).Map(func(node *parsify.Result) {
        var xs []string
        for _, child := range node.Child {
            xs = append(xs, child.Result.(string))
        }
        node.Result = strings.Join(xs, ", ")
    })

    return template

}

func main() {}

And a test suite.

package main

import (
    "testing"

    "github.com/stretchr/testify/assert"
    parsify "github.com/vektah/goparsify"
)

func TestRender1(t *testing.T) {
    input := `a`

    var template = makeTemplateParser()
    rendered, err := parsify.Run(template, input)

    assert.Nil(t, err)
    assert.Equal(t, "a", rendered)

}

func TestRender2(t *testing.T) {
    input := `ab`
    var template = makeTemplateParser()
    rendered, err := parsify.Run(template, input)
    assert.Nil(t, err)
    assert.Equal(t, "ab", rendered)

}

func TestRender3(t *testing.T) {
    input := `ab a`
    var template = makeTemplateParser()
    rendered, err := parsify.Run(template, input)

    assert.Nil(t, err)
    assert.Equal(t, "ab, a", rendered)

}

func TestRender4(t *testing.T) {
    input := `a ab`
    var template = makeTemplateParser()
    rendered, err := parsify.Run(template, input)

    assert.Nil(t, err)
    assert.Equal(t, "a, ab", rendered)

}
vektah commented 6 years ago

I think to do this properly the parsers would need to statically expose the first chars they might accept so collisions could be detected and not put in the predictor. It would also mean the predictor table could be built during setup, rather then dynamically at runtime.

I don't really have time to dig deep on this, and this is definitely going to be a confusing bug for anyone hitting it so I've dropped the predictor from the Any parser. It wasn't shaving that much time off anyway.