lukewilliamboswell / roc-parser

A simple Parser for Roc
https://lukewilliamboswell.github.io/roc-parser/
Universal Permissive License v1.0

How to match arbitrary string? #4

Closed: ryanb closed this issue 11 months ago

ryanb commented 11 months ago

Is it possible to match an arbitrary string in a non-greedy way? I tried anyString, but it appears to be greedy. Here's some code.

app "label-parser"
    packages {
        pf: "https://github.com/roc-lang/basic-cli/releases/download/0.5.0/Cufzl36_SnJ4QbOoEmiJ5dIpUxBvdB3NEySvuH82Wio.tar.br",
        parser: "https://github.com/lukewilliamboswell/roc-parser/releases/download/0.3.0/-e3ebWWmlFPfe9fYrr2z1urfslzygbtQQsl69iH1qzQ.tar.br",
    }
    imports [
        pf.Stdout,
        parser.String.{ parseStr, string, anyString },
        parser.Core.{ apply, const, skip },
    ]
    provides [main] to pf

main =
    Stdout.line "run roc test"

labelParser =
    const \label -> label
    |> apply anyString
    |> skip (string ":")

expect
    result = parseStr labelParser "foo:"
    result == Ok "foo"

It results in the error "expected string : but found ``.", so I assume anyString is greedy and consumes the ":" before string ":" gets a chance to match it.

I also tried apply (chompUntil ':') but I don't think it generates what I want.

Thanks for this awesome library by the way. Let me know if there's a better place to put questions like this.

lukewilliamboswell commented 11 months ago

I think chompUntil will work. It just returns a List U8 instead of a Str, because it operates on raw UTF-8 bytes and chomps in a very primitive way.

In Roc a Str is guaranteed to always be valid UTF-8, and UTF-8 is a variable-width encoding, so an arbitrary list of bytes is not necessarily a valid Str.

So if you map the returned value with Str.fromUtf8, and perhaps also Result.withDefault if you are sure you are working with ASCII (where each character is encoded as a single UTF-8 byte), that will get you back to a Str.
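As a minimal sketch of the suggestion above, here is the original example adapted to use chompUntil. It assumes that chompUntil is exposed from parser.Core in the 0.3.0 release and that it stops just before the delimiter, leaving the ":" in the input for string ":" to consume. bytesToStr is a hypothetical helper name introduced only for this illustration, and falling back to an empty string on invalid UTF-8 is an arbitrary choice.

```roc
app "label-parser"
    packages {
        pf: "https://github.com/roc-lang/basic-cli/releases/download/0.5.0/Cufzl36_SnJ4QbOoEmiJ5dIpUxBvdB3NEySvuH82Wio.tar.br",
        parser: "https://github.com/lukewilliamboswell/roc-parser/releases/download/0.3.0/-e3ebWWmlFPfe9fYrr2z1urfslzygbtQQsl69iH1qzQ.tar.br",
    }
    imports [
        pf.Stdout,
        parser.String.{ parseStr, string },
        # Assumption: chompUntil lives in Core in this release.
        parser.Core.{ apply, const, skip, chompUntil },
    ]
    provides [main] to pf

main =
    Stdout.line "run roc test"

# Hypothetical helper: convert the chomped bytes back to a Str,
# falling back to "" if the bytes are not valid UTF-8.
bytesToStr = \bytes ->
    Str.fromUtf8 bytes |> Result.withDefault ""

labelParser =
    const bytesToStr
    # Collect every byte before the first ':' as a List U8.
    |> apply (chompUntil ':')
    # Consume the ':' itself so that no input is left over.
    |> skip (string ":")

expect
    result = parseStr labelParser "foo:"
    result == Ok "foo"
```

Because ":" is a single ASCII byte and ASCII bytes never occur inside a multi-byte UTF-8 sequence, the bytes before it should still be valid UTF-8 when the input came from a Str, so the Result.withDefault fallback is not expected to trigger for this delimiter. Returning the Result from the parser instead of defaulting would be a stricter alternative.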

In the future, once roc-lang/unicode gets a bit more mature, we should be able to write a parser that operates on Unicode code points (which are U32). I think it should then be possible to implement a non-greedy anyString, but it will probably do strange things with emoji and flags.

ryanb commented 11 months ago

@lukewilliamboswell thanks for the detailed answer. Sounds like chompUntil will work for this. Would you consider adding an anyStringUntil that does the conversion back to a string for convenience? Alternatively, maybe add a string conversion example to the docs of chompUntil?

lukewilliamboswell commented 11 months ago

I think an example and some explanation for the documentation would be best. Do you have a working example I can use?

Would you be interested in contributing a PR with this addition to the docs?

Thank you for raising this issue. 😀

ryanb commented 11 months ago

@lukewilliamboswell sounds good, I'll try to submit a PR when I get some time.