sebastienros / parlot

Fast and lightweight parser creation tools
BSD 3-Clause "New" or "Revised" License

Questions about potentially migrating from Sprache. #71

Open ashscodes opened 2 years ago

ashscodes commented 2 years ago

I have a project that I am considering moving from Sprache to Parlot, but even after looking through the available parser combinators there are a few instances where similar functionality does not exist, or I don't understand how to use the available parsers to achieve the same thing that I would do in Sprache.

Any Character Except This Set

With Sprache I can currently do something similar to this:

public static Parser<char> BraceClose = Parse.Char('}');
public static Parser<string> BraceCloseDouble = Parse.String("}}").Text();
public static Parser<char> BracketClose = Parse.Char(']');
public static Parser<string> BracketCloseDouble = Parse.String("]]").Text();

public static Parser<string> SomeString = Parse.AnyChar.Except(BraceClose)
                                                       .Except(BraceCloseDouble)
                                                       .Except(BracketClose)
                                                       .Except(BracketCloseDouble)
                                                       .Many()
                                                       .Text();

I have tried using Parsers.AnyCharBefore() to replace Parse.AnyChar, but it does not seem possible to chain the Parser<char> and Parser<string> instances the way I can with Except() and Many() in Sprache.

I have not seen an option here for converting a Parser<char> to a Parser<IEnumerable<char>> (Many() or Once()) and then to Parser<string> (Text()).

Regex

I noticed there is an option for Literals.Pattern(), but no visible way of creating a Parser from a Regex. In Sprache I have some parsers that look something like this:

static readonly Regex someRegex1 = new Regex("my first regex");
static readonly Regex someRegex2 = new Regex("my second regex");
public static Parser<string> SomeRegex = Parse.Regex(someRegex1).Or(Parse.Regex(someRegex2)).Text().Token();

Is it possible to also use Regexes in Parsers in Parlot?

Returning A Default Value If Parser Is Optional

In Sprache there is GetOrDefault(), which can be used on the result of Optional(). Does TryParse() do the same thing, or would it need to be accounted for in Then()? My basic Sprache example would look something like this:

public static Parser<MyComment> Comment =
(
    from open in Parse.Char('#')
    from text in Parse.AnyChar.Except(Parse.Char('\n'))
                              .Many()
                              .Text()
                              .Optional()
    select new MyComment(text.GetOrDefault())
);

Check The Result Of A Parser

Is there any way of matching the parser, but not consuming the result? There is a delegate in Sprache that takes an input, but does not consume the matched result, as I understand it. I have used it like this:

public static Parser<T> Verify<T>(this Parser<T> parser)
{
    return p =>
    {
        IResult<T> result = parser(p);
        return result.WasSuccessful ? Result.Success(result.Value, p) : Result.Failure<T>(p, "Verification failed", result.Expectations);
    };
}

OneOrMany & ZeroOrMany

No question here, but it would be cool if Many could be numerically limited, or introduce two new options that could do that, like OneOrSuccessive()/ZeroOrSuccessive(). Sprache has a Repeat(int) option that I have used a few times through a couple of projects.

Apologies for all the questions. It is highly likely that I have read through the parser combinators and either misunderstood an equivalent or been unsure of how to apply it. Hopefully other people looking to migrate from Sprache might find this useful too. Thanks.

sebastienros commented 2 years ago

Except

Have you tried using Not? Like `Not(BraceCloseDouble.Or(BraceClose).Or(BracketCloseDouble).Or(BracketClose))`

There might also be a need for AnyChar, though you can use Terms.Pattern or NonWhiteSpace.
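For the character-set case specifically, a predicate-based pattern may already be enough: once the single characters `}` and `]` are excluded, the doubled forms can never appear in the match. The following is a minimal sketch, assuming `Literals.Pattern` accepts a `Func<char, bool>` and returns a `Parser<TextSpan>`. Unlike Sprache's `Many()`, it needs at least one matching character to succeed. The sample input is made up for illustration.

```csharp
using System;
using Parlot;
using Parlot.Fluent;
using static Parlot.Fluent.Parsers;

// Reads a run of characters (whitespace included) up to the first '}' or ']'.
// Literals parsers do not skip leading whitespace; Terms parsers do.
Parser<TextSpan> someString = Literals.Pattern(c => c != '}' && c != ']');

if (someString.TryParse("key = value}}", out TextSpan span))
{
    // span covers the text before the first closing brace.
    Console.WriteLine(span.ToString());
}
```

Because the result is a `TextSpan`, no `Many()`/`Text()` conversion step is needed; `ToString()` gives the matched text.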

Regex

Isn't it sad that you are using Regexes ;) But sure, Parlot could definitely get a new parser for regexes. It might become an issue later on, though, if we decide to handle streams instead of char buffers. May I suggest instead that you use a parser to find the delimiters, then use When to ensure the result matches the regex?
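A rough sketch of that suggestion, assuming the `When(Func<T, bool>)` overload and `Terms.Pattern` behave as described in this thread. The regex and the delimiter rule (stop at whitespace or `)`) are illustrative only, since the real patterns are not shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;
using Parlot;
using Parlot.Fluent;
using static Parlot.Fluent.Parsers;

// Illustrative regex; a stand-in for "my first regex" from the question.
var identifier = new Regex("^[0-9A-Za-z_]+$");

// Step 1: find the token boundaries without a regex
// (here: everything up to whitespace or a closing parenthesis).
Parser<TextSpan> token = Terms.Pattern(c => !char.IsWhiteSpace(c) && c != ')');

// Step 2: keep the token only when the regex accepts the captured text.
Parser<TextSpan> checkedToken = token.When(span => identifier.IsMatch(span.ToString()));

bool accepted = checkedToken.TryParse("my_name1 rest", out _);
bool rejected = checkedToken.TryParse("my-name", out _);
Console.WriteLine($"{accepted} {rejected}");
```

The regex only ever sees text that the parser has already delimited, so it never has to scan the input buffer itself.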

Default Value

Apparently I already filed an issue for this, with a mitigation https://github.com/sebastienros/parlot/issues/59
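The linked issue is not reproduced here, so the following is only one plausible shape for such a workaround and not necessarily the mitigation it describes. It assumes that `ZeroOrOne` succeeds with a default (empty) `TextSpan` when the inner parser does not match, and that `And` yields a value tuple.

```csharp
using System;
using Parlot;
using Parlot.Fluent;
using static Parlot.Fluent.Parsers;

// Optional comment text: everything after '#' up to the end of the line.
Parser<TextSpan> text = Literals.Pattern(c => c != '\n');

// ZeroOrOne keeps the overall parser successful when no text follows '#';
// Then maps that missing case (an empty span) to an explicit default value.
Parser<string> comment = Literals.Char('#')
    .And(ZeroOrOne(text))
    .Then(t => t.Item2.Length > 0 ? t.Item2.ToString() : string.Empty);

bool withText = comment.TryParse("# a note", out string body);
bool withoutText = comment.TryParse("#", out string emptyBody);
Console.WriteLine($"{withText}: '{body}', {withoutText}: '{emptyBody}'");
```

In this sketch the default is supplied in `Then`, which is the second option raised in the question; `TryParse` only reports whether the whole parser matched.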

Check

I believe When should work, since you can return false to discard the parsed value. https://github.com/sebastienros/parlot/blob/main/test/Parlot.Tests/FluentTests.cs#L13

Many

These could use an optional max argument to limit the results, makes sense. You can file an issue and also implement it if you'd like.

ashscodes commented 2 years ago

First of all, thanks for the quick response. I have a lot of things to go away and consider/work through))

Except

> Have you tried using Not? Like `Not(BraceCloseDouble.Or(BraceClose).Or(BracketCloseDouble).Or(BracketClose))`
>
> There might also be a need for AnyChar, though you can use Terms.Pattern or NonWhiteSpace.

As far as Or is concerned this works fine if Parser<T> is consistent, but would mean I need to switch them all to Parser<string>.

I think AnyChar would be useful as I need to include whitespace too, so something like Literals.Pattern(x => x <= Char.MaxValue) covers it in a roundabout way, but maybe it should be included.

Regarding Not(), is there any design reason why this could not be an extension method? I think this could be useful for other methods too. I think my example above could work like this, unless I am mistaken:

var braceClose = Terms.Text("}");
var bracketClose = Terms.Text("]");
var doubleBraceClose = Terms.Text("}}");
var doubleBracketClose = Terms.Text("]]");

Literals.Pattern(x => x <= Char.MaxValue).Not(doubleBraceClose.Or(doubleBracketClose).Or(braceClose).Or(bracketClose));

//or with an added AnyChar.

Literals.AnyChar.Not(doubleBraceClose.Or(doubleBracketClose).Or(braceClose).Or(bracketClose));

Perhaps it would look better if when starting with Not(), it was implicit in subsequent Or() calls.

Literals.AnyChar.Not(doubleBraceClose).Or(doubleBracketClose).Or(braceClose).Or(bracketClose);

Regex

> Isn't it sad that you are using Regexes ;) But sure, Parlot could definitely get a new parser for regexes. It might become an issue later on, though, if we decide to handle streams instead of char buffers. May I suggest instead that you use a parser to find the delimiters, then use When to ensure the result matches the regex?

Check

> I believe When should work, since you can return false to discard the parsed value. https://github.com/sebastienros/parlot/blob/main/test/Parlot.Tests/FluentTests.cs#L13

We all have our burdens to bear... I'll take a look at When() and see what I can do. I believe the main patterns to cover at the moment are a combination of [0-9A-Za-z_] with or without enclosing parentheses, and one that recognises numbers, which can be integer, decimal, hexadecimal, exponential, etc. I think the latter will be more difficult without a Regex.

Would a potential built-in boolean value parser be considered? I guess it is easy enough to add for anyone.

Terms.Text("true").Or(Terms.Text("false"));
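As written, that parser yields the matched string. A small sketch of a typed variant, assuming `Then` can map each branch to a `bool` so that both sides of `Or` share the same result type:

```csharp
using System;
using Parlot;
using Parlot.Fluent;
using static Parlot.Fluent.Parsers;

// Each branch maps its matched text to a bool, so Or sees two Parser<bool> values.
Parser<bool> boolean = Terms.Text("true").Then(_ => true)
    .Or(Terms.Text("false").Then(_ => false));

bool parsedTrue = boolean.TryParse("true", out bool first);
bool parsedFalse = boolean.TryParse("  false", out bool second);
Console.WriteLine($"{parsedTrue}:{first} {parsedFalse}:{second}");
```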

I'll look at using When for the check. Thanks for the example.

Default Value

> Apparently I already filed an issue for this, with a mitigation #59

Many

> These could use an optional max argument to limit the results, makes sense. You can file an issue and also implement it if you'd like.

I will log an issue for a max argument and I will at least take a look at the source code before deciding that what is going on here is far too smart for me. :)

sebastienros commented 2 years ago

I don't like the implicit Not. I want to keep using parentheses to express the boundaries of the parser; otherwise there can be too many ambiguities.

I think AnyChar would be a good addition, and simple to implement. Should it also take an optional length and return a TextSpan? Or should it be combined with OneOrMany(length)?

Parlot can already read integers and decimals. I filed an issue for hexadecimal, and exponentials might be good to add to the existing number parsers as an option.

ashscodes commented 2 years ago

I still think it would be nice if Not and some other methods were available as extension methods. I totally understand about the parentheses setting a boundary. Perhaps it would benefit from a second overload similar to OneOf.

Parser<T> Not<T>(params Parser<T>[] parsers)

I think AnyChar.OneOrMany(length) is fine and makes sense. One thing that might be a good option for AnyChar is IEnumerable<char> exclude.

A general "number" value (for want of a better name) that covered integer, decimal, hexadecimal and exponential values would be great.