winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow
Other
477 stars 36 forks source link

Opportunistic zero copy parsing for escaped_transform using `Cow` #543

Open VorpalBlade opened 2 weeks ago

VorpalBlade commented 2 weeks ago

Please complete the following tasks

winnow version

0.6.13

Describe your use case

Something like this works:

/// Unquoted string value
fn unquoted_string_with_escapes(i: &mut &str) -> PResult<String> {
    escaped_transform(take_till(1.., [' ', '\t', '\n', '\r', '\\']), '\\', escapes).parse_next(i)
}

fn escapes<'input>(i: &mut &'input str) -> PResult<&'input str> {
    alt((
        "n".value("\n"),
        "r".value("\r"),
        "t".value("\t"),
        " ".value(" "),
        "\"".value("\""),
        "\\".value("\\"),
    ))
    .parse_next(i)
}

But this doesn't compile:

fn unquoted_string_with_escapes<'input>(i: &mut &'input str) -> PResult<Cow<'input, str>> {
    escaped_transform(take_till(1.., [' ', '\t', '\n', '\r', '\\']), '\\', escapes).parse_next(i)
}

fn escapes<'input>(i: &mut &'input str) -> PResult<&'input str> {
    alt((
        "n".value("\n"),
        "r".value("\r"),
        "t".value("\t"),
        " ".value(" "),
        "\"".value("\""),
        "\\".value("\\"),
    ))
    .parse_next(i)
}

Describe the solution you'd like

It would be nice if escaped_transform could have zero copy parsing when the string happens to contain no escapes (which is the common case for my use case).

Alternatives, if applicable

I could try to write my own parser from scratch instead of using escaped_transform and make it support this. I would rather not and also it would be nice to have something that is reusable and available for others too.

Additional Context

No response

epage commented 1 week ago

This would require having more knowledge of the underlying types than is currently supported. At minimum its a breaking change. I suspect this would require a new trait which I'm hesitant to add for this one specific parser.

My own biases also lean towards not addressing this. When I'm prototyping, I use whatever gets the job done. If I'm writing a parser for a grammar, I implement everything myself so there is a clear connection between the grammar and my code (example). This makes me see escaped_transform as more of a tool for prototyping than one for performant production code.

VorpalBlade commented 4 days ago

So I had a bit of a think about your response. And I don't think I agree. To me it seems that functionality should be as fast as it can possible be. You shouldn't have to reinvent the wheel to be fast in Rust.

Unfortunately that would (if it needs to know the underlying type more precisely) currently require specialisation as I understand it. That would indeed be a blocker for making escaped_transform do this. Otherwise it could perhaps be a separate function escaped_transform_str?

epage commented 4 days ago

Yes, in general winnow aims for speed. However, it isn't our highest priority. For example, we likely could be made even faster with GATs, like chumsky and nom v8 use, but they run counter to more important principles, like being a toolbox instead of a framework and having the built-in parsers be written in a way similar to how users would.

We have a general implementation of escaped_transform. It isn't the fastest for all cases but it supports a large variety of cases. That leaves us with

For me, it feels too specialized to have a second implementation (and, in general, I've been trying to avoid that).

I also feel that the current version has the right ergonomics for most users, so I would prefer not to narrow supported cases.

I'm assuming people won't want it removed.

That leaves "do nothing".