mvdan / sh

A shell parser, formatter, and interpreter with bash support; includes shfmt
https://pkg.go.dev/mvdan.cc/sh/v3
BSD 3-Clause "New" or "Revised" License
7.14k stars 338 forks source link

interp: fix multi-line shell pattern match equality #866

Closed lollipopman closed 2 years ago

lollipopman commented 2 years ago

commit be03afe7282aea8007fd3e961609a3b1f077db1d added the multi-line flag to the regexp generated for == and =! testing. However, this flag is confusing as it allows for a regular expression to match one line out of a multi-line input.

(m) multi-line mode: ^ and $ match begin/end line in addition to
begin/end text (default false)

This means we are now matching against just one line in a multi-line string, rather than the entire text. Instead, to match Bash's behavior, we add the s flag to the regexp:

(s) let . match \n (default false)

With this a * in a shell pattern will match \n.

Glob patterns should always match newline characters, so add the s flag by default when creating a pattern. Alternately we could add the s flag only when the . meta-character is needed, but that seemed more fragile. This behavior worked by accident with file globs because rather than matching with .* we match with [^/]* which does match \n.

lollipopman commented 2 years ago

@ihar-orca I would love a review, since you submitted the original patch

ihar-orca commented 2 years ago

@lollipopman looks really good!

lollipopman commented 2 years ago

Also thinking about the added API - it seems like we do need the added API, because in some cases we want the partial matching to happen. Or perhaps not - if one needs partial matching, perhaps foobar could be fed in as *foobar*, then it will still work as expected.

I had not thought about inverting the need for whole string matches, by converting each pattern into a whole string match as you suggest. I tried doing some rough benchmarks in Go and whole string matches appear significantly slower on larger texts, as I would expect.

func BenchmarkPartialString(b *testing.B) {
    b.ReportAllocs()
    out, err := exec.Command("man", "bash").Output()
    if err != nil {
        log.Fatal(err)
    }
    manBash := string(out)
    partial := regexp.MustCompile("COPYRIGHT")
    for i := 0; i < b.N; i++ {
        if !partial.MatchString(manBash) {
            b.Fatal(err)
        }
    }
}
func BenchmarkWholeString(b *testing.B) {
    b.ReportAllocs()
    out, err := exec.Command("man", "bash").Output()
    if err != nil {
        log.Fatal(err)
    }
    manBash := string(out)
    wholeString := regexp.MustCompile("^(?s).*COPYRIGHT.*$")
    for i := 0; i < b.N; i++ {
        if !wholeString.MatchString(manBash) {
            b.Fatal(err)
        }
    }
}
mvdan commented 2 years ago

Ah, good point - "whole" matching is slower, so having it opt-in makes sense even if most use cases within shell will want to opt into the option.

lollipopman commented 2 years ago

Very nice patch :) Thanks for bearing with me as I review this - it took me a while as I've been travelling.

More than happy to review more patches from you if they are of this quality.

Thanks, hopefully I will find the time to keep them coming!