Closed lollipopman closed 2 years ago
@ihar-orca I would love a review, since you submitted the original patch
@lollipopman looks really good!
Also thinking about the added API - it seems like we do need the added API, because in some cases we want the partial matching to happen. Or perhaps not - if one needs partial matching, perhaps
foobar
could be fed in as*foobar*
, then it will still work as expected.
I had not thought about inverting the need for whole string matches, by converting each pattern into a whole string match as you suggest. I tried doing some rough benchmarks in Go and whole string matches appear significantly slower on larger texts, as I would expect.
func BenchmarkPartialString(b *testing.B) {
b.ReportAllocs()
out, err := exec.Command("man", "bash").Output()
if err != nil {
log.Fatal(err)
}
manBash := string(out)
partial := regexp.MustCompile("COPYRIGHT")
for i := 0; i < b.N; i++ {
if !partial.MatchString(manBash) {
b.Fatal(err)
}
}
}
func BenchmarkWholeString(b *testing.B) {
b.ReportAllocs()
out, err := exec.Command("man", "bash").Output()
if err != nil {
log.Fatal(err)
}
manBash := string(out)
wholeString := regexp.MustCompile("^(?s).*COPYRIGHT.*$")
for i := 0; i < b.N; i++ {
if !wholeString.MatchString(manBash) {
b.Fatal(err)
}
}
}
Ah, good point - "whole" matching is slower, so having it opt-in makes sense even if most use cases within shell will want to opt into the option.
Very nice patch :) Thanks for bearing with me as I review this - it took me a while as I've been travelling.
More than happy to review more patches from you if they are of this quality.
Thanks, hopefully I will find the time to keep them coming!
commit be03afe7282aea8007fd3e961609a3b1f077db1d added the multi-line flag to the regexp generated for
==
and=!
testing. However, this flag is confusing as it allows for a regular expression to match one line out of a multi-line input.This means we are now matching against just one line in a multi-line string, rather than the entire text. Instead, to match Bash's behavior, we add the
s
flag to the regexp:With this a
*
in a shell pattern will match\n
.Glob patterns should always match newline characters, so add the
s
flag by default when creating a pattern. Alternately we could add thes
flag only when the.
meta-character is needed, but that seemed more fragile. This behavior worked by accident with file globs because rather than matching with.*
we match with[^/]*
which does match\n
.