micromatch / picomatch

Blazing-fast and accurate glob matcher written in JavaScript, with no dependencies and full support for standard and extended Bash glob features, including braces, extglobs, POSIX brackets, and regular expressions. Used by GraphQL, Jest, Astro, Snowpack, Storybook, bulma, Serverless, fdir, Netlify, AWS Amplify, Revogrid, rollup, routify, open-wc, imba, ava, docusaurus, fast-glob, globby, chokidar, anymatch, cloudflare/miniflare, pts, and more than 5 million projects! Please follow picomatch's author: https://github.com/jonschlinkert
https://github.com/micromatch
MIT License

`**` behaves like `*` inside an OR expression #104

Open: conartist6 opened this issue 2 years ago

conartist6 commented 2 years ago

Expected:
`pm('(**|x)')('a')` is true
`pm('(**|x)')('a/b')` is true

Actual:
`pm('(**|x)')('a')` is true
`pm('(**|x)')('a/b')` is false
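One way to see how this symptom can arise is a toy converter in which `**` only counts as a globstar when it fills a whole path segment and otherwise falls back to `*`. The function `toyGlobToRegex` below is hypothetical and far simpler than picomatch's parser; it reproduces the reported results, but it is a sketch of a plausible mechanism, not a claim about where the bug actually lives.

```javascript
// Hypothetical toy converter, NOT picomatch's parser. It models a single rule:
// `**` is a globstar (`.*`) only when it fills a whole path segment;
// anywhere else (e.g. right after `(`) it degrades to `*` (`[^/]*`).
function toyGlobToRegex(glob) {
  let src = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (ch === '*' && glob[i + 1] === '*') {
      const prev = glob[i - 1];
      const next = glob[i + 2];
      const wholeSegment =
        (prev === undefined || prev === '/') && (next === undefined || next === '/');
      src += wholeSegment ? '.*' : '[^/]*';
      i++; // skip the second '*'
    } else if (ch === '*') {
      src += '[^/]*';
    } else if (ch === '(') {
      src += '(?:'; // non-capturing group
    } else if (ch === ')' || ch === '|') {
      src += ch; // keep grouping and alternation as-is
    } else {
      src += ch.replace(/[.+?^${}[\]\\]/g, '\\$&'); // escape other regex specials
    }
  }
  return new RegExp(`^${src}$`);
}

console.log(toyGlobToRegex('(**|x)').test('a'));   // true
console.log(toyGlobToRegex('(**|x)').test('a/b')); // false: `**` was not a whole segment
console.log(toyGlobToRegex('**').test('a/b'));     // true
```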

Most of the discussion of the technical aspects of this issue is in the now-closed #88.

conartist6 commented 2 years ago

In my particular case I was merging multiple glob expressions with `(${globs.join('|')})`. I was able to work around the issue by simply passing the array of patterns to picomatch, but ultimately I think this is still a bug that needs to be fixed. I may eventually try to fix it by rewriting the picomatch parser, which seems to have a variety of internal inconsistencies at present.
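The workaround amounts to matching each pattern on its own and OR-ing the results, instead of re-parsing every pattern inside one shared group. As far as we can tell, picomatch's array form does this internally (it builds one matcher per pattern and returns true if any of them matches). The sketch below mimics the idea with hand-written stand-in regexes and a hypothetical `anyOf` helper rather than calling picomatch.

```javascript
// Hand-written stand-ins for each glob compiled on its own
// (assumption: `**` ≈ .* and dotfile rules are ignored).
const perPattern = [
  /^.*$/, // '**' compiled alone keeps its globstar meaning
  /^x$/,  // 'x'
];

// Hypothetical helper: OR independent matchers together instead of joining
// their sources, so no pattern is parsed inside a group it was not written for.
const anyOf = (regexes) => (path) => regexes.some((re) => re.test(path));

const isMatch = anyOf(perPattern);
console.log(isMatch('a'));   // true
console.log(isMatch('a/b')); // true
```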

jonschlinkert commented 2 years ago

> I may eventually try to fix it by rewriting the picomatch parser, which seems to have a variety of internal inconsistencies at present.

A PR would be welcome, as long as it passes all the unit tests. I think there are several thousand, from bash, minimatch, etc.

> but ultimately I think this is still a bug

There are opportunities to improve some of the matching with extglobs, since negative and positive lookbehinds were not available when I wrote this parser. My recommendation is that you take the patterns from your examples and show what they should look like if they were pure regular expressions. There are many limitations in ES regular expressions: we can't do atomic groups, we can't do proper conditionals, etc. That makes it more challenging. Also, the longer and more complicated the regex, the more branches it has and the more susceptible it is to catastrophic backtracking.
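Taking up that suggestion for the pattern in this issue, a hand-written ES regex for the expected meaning of `(**|x)` might look like the first expression below, assuming `/` is the only separator and ignoring dotfile rules. The second part shows a commonly used emulation of an atomic group, `(?=(X))\1`; it is a general regex technique, not something picomatch is known to use.

```javascript
// Expected meaning of `(**|x)` as a plain ES regex (assumptions: '/' is the
// only separator, dotfile rules ignored). The `x` branch is redundant here,
// since `.*` already matches "x".
const expected = /^(?:.*|x)$/;
console.log(expected.test('a'));   // true
console.log(expected.test('a/b')); // true

// ES regexes have no atomic groups. A well-known emulation is a lookahead
// plus a backreference, (?=(X))\1: the engine does not backtrack into a
// completed lookahead, so whatever X grabbed greedily is never given back.
const normal = /^a+a$/;         // a+ may give one 'a' back to the final 'a'
const atomic = /^(?=(a+))\1a$/; // emulated atomic a+ keeps all three
console.log(normal.test('aaa')); // true
console.log(atomic.test('aaa')); // false
```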

conartist6 commented 2 years ago

Yep, it's great that there are so many tests; it will really make my life easier if I get into making the changes.

As for the regex itself, I don't really see how `**` is much different from `*` from that perspective. I mean, I do: it can potentially consume a lot more before having to backtrack, but I think that's still more or less on the writer of the pattern.
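A small illustration of that difference, assuming `*` maps to `[^/]*` and `**` to `.*` (dotfile rules ignored): the greedy first attempt of `*` stops at the first `/`, while `**` swallows the whole path and has to give characters back one at a time if anything follows it in the pattern.

```javascript
// Assumption: `*` ≈ [^/]* and `**` ≈ .* (dotfile rules ignored).
const path = 'src/lib/util/index.js';

// The greedy first attempt of each wildcard, before any backtracking.
const starGrab = /^[^/]*/.exec(path)[0];
const globstarGrab = /^.*/.exec(path)[0];
console.log(starGrab);     // src
console.log(globstarGrab); // src/lib/util/index.js

// Roughly `**/*.js`: `.*` first takes the whole path, then backs off
// until a '/' followed by a `*.js` file name fits the rest.
const globstarThenFile = /^.*\/[^/]*\.js$/;
console.log(globstarThenFile.test(path)); // true
```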

I don't know if it is of any interest to you, but I actually wrote the only non-native (i.e. scripted) non-backtracking regex engine currently in the ecosystem: @iter-tools/regex. It's a bit sluggish, though, because it is scripted and doesn't implement the DFA optimization, which is to say that it may be in more than one state at a time.
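The "more than one state at a time" approach can be sketched as follows: instead of trying one path and backtracking, the matcher keeps the set of every pattern position that is still alive and advances all of them on each input character, so the work per character is bounded by the pattern size. This is a hypothetical toy for a tiny glob subset (literal characters, `*` and `**`, with `**` treated as plain "anything", so `a/**/b` does not match `a/b` here); it is not the @iter-tools/regex implementation or API. A DFA-based engine would instead precompute each reachable set of states into a single state.

```javascript
// Hypothetical non-backtracking glob matcher (toy subset: literals, `*`, `**`).
// States are token indexes; state === tokens.length is the accepting state.
function compile(glob) {
  const tokens = [];
  for (let i = 0; i < glob.length; i++) {
    if (glob[i] === '*' && glob[i + 1] === '*') { tokens.push({ type: 'globstar' }); i++; }
    else if (glob[i] === '*') tokens.push({ type: 'star' });
    else tokens.push({ type: 'char', ch: glob[i] });
  }
  return tokens;
}

// Epsilon closure: a wildcard may match zero characters, so state s implies s + 1.
// Iterating a Set also visits entries added during the loop.
function closure(tokens, states) {
  const out = new Set(states);
  for (const s of out) {
    if (s < tokens.length && tokens[s].type !== 'char') out.add(s + 1);
  }
  return out;
}

function matches(glob, str) {
  const tokens = compile(glob);
  let states = closure(tokens, [0]);
  for (const ch of str) {
    const next = new Set();
    for (const s of states) {
      if (s === tokens.length) continue; // accepting state consumes nothing
      const t = tokens[s];
      if (t.type === 'char' && t.ch === ch) next.add(s + 1);
      else if (t.type === 'star' && ch !== '/') next.add(s); // stay inside `*`
      else if (t.type === 'globstar') next.add(s);           // `**` eats anything
    }
    states = closure(tokens, next);
    if (states.size === 0) return false; // no live states left: fail early
  }
  return states.has(tokens.length);
}

console.log(matches('**', 'a/b'));                       // true
console.log(matches('*', 'a/b'));                        // false
console.log(matches('src/**/*.js', 'src/lib/index.js')); // true
```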