micromatch / picomatch

Blazing fast and accurate glob matcher written JavaScript, with no dependencies and full support for standard and extended Bash glob features, including braces, extglobs, POSIX brackets, and regular expressions. Used by GraphQL, Jest, Astro, Snowpack, Storybook, bulma, Serverless, fdir, Netlify, AWS Amplify, Revogrid, rollup, routify, open-wc, imba, ava, docusaurus, fast-glob, globby, chokidar, anymatch, cloudflare/miniflare, pts, and more than 5 million projects! Please follow picomatch's author: https://github.com/jonschlinkert
https://github.com/micromatch
MIT License
971 stars 56 forks source link

isMatch and compileRe don't seem to agree with each other #117

Closed fabiospampinato closed 9 months ago

fabiospampinato commented 1 year ago

I'd expect isMatch, and the regex created with compileRe, to always give the same output for the same inputs, but that doesn't seem the case:

require('picomatch').isMatch('\\*','\\*') // => true
require('picomatch').compileRe(require('picomatch').parse('\\*')).test ( '\\*' ) // => false

Is it a bug?

jonschlinkert commented 9 months ago

IMHO this is a feature.

What's your opinion? Do you think \\* should match \\*?

fabiospampinato commented 9 months ago

Do you think \\* should match \\*?

I'm kind of agnostic to that with regards to this issue, the bug to me is that isMatch and compileRe basically don't agree on what the answer to that should be.

jonschlinkert commented 9 months ago

the bug to me is that isMatch and compileRe basically don't agree on what the answer to that should be.

compileRe is for compiling a regex, isMatch is not. The isMatch method uses compileRe but also does things that regex can't do. The point of compileRe is that you can create a raw regex to do any kind of matching you want. If you think its a bug that some functions do more than what a regex is capable of doing, I'm not sure we can resolve that.

fabiospampinato commented 9 months ago

Could you expand on what isMatch can do with arbitrary code here when trying to match \\* against \\* that the regex cannot somehow replicate? Because what comes to mind is that any scenario like this can be special-cased inside the regex, so I see zero technical issues why this specific scenario should lead to contrasting results between using isMatch and the regex.

Either way this looks like a bug to me, but whichever way this issue is resolved it won't actually impact me at all, so feel free to close the issue if you feel like the current behavior is correct.

jonschlinkert commented 9 months ago

any scenario like this can be special-cased inside the regex

Is \\* a real file path? If so, I'm curious what file system you're using.

  1. Cases where the glob pattern is a literal match for the input string are exceedingly rare. The only scenarios like this would be when trying to match a file path that is, itself, a glob pattern.
  2. Overloading every regular expression with a literal condition would make picomatch less performant, especially when you can do the following instead:
const pattern = '\\*';
const regex = compileRe(pattern);
const isMatch = value => value === pattern || regex.test(value);

Closing since behavior is correct.

fabiospampinato commented 9 months ago

\\* is not a real path, it's just a valid input that made me notice how two ways of matching a glob that I assumed to be equal (a lot of glob libraries seem to internally convert globs to regexes and then just matching those, so their equivalent isMatch/compileRe pairs never disagree with each other) are not in fact equal.

imo the actual question behind this issue is more like "should there ever be a case where those ways to match a glob disagree with each other?" to which my instinctive answer would be "no", as for example I don't see the reason why the value and the pattern should ever be checked for equality.

jonschlinkert commented 9 months ago

\\* is not a real path

Correct, that's why the regex doesn't match the pattern. That is the specific exception that we're focused on here. Globs are for matching file paths in every spec/standard lib I'm aware of (bash, glob, extglob, wildmat, kleene, gitignore, etc)

a lot of glob libraries

Which ones? Last time I checked, minimatch failed something like 30% of the picomatch unit tests, and no other library was even close to that. Also, minimatch has a makeRe method. picomatch has both makeRe and compileRe, and they're not the same.

should there ever be a case where those ways to match a glob disagree with each other?

Not to be pedantic, but again we're not matching globs, we're matching file paths.

so their equivalent isMatch/compileRe pairs never disagree with each other) are not in fact equal

If competing libraries have multiple methods that do the same exact thing, that seems more like a red flag than a gold standard.

Is it reasonable to expect two functions to have identical behavior?

Even in JavaScript itself if you pass the same regex to .match and .exec you won't always get the same result, or even the same number of results, even if the flags are identical.

fabiospampinato commented 9 months ago

a lot of glob libraries

Which ones?

That I can remember I think globrex and ignore use this approach, more or less. Maybe "a lot" is imprecise here, but converting a glob to a regex is something that is done, like it's not a rare/absurd/weird approach, as far as I've seen, of course it's not the only approach. Picomatch itself seems to have this capability, but as far as I can tell I guess the ability to produce a regex is not leveraged internally to match against a glob necessarily, as I would have assumed otherwise without looking at the code.

should there ever be a case where those ways to match a glob disagree with each other?

Not to be pedantic, but again we're not matching globs, we're matching file paths.

Right. I guess the more precise questions I have are, assuming that no particular normalization step happens to change our input:

  1. should multiple ways that a library supports of matching a path against a glob agree with each other? To which the answer seems yes for me, but maybe I'm missing something or I'm making some other implicit assumption here.
  2. should they disagree if the input string is not actually a "valid" path? If so it'd be interesting to understand why that should be the case, because I would still expect them to agree with each other.

If competing libraries have multiple methods that do the same exact thing, that seems more like a red flag than a gold standard.

I guess it depends on what we mean by "same exact thing" exactly, like I didn't mean to say that for example a library should provide compileToRegExp and compileToRegExp2, and that both should point to the same function, because I don't see the point of that. What I meant is that providing the user with both a function for matching a glob "easily", and another function for compiling a glob to a regex, seems potentially desirable, and I guess I was making the implicit assumption that isMatch function wouldn't normalize our input string for the particular scenario of the input string being \\*, as otherwise we are comparing apples and oranges, and in hindsight I think that's maybe the mistaken assumption in my original post, that isMatch wouldn't change anything for the \\* string, which is not necessarily the case. I'm not sure if that's why Picomatch is giving me the results that I'm getting for that, if it is it seems ok.


Like to summarize I had spotted a scenario that seemed worth looking into in Picomatch, I don't remember exactly all the details about it now, a year later, but it seemed worth reporting it.

You've had a look at it, it seems to work correctly as far as you can tell, and besides the general idea that if there are multiple different implementations inside the same library for matching against a glob they should probably agree with each other, which I guess is just my opinion, I don't have other hopefully useful things to say about this, I think.