Closed srdo closed 2 years ago
I just got burned by this too. I need it to work, so I'll see what I can figure out.
I think the bug must be in the parser somewhere. My minimal repro is:
assert(isMatch('a/b', '(x|**)'));
That expression is parsed as follows:
[
{ "type": "bos", "value": "", "output": "" },
{ "type": "paren", "value": "(" },
{ "type": "text", "value": "x|", "output": "|" },
{ "type": "star", "value": "*", "output": "[^/]*?" },
{ "type": "star", "value": "*", "output": "" },
{ "type": "paren", "value": ")", "output": ")" }
]
Compare that with pm.parse('(**)')
:
[
{ "type": "bos", "value": "", "output": "" },
{ "type": "paren", "value": "(" },
{
"type": "globstar",
"value": "**",
"output": "(?:(?:(?!(?:^|\\\\/)\\\\.).)*?)"
},
{ "type": "paren", "value": ")", "output": ")" }
]
And pm.parse('(x|y)')
:
[
{ "type": "bos", "value": "", "output": "" },
{ "type": "paren", "value": "(" },
{ "type": "text", "value": "x|y", "output": "|y" },
{ "type": "paren", "value": ")", "output": ")" }
]
Something else is clearly very odd about this, since the output would appear to be invalid, i.e. no part of the reported output would ever match x
. In practice though x
can be matched by (x|y)
.
I wonder why |
is always treated as type: 'text'
by the parser: https://github.com/micromatch/picomatch/blob/5467a5a9638472610de4f30709991b9a56bb5613/lib/parse.js#L616-L622
I'm also a bit suspicious about this: https://github.com/micromatch/picomatch/blob/5467a5a9638472610de4f30709991b9a56bb5613/lib/parse.js#L825-L828
It looks to me like what should happen is we should fall through a couple cases and end up here: https://github.com/micromatch/picomatch/blob/5467a5a9638472610de4f30709991b9a56bb5613/lib/parse.js#L909
Instead we're falling into this case: https://github.com/micromatch/picomatch/blob/5467a5a9638472610de4f30709991b9a56bb5613/lib/parse.js#L837-L840
I suspect the confusing naming scheme with prev
, prior
and before
has led to confusion about what contains what.
Adding an or to this pattern breaks it, and I'm wondering if it's because the or is being "counted as" another segment?
This isn't a bug, the pattern you're using isn't a supported glob. It's just creating a regular expression that happens to not match what you want. Try wrapping the segment in parentheses like this:
pm.isMatch('dir1/sub/sub', 'dir1/(**|dir2)/**');
pm.isMatch('dir2/sub/sub', 'dir1/(**|dir2)/**');
pm.isMatch('dir1/sub', 'dir1/(**|dir2)/**');
Or you can use braces, like: dir1/{**,dir2}/**
.
@jonschlinkert Could you take another look at this? The example pattern you've provided always matches everything, and you seem to be asking me to do exactly what I am already doing that does not work.
Again the failing test case is:
pm.isMatch('a/b', '(x|**)'); // false
and it fails because the **
is interpreted as if it were a *
for no reason that I understand.
Your parser also has different output for (**|x)
and (x|**)
:
> pm.parse('(**|x)').tokens.map(({prev, ...rest}) => rest)
[
{ type: 'bos', value: '', output: '' },
{ type: 'paren', value: '(' },
{ type: 'star', value: '*', output: '[^/]*?' },
{ type: 'text', value: '|x', output: 'x' },
{ type: 'paren', value: ')', output: ')' }
]
> pm.parse('(x|**)').tokens.map(({prev, ...rest}) => rest)
[
{ type: 'bos', value: '', output: '' },
{ type: 'paren', value: '(' },
{ type: 'text', value: 'x|', output: '|' },
{ type: 'star', value: '*', output: '[^/]*?' },
{ type: 'star', value: '*', output: '' },
{ type: 'paren', value: ')', output: ')' }
]
Ah I'm sorry, I guess I've sort of co-opted this issue. You're saying that OPs code would work if they just added parens, and I'm saying that parens or no it's still gonna be at least partly busted (and busted in different ways depending on the order of the **
within the alternative group).
Sorry it's a bit off topic, but it's late and I'm having fun. I think I'm going to try my hand at rewriting this parser using a technique I've been building slowly towards. My technique would create a fully streaming (!) n-lookahead parser with top-down control and easy and expressive error handling. An implementation of parserate
(to be built on top of the existing forkerate) would allow a parser to be written like so:
function *tokens(iterable, options) {
const parsr = parserate(iterable);
while (!parsr.done) {
if (parsr.consume(/\w+/), 'text') {
yield {
type: 'text',
value: parsr.consumed,
}
} else if (parsr.consume(/\d+/), 'digit') {
yield {
type: 'digit',
value: parseInt(parsr.consumed, 10),
}
} else if (options.backreferences && parsr.consume(/\{(\d+)\}/, 'backreference')) {
yield {
type: 'backreference',
value: parseInt(parsr.consumedMatch[1], 10),
}
} else {
throw parsr.error(`Unexpected character: '${parsr.value}'`);
// SyntaxError: Unexpected character: '&'
// Expected one of: text, digit
// line: 1,
// col: 17
}
}
}
@jonschlinkert You are right, I was missing some parentheses. Changing the pattern to (dir1/**)|(dir2/**)
did the trick to express "paths starting with dir1/ or dir2/". Thanks for the hint.
@conartist6 If your original pattern is intended to express "x or nonempty string", it looks like (x)|(**)
does that.
I'll close this issue, I'm hoping this hint also solves your problem Conrad, but either way it's probably better if you raise a separate issue for it :)
I just want to make clear for anyone else that happens across this that the problem described by the title of this issue is not currently resolved, and is now tracked in #104.
Hi.
According to the docs, globstar will only match path separators if
**
are the only characters in a path segment. The following match returns true as expected, since**
are the only characters in the last path segment.Adding an or to this pattern breaks it, and I'm wondering if it's because the or is being "counted as" another segment?