Closed es128 closed 8 years ago
It's worth noting that minimatch doesn't fare any better
> minimatch('src/a/b/c.js', 'src/**/*!(_test).js')
true
> minimatch('src/a/b/c_test.js', 'src/**/*!(_test).js')
true
sorry I don't know how I missed this! I'll look and see if this is an easy fix, if not I can try to speed things up with the parser refactor
so, I finally got around to benchmarking the parser refactor (which uses snapdragon) against the current parser. It's still not finished, but the refactor is 20x faster! all that means is that it gives us breathing room to handle edge cases while still being fast. I'll report back when I have more info...
:+1:
It's still not finished, but the refactor is 20x faster!
WOW! That sound amazing!
@es128 okay, I finally got around to improving extglob support, and I looked into this issue specifically, and believe you're getting the correct result.
This is because the *
just before the negation pattern is greedy, so it's matching the pattern before it's negated. However, if you try using a less greed character, it should work.
Example:
mm.isMatch('src/a/b/c_test.js', 'src/**/*!(_test).js')
//=> false
I'm a little confused by your last response - the example in the end in particular since it seems to contradict the rest of what you're saying. Also, I get true
when I try it on the 2.3.0 PR.
But anyway, so if this is correct glob behavior because the wildcard is greedy, then what would be the proper way to achieve the goal of excluding files that match a specific substring in part of their name? Is there a reliable way to do that within a single glob pattern?
bah, my fault. I made you confused with my example. I must have had stars on my mind. the star in the code block should be a dot:
mm.isMatch('src/a/b/c_test.js', 'src/**/.!(_test).js')
sorry about that
Is there a reliable way to do that within a single glob pattern?
I think/hope so lol. I'm trying to get there with extglobs, negation in particular is difficult. Your best bet is to try using characters that are less greedy or more specific than *
. Or try using *?
instead of *?
(I didn't test that yet... so I'm not sure).
Neither .
nor *?
seem to address this use case. Tried playing with variations of bracket expressions and extglobs, but haven't found anything that cuts it (assuming the name before _test
is variable-length).
really? hmm, did you try .
with pr for 2.3.0?
here is what I get with the 2.3.0 code:
console.log(mm.isMatch('src/a/b/c.js', 'src/**/*!(_test).js'));
//=> true
console.log(mm.isMatch('src/a/b/c.js', 'src/**/.!(_test).js'));
//=> false
console.log(mm.isMatch('src/a/b/c_test.js', 'src/**/*!(_test).js'));
//=> true
console.log(mm.isMatch('src/a/b/c_test.js', 'src/**/.!(_test).js'));
//=> false
right so if .
was doing its job shouldn't the second example yield true
?
.
just generally does not appear to the same meaning as it would it a regex - perhaps it needs to be wrapped in parens or something to work the way you were referring to? But then still, it isn't doing the trick for the notion of excluding _test
files.
> mm.isMatch('a/b/c.js', 'a/b/..js')
false
> mm.isMatch('a/b/c.js', 'a/b/(.).js')
true
> mm.isMatch('src/a/b/c.js', 'src/**/(.)!(_test).js')
false
> mm.isMatch('src/a/b/c_test.js', 'src/**/(.)!(_test).js')
false
you're right about the .
. I spent so much time on extglob today I can't remember when or what patterns I tried earlier. in my last example, it appears that only the third example is correct, do I have that right? (I've heard that grade school teachers sometimes forget how to spell words from grading papers with mistakes so often lol. I'm starting to get cross-eyed looking at these today)
Haha - take a break and uncross your eyes :)
From the examples you provided, the first shows the result I'd expect, not the third. The idea was to match all the js files except those with _test
at the end of the name.
In all the times I looked at this, I never noticed that you're trying to (not) match _test
to c_test
in the examples. I think c_test
was supposed to be c/_test
or something like that, right?
Try the following:
console.log(mm(['a/_test.js', 'a/b.js'], 'a/!(_test)*.js'))
//=> ['a/b.js']
Hmm. No, the goal is to match all the files of a certain type except those that match a particular substring in the filename. Not supposed to have a path separator in between.
This is for the sake of a popular angular convention that puts test files alongside application code instead of away in their own dir.
It's kind of like a regex negative lookahead.
Got it. Here is another example:
console.log(mm(['a/b_foo.js', 'a/b_bar.js'], 'a/b!(_foo)*.js'));
//=> [ 'a/b_bar.js' ]
Right?
Yup, it's showing expected behavior. But the original problem case was with the wildcard to the immediate left of the negated section.
yeah the *
should have been after the pattern. like sounds like this can be closed.
or wait, were you saying that the *
should be before the pattern? sorry wasn't trying to dismiss
I'm not sure it's actually ready to be closed. I'll try a few examples tomorrow against the latest micromatch code and report back.
The goal is a pattern that would match a/b.js
and a/b_foo.js
, but not a/b_test.js
nor a/b.html
sounds good
fwiw
var arr = [
'a/b.js',
'a/b.html',
'a/a_foo.js',
'a/b_foo.js',
'a/a_test.js',
'a/b_test.js'
];
var res = mm(arr, 'a/b!(_test)*.js');
console.log(res);
//=> [ 'a/b.js', 'a/b_test.js' ]
Seems right, so yeah maybe it was just a matter of putting the *
at the end just like it would be with a regex negative lookahead. It just may feel a little unintuitive to users because to negate a suffix it has to be put in front.
Does it still behave as expected if the pattern is just a/!(_test)*.js
?
It just may feel a little unintuitive to users
I agree, but this is following regex and bash conventions to the best of my knowledge.
Does it still behave as expected if the pattern is just a/!(_test)*.js?
I was about to say "yes", but what would do you think the expected matches would be lol?
Same 2 as before plus a/a_foo.js
k I'll check
actually IMO the correct result would yield everything but a/b.html
since !
does not signify a character. Let me try something
Adding ?
achieves what you want. This is what I expected:
var arr = [
'a/b.js',
'a/b.html',
'a/a_foo.js',
'a/b_foo.js',
'a/a_test.js',
'a/b_test.js'
];
var res = mm(arr, 'a/?!(_test)*.js');
console.log(res);
//=> [ 'a/b.js', 'a/a_foo.js', 'a/b_foo.js' ]
var arr = [
'a/b.js',
'a/b.html',
'a/a_foo.js',
'a/b_foo.js',
'a/a_test.js',
'a/b_test.js'
];
var res = mm(arr, 'a/!(_test)*.js');
console.log(res);
//=> [ 'a/b.js', 'a/a_foo.js', 'a/b_foo.js', 'a/a_test.js', 'a/b_test.js' ]
I'm good with that. Thanks!
awesome! no problem.
oh, here is an example of something I forgot about, but might be more useful for the use case you mentioned:
var arr = ['a.a', 'a.b', 'a.c.d', 'c.c', 'a.', 'd.d', 'e.e', 'f.f']
console.log(mm(arr, '?!(*.a|*.b|*.c)[a-f]'));
//=> [ 'd.d', 'e.e', 'f.f' ]
Since ?
matches either 1 or 0, using ?
on the above would also match a.
, which is correct. But to match a single character - one and one only - you could also use regex ranges, like [a-z]
etc
Oh, well what we really wanted was to match one or more characters, which is what I thought the *
was accomplishing. Like feature.js
and feature_test.js
. Does the solution with ?
not work there?
here is a longer example, with results from both minimatch and micromatch. (micromatch's results are correct).
var arr = [
'a.a',
'a.aa',
'a.aaa',
'a.b',
'a.bb',
'a.bbb',
'a.c.d',
'c.c',
'c.cc',
'c.ccc',
'a.',
'd.d',
'd.dd',
'd.ddd',
'e.e',
'e.ee',
'e.eee',
'f.f',
'f.ff',
'f.fff',
]
console.log(micromatch(arr, '?!(*.a|*.b|*.c)[a-f]'));
console.log(minimatch.match(arr, '?!(*.a|*.b|*.c)[a-f]'));
micromatch
Since we excluded .a
in the extglob, any pattern with .a
directly following that negation should not be included in the result array:
[ 'd.d',
'd.dd',
'd.ddd',
'e.e',
'e.ee',
'e.eee',
'f.f',
'f.ff',
'f.fff' ]
minimatch
Minimatch incorrectly matches a.a
as well as .b
and .c
which were also negated.
[ 'a.a', // incorrect
'a.b', // incorrect
'a.c.d', // incorrect
'c.c', // incorrect
'd.d',
'd.dd',
'd.ddd',
'e.e',
'e.ee',
'e.eee',
'f.f',
'f.ff',
'f.fff' ]
not sure if this helps
@jonschlinkert all that is awesome, but.. BUT. Why results not includes a.
?
Good moin', yawn... lol.
Because [a-f]
does not have a dot in it ;)
So.. in that logic, then the results should be empty array, cuz every of the current results have the dot lol? Huh... it's, hm.
Ooh, i think i got it, haha. Am i understood it correctly?
if we have
var arr = [
'c.ccc',
'a.',
'd.'
'd.d',
'd.dd',
'd.ddd'
];
then d.
also will be in the final results? Because it dont have *.d
in the ?!(*.a|*.b|*.c)
part?
Sorry, but can't check currently, cuz dont have dev enviroment setup.
close, but a.
and d.
wouldn't be in the results, since there needs to be at least one [a-f]
character after the dot.
if we change the pattern to '?!(*.a|*.b|*.c)[a-f.]'
, it would capture both a.
and d.
Hmm.. I think it's confusing, why after the dot? But okey, maybe problem is on my side. Im out of space from a while, so.. :)
yeah it is kind of confusing. I think it's right but what you're saying makes sense too, when I get a chance I'll paste the expanded regex that micromatch creates from the pattern so we can see what's happening.
I think the moral of the story is that globbing has its limits, and there comes a level of complexity where the right tool for the job is multiple globs (positive plus negated) or just writing a custom regular expression.
Actually it might be a pretty cool feature if there were an option to embed regex syntax into the glob for particular path parts.
and there comes a level of complexity ... if there were an option to embed regex syntax into the glob
Why not just allow regexes instead of digging around string complexity and thinking new ways to do regexes with strings?
I mean.. we should not tolerate using globs for things that can be done with regexes. All that kind of things seems to me like "The Gulp Way" – oo let's go with guuuulp its freakin awesome, answer: Nope, it's not so awesome; it's not so awesome when you just want to run mocha tests - but most of the devs out there use it for this (or something like this), which is awful and slow.
Here the thing is similar (not only in micromatch), I think we trying to do too much, imo. Don't understand me wrong. All that libs whose came from micromatch like extglob, brace/range expansion and etc are great - @jonschlinkert's job is awesome. But it's going to be more and more harder and complex to represent regex mechanisms as/in strings - it's impossible.
I mean.. we.. mm.. you should set some frames/limits for this string jobs.
Just my 2c.
Regexes are allowed. But writing them by hand as well as micromatch does for paths would be hard. Allowing regex syntax within a particular path section could be powerful for some specific circumstances like the one discussed in this thread. Not a necessity or a high priority, just an idea.
I wouldn't mind continuing this discussion, at least until I'm 100% confident that I'm right. It might seem like we're trying to do regex-ish stuff since extglob borrows from regex, but the examples we're discussing here are extglobs and they should work as expected for extglobs.
Why not just allow regexes instead of digging around string complexity and thinking new ways to do regexes with strings?
agreed, micromatch already allows you to pass a regex instead of a glob string (hmm, after writing this I'll go look at the unit tests and make sure this is well-tested and documented)
As for feature support for regex-ish stuff, I think the line is pretty clear on what should be supported and what shouldn't: it makes sense to support regex-ish features that are already familiar and well established elsewhere, like with bash, extglob, POSIX character classes, and wildmatch (git's glob matcher). Extglobs in particular are nice since they are often easier to write than a comparable regex would be.
there comes a level of complexity where the right tool for the job is multiple globs
agreed, and/or a tool like anymatch
Actually it might be a pretty cool feature if there were an option to embed regex syntax into the glob for particular path parts.
That actually would be pretty cool - and probably easy to implement if we decided on a delimiter to use. is there a reason we can't use <
and >
?
@tunnckoCore all of the points you're making are excellent. I think it boils down to "use the right tool for the job". Keep in mind that most of us are humans, and sometimes it's just easier to do something familiar, repeatable, boilerplate-able, or just more fun, than doing what is most efficient.
Not a necessity or a high priority, just an idea.
yeah, but it would be pretty nice to have :)
and I just noticed @es128's "right tool for the job" comment! maybe that's what put that in my head when I was writing lol
I'll suggest triple-slash ///
for an embedded regex delimiter. Less likely to clash with anything else than any other arbitrary set of characters, a bit closer to traditional regex syntax, and has some precedent in coffeescript (albeit used for a different regex-related purpose there)
Less likely to clash with anything else than any other arbitrary set of characters, a bit closer to traditional regex syntax
I'm not so sure. I don't know how common it is to improperly join paths and/or incorrectly convert from windows paths to unix, but I've seen \\\\
and //
lots of times. Also, what if a user wanted to lead or follow the regex with an actual path slash (especially on windows)? You'd have patterns like foo////bar////
. It seems a lot safer to use an illegal character as the delimiter. Maybe #
? To my knowledge that is an illegal character. I think minimatch used to use #
as the delimiter for comments.
Using v2.1.6
Using master branch
Specifying
extglob: true
orextglob: false
does not appear to have any impact. When the*
is removed before the!
it behaves as I'd expect.paulmillr/chokidar#58