Closed phated closed 8 years ago
Oops, meant to cc @thirdcreed in the original body.
I never noticed that micromatch was only doing the negation patterns here, so this makes sense. fwiw this lgtm
👍 Very nice work.
In comparison to #40, it seems the ability to pass node-glob ignore
option has been dropped. In #40, the ignore
option could be used as an alternative or complement to the negative pattern syntax, but it may as well be redundant as one can simply pass the negative patterns last for the same effect. It's your call if you prefer avoiding redundancy in the API or keeping maximum compatibility with the node-glob API.
So I guess long-term the plan here is to keep the !
syntax for negations, as opposed to the new node-glob API, and internally shuffle things around to fit those negations into its ignore
option?
I have no strong preference either way, would (also) find it acceptable for a breaking release here to just adopt node-glob's dropping support of exclamation point negations. Or mark it as deprecated now and switch over completely later on.
Anyway, if the exclamation points are here to stay, perhaps it should be documented that if users pass their own ignore
option it will be, um, ignored. Or maybe it could be concatenated / merged with the ignores you conjure out of the exclamation point negations?
@UltCombo I think its a holdover from when I made the original version a long time ago. The only way to do negations was in the ignore
option, then I added it into the main glob array with the !
after a node-glob update, and the ignore
option was a backwards compat holdover.
+1 for removing it though, its redundant. Less APIs = better.
@contra Actually, I believe ignore
was never part of the glob-stream
API, it was only present in my old PR (#40) for extended compatibility with the node-glob API.
@erikkemperman Negative patterns, in the sense we are using here, were never actually supported by node-glob—node-glob's "negation" patterns were not used to filter out previous results, but instead its "negation patterns" would match any path that did not match the negated pattern. Glob-stream never made use of this feature, instead, it implemented its own negation patterns logic by manually filtering results with micromatch. This PR improves this logic to make use of a new node-glob option (ignore
) which has the same final result and is far more optimized.
edit: grammar
Hm, node-glob still mentions the use of !(xxx)
to mean "include paths not matching xxx
" but elsewhere it says prefix !
is deprecated in favour of ignore
, which I took to mean the sense I know from glob-stream (well, gulp) of a kind of post-match exclusion. Anyway, thanks for clearing that up!
EDIT links: https://github.com/isaacs/node-glob#glob-primer https://github.com/isaacs/node-glob#comments-and-negation
node-glob still mentions the use of !(xxx)
that brings up a good point. !(xxx)
is an extglob pattern. Minimatch/node-glob only has basic extglob support. It might be worth noting on the readme that advanced matching features are being removed
@jonschlinkert what do you mean by they only have basic extglob support?
Minimatch (the matcher for node-glob) fails something like 30% of the extglob tests.
@jonschlinkert but you are saying it doesn't support !(foo)
?
I'm not saying it doesn't support any specific pattern. It may or may not.
I'm saying I don't know how to predict what minimatch supports or what it doesn't. In a recent commit he basically copied this code from extglob, but extglobs and "globs" don't use the same matching strategy, which wasn't accounted for. the minimatch issues https://github.com/isaacs/minimatch/issues do a better job of explaining.
(and micromatch only did negations here, so I don't know how important this really is...)
edit: see this issue in particular: https://github.com/isaacs/minimatch/issues/83. (this happens a lot, where makeRe().test()
produces a different result than .match
, because .match
tries to "compensate" for incorrect regex matching by monkeypatching the results)
Yeah, it was originally minimatch doing the negations, then switched to micromatch and now it's able to be removed since we normalize globs starting with !
and pass them to node-glob's ignore option. I'm guessing it isn't an issue.
@UltCombo good catch. I believe we should concat any ignore
patterns from the options to each iteration. I'll make those changes and add tests.
@erikkemperman node-glob's implementation of paths that started with !
was broken and caused them lots of issues so they replaced it with the ignore
option. We are going to continue to allow negative globs because our implementation is pretty sophisticated in how negations are done and includes things node-glob can't do because we create a new instance of node-glob for each glob string we receive. I'm going to concat the ignore option (as per mentioned above) to solve that problem you pointed out.
I agree, it probably shouldn't be an issue. but just to clarify, removing the !
or using ignore
etc are unrelated to this, since extglobs can be any part of the negation pattern.
edit: Oh, that brings something else up... If you remove !
from the pattern, you should make sure that it's not followed by (
, just in case it's part of an extglob
@jonschlinkert one thing that is bad with our implementation would be an extglob that starts with !(foo)
because we will treat it as a negation and remove the !
. Seems like a really far edge case though.
Seems like a really far edge case though.
Yeah I agree. I never thought of it until now, and we've never had an issue about it. Still, I'll probably make that change in micromatch to make sure we're matching correctly.
here, you could easily do something like:
return pattern[0] === '!' && pattern[1] !== '(';
and something similar anywhere else you're checking for !
. But like you said, it seems like an edge case
Since I'm currently doing this negation logic in a handful of places, and I've seen it in a number of others like this lib, I created is-negated-glob to make handling of !
and !(
consistent.
Of course, don't feel obligated to use that but I'd be happy to add you to the lib in case you do, and/or want to make changes.
Being pedantic, I know, but the bracket is a valid filename character, usually. So would there be a way to escape, or something, or would this implicitly disallow ignores starting with literal (
?
Concatenating user-provided ignore
option sounds great -- I imagine I might have a kind of global ignore list (**/*.bak
and the like) to put in the option whereas exclusions specific to the pattern at hand would be natural to write alongside it.
As node-glob tests the ignore
array in natural order (Array.some
), ideally the ignores that are most likely to hit a result should be at the front -- is it unreasonable to generally expect "pattern ignores" to be more likely than "option ignores", or vice versa?
Being pedantic, I know, but the bracket is a valid filename character, usually. So would there be a way to escape, or something,
true, we should add support for escaping (which makes me more glad I created is-negated-glob). IMHO !(
is exceedingly unlikely to not be an extglob - given that it's a pattern that is explicitly passed to be used for matching, not the file paths to be matched. But still, it's probably a good idea to respect escaping either !
or !(
. There is negligible cost in doing so
Excellent! In my experience end users have an uncanny ability to find cases that the developers thought were unlikely :-)
end users have an uncanny ability to find cases that the developers thought were unlikely :-)
truer words have never been spoken lol
I'm happy to use the module but it needs to be at least version 1.0 and should support escaping.
actually I wasn't thinking, escaping would already work since we're checking character positions. I think we're in good shape.
needs to be at least version 1.0
I was planning on bumping after I got your feedback, in case we needed to make changes
edit: based on your prior comment, it sounds like 1.0 is in order then. I'll do that now
edit2: done. is-negated-glob was published and released as 1.0. @phated I also added you as a collab and npm owner.
escaping would already work since we're checking character positions
I would still be unable to have an ignore pattern starting with !(
though... Right?
Position check just means that something like !!(
(assuming !
as escape char) won't be seen as extglob, but the escape char must then still be discarded.
@jonschlinkert not sure why, but dropping that module in is now causing test failures. Trying to debug now.
Figured it out, I was slicing twice.
@erikkemperman can you submit some tests for the things you are talking about? I'm pretty confused.
Position check just means that something like !!( (assuming ! as escape char) won't be seen as extglob, but the escape char must then still be discarded.
we only need to know if the very first character in the unmodified glob pattern is !
. If it's escaped, then the first character would be \
. If the first character is !
followed by an escaped (
, then the second character would be \
.
In other words, this lib will strip the first character when it is exactly !
and the second character is exactly NOT (
. The globbing tool then handles everything else, so your questions about handling of those patterns would be for node-glob.
edit: so if you did want to have an ignore pattern beginning with !(
, where (
is not part of an extglob, then you would need to define !\(
. I think this would be expected in all of the globbing libs.
I guess I am suggesting something like
console.log(isNegatedGlob('!!(foo)'));
// escape with ! for literal (
// { pattern: '(foo)', negated: true }
Assuming !
as escape char, which seems reasonable here.
Why would !
be an escape character? The escape character is \
I figured since it already has special significance here. Same way you write %%
to get a literal %
in printf
. And the same way you write '\\'
in javascript strings to get a single backslash. So then it'd be
console.log(isNegatedGlob('\\!(foo)'));
// escape with \ for literal (
// { pattern: '(foo)', negated: true }
EDIT typo
Yeah, I'm with @phated on this. Not sure why !
would be used as an escape char.
Sorry for trailing off there, my timezone (Amsterdam) and day job got the better of me... And I don't have time just now to try and construct actually problematic test cases. I'll take a stab at it when I get a chance, and of course I might very well simply have it all backwards.
But I'm thinking whatever escape character ought to be eaten/discarded, before passing on to glob. And while backslash is certainly the go-to escape char a lot of the time, it might have unintended side effects in the case of file/pathname matching (windows), whether it's eaten here or not.
I may have figured out my misunderstanding earlier. So having inadvertently muddied the water on account of being half asleep, allow me to overcompensate...
Here's what I was thinking:
To a first approximation, everything starting with '!'
gets the exclamation stripped and goes into negatives, the remainder goes (remains) in positives unchanged.
This raises the question how one might positively match something starting with a literal '!'
, a special case of which being extglobs like '!(foo)'
.
So in terms of escaping, I was only thinking of escaping the exclamation, taking care of the extglob case too, meaning I was expecting something like the following to happen in glob-stream (with, say, backslash as the escape char):
pattern | positive | negative |
---|---|---|
'!foo' |
'foo' |
|
'!(foo)' |
'!(foo)' |
|
'\\!foo' |
'!foo' |
|
'\\!(foo)' |
'(foo)' |
What I'd somehow missed earlier is it looks like you were thinking of escaping the parenthesis in extglobs, instead of the exclamation? I certainly didn't help things along by proposing '!'
as escape at the same time, leaving ambiguous if it was escaping the paren or the exclamation.
What I did notice is that is-negated-glob doesn't strip the escape char/s. So, if that's about right, does the following do your current implementation any justice?
pattern | positive | negative |
---|---|---|
'!foo' |
'foo' |
|
'!(foo)' |
'!(foo)' |
|
'\\!foo' |
'\\!foo' |
|
'!\\(foo)' |
'\\(foo)' |
For what it's worth, the first table seems the more natural to me, but then how could it not, having come at it from there.
As a general principle, I think it usually makes sense for a parser that consumes an escape char -- uses it to modify the significance of its successor char -- to then strip it off for subsequent stages. Perhaps you feel this is not something is-negated-glob should do, but I just think that from glob-stream's POV, the cases listed above should probably not result in backslashes in the positive/negative columns, which are then passed on to node-glob?
Assuming I haven't got the account completely wrong thus far, I understand how my suggestion of '!'
as escape char would have seemed outrageous, in front of a parenthesis. But in front of another exclamation, it has the advantage of not introducing any additional magic characters, which of course sooner or later someone is going to want to match literally -- It would be unwise, to be sure, but not strictly invalid to have a filename start with \
on mac or linux...
Actually, the following would perhaps be more consistent than either of the tables above, in terms of what the escaping does (override the usual semantics of '!'
)
pattern | positive | negative |
---|---|---|
'!foo' |
'foo' |
|
'!(foo)' |
'(foo)' |
|
'\\!foo' |
'!foo' |
|
'\\!(foo)' |
'!(foo)' |
Just pushed support and tests for the ignore option. @erikkemperman @UltCombo let me know if I'm missing anything in the implementation or tests.
Going to do some more things in this PR before I ask for another full review pass.
Alright, I think I've landed all the changes I wanted to in relation to this (removing the external API felt like it made sense to achieve some refactor points). If everyone could review again, I can get this wrapped up and push through on the remaining few issues before the major release.
@erikkemperman your table like:
pattern | positive | negative |
---|---|---|
'!foo' |
'foo' |
|
'!(foo)' |
'!(foo)' |
|
'\\!foo' |
'\\!foo' |
|
'!\\(foo)' |
'\\(foo)' |
is the correct implementation for how JS strings/escaping behave. And I believe the tests prove it.
The diff isn't easy to read, but I like the new style. 👍
One thing that calls my attention is the undocumented root
option, but I guess that goes beyond the scope of this PR.
Sorry about that. I opened an issue for the root option that will be addressed separately.
No problem, this is awesome work! 😄
That's... Interesting! Due to how node-glob / minimatch constructs RegExps from its arguments, stripping or not stripping the escaping slashes from patterns like '\\!foo'
and '!\\(foo)'
seems to make no functional difference at all.
I stubbornly maintain that stripping them is conceptually cleaner, because the escaping as employed here is specific to glob-stream and meaningless to node-glob, but I am unable to construe cases where it'll actually lead to different results.
Anyway, I'll shut up now -- apologies for distraction.
@erikkemperman it's not meaningless to node-glob, that is the same way they escape patterns. Stripping it would break the meaning of the escape when passed to node-glob. I think we've skewed very far from the point of this PR though.
Waiting on more feedback before I merge this in. Thanks all.
Yeah, that's on me, sorry. I'm going to try to figure out what I'm apparently still missing, but probably not out loud..
Thanks all! This has been merged.
I'd like to get some more eyes on this. There's probably some more stuff to get cleaned up, too.
cc @thirdcreed @terinjokes @contra @UltCombo @erikkemperman