gulpjs / glob-stream

Readable streamx interface over anymatch.
MIT License
178 stars 53 forks source link

Breaking: Upgrade to glob 7, implement ignore & remove RegExp as negative matchers (closes #24, closes #57) #71

Closed phated closed 8 years ago

phated commented 8 years ago

I'd like to get some more eyes on this. There's probably some more stuff to get cleaned up, too.

cc @thirdcreed @terinjokes @contra @UltCombo @erikkemperman

phated commented 8 years ago

Oops, meant to cc @thirdcreed in the original body.

jonschlinkert commented 8 years ago

I never noticed that micromatch was only doing the negation patterns here, so this makes sense. fwiw this lgtm

UltCombo commented 8 years ago

👍 Very nice work.

In comparison to #40, it seems the ability to pass node-glob ignore option has been dropped. In #40, the ignore option could be used as an alternative or complement to the negative pattern syntax, but it may as well be redundant as one can simply pass the negative patterns last for the same effect. It's your call if you prefer avoiding redundancy in the API or keeping maximum compatibility with the node-glob API.

erikkemperman commented 8 years ago

So I guess long-term the plan here is to keep the ! syntax for negations, as opposed to the new node-glob API, and internally shuffle things around to fit those negations into its ignore option?

I have no strong preference either way, would (also) find it acceptable for a breaking release here to just adopt node-glob's dropping support of exclamation point negations. Or mark it as deprecated now and switch over completely later on.

Anyway, if the exclamation points are here to stay, perhaps it should be documented that if users pass their own ignore option it will be, um, ignored. Or maybe it could be concatenated / merged with the ignores you conjure out of the exclamation point negations?

yocontra commented 8 years ago

@UltCombo I think its a holdover from when I made the original version a long time ago. The only way to do negations was in the ignore option, then I added it into the main glob array with the ! after a node-glob update, and the ignore option was a backwards compat holdover.

+1 for removing it though, its redundant. Less APIs = better.

UltCombo commented 8 years ago

@contra Actually, I believe ignore was never part of the glob-stream API, it was only present in my old PR (#40) for extended compatibility with the node-glob API.

@erikkemperman Negative patterns, in the sense we are using here, were never actually supported by node-glob—node-glob's "negation" patterns were not used to filter out previous results, but instead its "negation patterns" would match any path that did not match the negated pattern. Glob-stream never made use of this feature, instead, it implemented its own negation patterns logic by manually filtering results with micromatch. This PR improves this logic to make use of a new node-glob option (ignore) which has the same final result and is far more optimized.

edit: grammar

erikkemperman commented 8 years ago

Hm, node-glob still mentions the use of !(xxx) to mean "include paths not matching xxx" but elsewhere it says prefix ! is deprecated in favour of ignore, which I took to mean the sense I know from glob-stream (well, gulp) of a kind of post-match exclusion. Anyway, thanks for clearing that up!

EDIT links: https://github.com/isaacs/node-glob#glob-primer https://github.com/isaacs/node-glob#comments-and-negation

jonschlinkert commented 8 years ago

node-glob still mentions the use of !(xxx)

that brings up a good point. !(xxx) is an extglob pattern. Minimatch/node-glob only has basic extglob support. It might be worth noting on the readme that advanced matching features are being removed

phated commented 8 years ago

@jonschlinkert what do you mean by they only have basic extglob support?

jonschlinkert commented 8 years ago

Minimatch (the matcher for node-glob) fails something like 30% of the extglob tests.

phated commented 8 years ago

@jonschlinkert but you are saying it doesn't support !(foo)?

jonschlinkert commented 8 years ago

I'm not saying it doesn't support any specific pattern. It may or may not.

I'm saying I don't know how to predict what minimatch supports or what it doesn't. In a recent commit he basically copied this code from extglob, but extglobs and "globs" don't use the same matching strategy, which wasn't accounted for. the minimatch issues https://github.com/isaacs/minimatch/issues do a better job of explaining.

(and micromatch only did negations here, so I don't know how important this really is...)

edit: see this issue in particular: https://github.com/isaacs/minimatch/issues/83. (this happens a lot, where makeRe().test() produces a different result than .match, because .match tries to "compensate" for incorrect regex matching by monkeypatching the results)

phated commented 8 years ago

Yeah, it was originally minimatch doing the negations, then switched to micromatch and now it's able to be removed since we normalize globs starting with ! and pass them to node-glob's ignore option. I'm guessing it isn't an issue.

phated commented 8 years ago

@UltCombo good catch. I believe we should concat any ignore patterns from the options to each iteration. I'll make those changes and add tests.

@erikkemperman node-glob's implementation of paths that started with ! was broken and caused them lots of issues so they replaced it with the ignore option. We are going to continue to allow negative globs because our implementation is pretty sophisticated in how negations are done and includes things node-glob can't do because we create a new instance of node-glob for each glob string we receive. I'm going to concat the ignore option (as per mentioned above) to solve that problem you pointed out.

jonschlinkert commented 8 years ago

I agree, it probably shouldn't be an issue. but just to clarify, removing the ! or using ignore etc are unrelated to this, since extglobs can be any part of the negation pattern.

edit: Oh, that brings something else up... If you remove ! from the pattern, you should make sure that it's not followed by (, just in case it's part of an extglob

phated commented 8 years ago

@jonschlinkert one thing that is bad with our implementation would be an extglob that starts with !(foo) because we will treat it as a negation and remove the !. Seems like a really far edge case though.

jonschlinkert commented 8 years ago

Seems like a really far edge case though.

Yeah I agree. I never thought of it until now, and we've never had an issue about it. Still, I'll probably make that change in micromatch to make sure we're matching correctly.

here, you could easily do something like:

return pattern[0] === '!' && pattern[1] !== '(';

and something similar anywhere else you're checking for !. But like you said, it seems like an edge case

jonschlinkert commented 8 years ago

Since I'm currently doing this negation logic in a handful of places, and I've seen it in a number of others like this lib, I created is-negated-glob to make handling of ! and !( consistent.

Of course, don't feel obligated to use that but I'd be happy to add you to the lib in case you do, and/or want to make changes.

erikkemperman commented 8 years ago

Being pedantic, I know, but the bracket is a valid filename character, usually. So would there be a way to escape, or something, or would this implicitly disallow ignores starting with literal (?

Concatenating user-provided ignore option sounds great -- I imagine I might have a kind of global ignore list (**/*.bak and the like) to put in the option whereas exclusions specific to the pattern at hand would be natural to write alongside it.

As node-glob tests the ignore array in natural order (Array.some), ideally the ignores that are most likely to hit a result should be at the front -- is it unreasonable to generally expect "pattern ignores" to be more likely than "option ignores", or vice versa?

jonschlinkert commented 8 years ago

Being pedantic, I know, but the bracket is a valid filename character, usually. So would there be a way to escape, or something,

true, we should add support for escaping (which makes me more glad I created is-negated-glob). IMHO !( is exceedingly unlikely to not be an extglob - given that it's a pattern that is explicitly passed to be used for matching, not the file paths to be matched. But still, it's probably a good idea to respect escaping either ! or !(. There is negligible cost in doing so

erikkemperman commented 8 years ago

Excellent! In my experience end users have an uncanny ability to find cases that the developers thought were unlikely :-)

jonschlinkert commented 8 years ago

end users have an uncanny ability to find cases that the developers thought were unlikely :-)

truer words have never been spoken lol

phated commented 8 years ago

I'm happy to use the module but it needs to be at least version 1.0 and should support escaping.

jonschlinkert commented 8 years ago

actually I wasn't thinking, escaping would already work since we're checking character positions. I think we're in good shape.

jonschlinkert commented 8 years ago

needs to be at least version 1.0

I was planning on bumping after I got your feedback, in case we needed to make changes

edit: based on your prior comment, it sounds like 1.0 is in order then. I'll do that now

edit2: done. is-negated-glob was published and released as 1.0. @phated I also added you as a collab and npm owner.

erikkemperman commented 8 years ago

escaping would already work since we're checking character positions

I would still be unable to have an ignore pattern starting with !( though... Right?

Position check just means that something like !!( (assuming ! as escape char) won't be seen as extglob, but the escape char must then still be discarded.

phated commented 8 years ago

@jonschlinkert not sure why, but dropping that module in is now causing test failures. Trying to debug now.

Figured it out, I was slicing twice.

phated commented 8 years ago

@erikkemperman can you submit some tests for the things you are talking about? I'm pretty confused.

jonschlinkert commented 8 years ago

Position check just means that something like !!( (assuming ! as escape char) won't be seen as extglob, but the escape char must then still be discarded.

we only need to know if the very first character in the unmodified glob pattern is !. If it's escaped, then the first character would be \. If the first character is ! followed by an escaped (, then the second character would be \.

In other words, this lib will strip the first character when it is exactly ! and the second character is exactly NOT (. The globbing tool then handles everything else, so your questions about handling of those patterns would be for node-glob.

edit: so if you did want to have an ignore pattern beginning with !(, where ( is not part of an extglob, then you would need to define !\(. I think this would be expected in all of the globbing libs.

erikkemperman commented 8 years ago

I guess I am suggesting something like

console.log(isNegatedGlob('!!(foo)'));
// escape with ! for literal (
// { pattern: '(foo)', negated: true }

Assuming ! as escape char, which seems reasonable here.

phated commented 8 years ago

Why would ! be an escape character? The escape character is \

erikkemperman commented 8 years ago

I figured since it already has special significance here. Same way you write %% to get a literal % in printf. And the same way you write '\\' in javascript strings to get a single backslash. So then it'd be

console.log(isNegatedGlob('\\!(foo)'));
// escape with \ for literal (
// { pattern: '(foo)', negated: true }

EDIT typo

jonschlinkert commented 8 years ago

Yeah, I'm with @phated on this. Not sure why ! would be used as an escape char.

erikkemperman commented 8 years ago

Sorry for trailing off there, my timezone (Amsterdam) and day job got the better of me... And I don't have time just now to try and construct actually problematic test cases. I'll take a stab at it when I get a chance, and of course I might very well simply have it all backwards.

But I'm thinking whatever escape character ought to be eaten/discarded, before passing on to glob. And while backslash is certainly the go-to escape char a lot of the time, it might have unintended side effects in the case of file/pathname matching (windows), whether it's eaten here or not.

erikkemperman commented 8 years ago

I may have figured out my misunderstanding earlier. So having inadvertently muddied the water on account of being half asleep, allow me to overcompensate...

Here's what I was thinking:

To a first approximation, everything starting with '!' gets the exclamation stripped and goes into negatives, the remainder goes (remains) in positives unchanged.

This raises the question how one might positively match something starting with a literal '!', a special case of which being extglobs like '!(foo)'.

So in terms of escaping, I was only thinking of escaping the exclamation, taking care of the extglob case too, meaning I was expecting something like the following to happen in glob-stream (with, say, backslash as the escape char):

pattern positive negative
'!foo' 'foo'
'!(foo)' '!(foo)'
'\\!foo' '!foo'
'\\!(foo)' '(foo)'

What I'd somehow missed earlier is it looks like you were thinking of escaping the parenthesis in extglobs, instead of the exclamation? I certainly didn't help things along by proposing '!' as escape at the same time, leaving ambiguous if it was escaping the paren or the exclamation.

What I did notice is that is-negated-glob doesn't strip the escape char/s. So, if that's about right, does the following do your current implementation any justice?

pattern positive negative
'!foo' 'foo'
'!(foo)' '!(foo)'
'\\!foo' '\\!foo'
'!\\(foo)' '\\(foo)'

For what it's worth, the first table seems the more natural to me, but then how could it not, having come at it from there.

As a general principle, I think it usually makes sense for a parser that consumes an escape char -- uses it to modify the significance of its successor char -- to then strip it off for subsequent stages. Perhaps you feel this is not something is-negated-glob should do, but I just think that from glob-stream's POV, the cases listed above should probably not result in backslashes in the positive/negative columns, which are then passed on to node-glob?

Assuming I haven't got the account completely wrong thus far, I understand how my suggestion of '!' as escape char would have seemed outrageous, in front of a parenthesis. But in front of another exclamation, it has the advantage of not introducing any additional magic characters, which of course sooner or later someone is going to want to match literally -- It would be unwise, to be sure, but not strictly invalid to have a filename start with \ on mac or linux...

erikkemperman commented 8 years ago

Actually, the following would perhaps be more consistent than either of the tables above, in terms of what the escaping does (override the usual semantics of '!')

pattern positive negative
'!foo' 'foo'
'!(foo)' '(foo)'
'\\!foo' '!foo'
'\\!(foo)' '!(foo)'
phated commented 8 years ago

Just pushed support and tests for the ignore option. @erikkemperman @UltCombo let me know if I'm missing anything in the implementation or tests.

Going to do some more things in this PR before I ask for another full review pass.

phated commented 8 years ago

Alright, I think I've landed all the changes I wanted to in relation to this (removing the external API felt like it made sense to achieve some refactor points). If everyone could review again, I can get this wrapped up and push through on the remaining few issues before the major release.

phated commented 8 years ago

@erikkemperman your table like:

pattern positive negative
'!foo' 'foo'
'!(foo)' '!(foo)'
'\\!foo' '\\!foo'
'!\\(foo)' '\\(foo)'

is the correct implementation for how JS strings/escaping behave. And I believe the tests prove it.

UltCombo commented 8 years ago

The diff isn't easy to read, but I like the new style. 👍

One thing that calls my attention is the undocumented root option, but I guess that goes beyond the scope of this PR.

phated commented 8 years ago

Sorry about that. I opened an issue for the root option that will be addressed separately.

UltCombo commented 8 years ago

No problem, this is awesome work! 😄

erikkemperman commented 8 years ago

That's... Interesting! Due to how node-glob / minimatch constructs RegExps from its arguments, stripping or not stripping the escaping slashes from patterns like '\\!foo' and '!\\(foo)' seems to make no functional difference at all.

I stubbornly maintain that stripping them is conceptually cleaner, because the escaping as employed here is specific to glob-stream and meaningless to node-glob, but I am unable to construe cases where it'll actually lead to different results.

Anyway, I'll shut up now -- apologies for distraction.

phated commented 8 years ago

@erikkemperman it's not meaningless to node-glob, that is the same way they escape patterns. Stripping it would break the meaning of the escape when passed to node-glob. I think we've skewed very far from the point of this PR though.

Waiting on more feedback before I merge this in. Thanks all.

erikkemperman commented 8 years ago

Yeah, that's on me, sorry. I'm going to try to figure out what I'm apparently still missing, but probably not out loud..

phated commented 8 years ago

Thanks all! This has been merged.