gulpjs / glob-stream

Readable streamx interface over anymatch.
MIT License
178 stars 53 forks source link

Respect globs array order, fixes #25 #27

Closed UltCombo closed 9 years ago

UltCombo commented 9 years ago

Fixes #25 Fixes gulpjs/gulp#837

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.39%) when pulling ea2f8e25d6aa896c9b39e97998c3fea417f751c3 on UltCombo:globs-order into 472b98e7a0a747a3c72454337def65cebc4fb78e on wearefractal:master.

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.39%) when pulling a8a7011c2591ceb5161b86a40cf5b4b962cd7a30 on UltCombo:globs-order into 472b98e7a0a747a3c72454337def65cebc4fb78e on wearefractal:master.

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.48%) when pulling bf90ab1e131d6f89d10c4ca21a1a889df3b4aef5 on UltCombo:globs-order into 472b98e7a0a747a3c72454337def65cebc4fb78e on wearefractal:master.

UltCombo commented 9 years ago

I've removed all one-liners and did some clean up.

coveralls commented 9 years ago

Coverage Status

Coverage increased (+0.43%) when pulling afda4f591ea59da7f6bf970754b2614fa06efbcd on UltCombo:globs-order into 472b98e7a0a747a3c72454337def65cebc4fb78e on wearefractal:master.

UltCombo commented 9 years ago

I'm just not sure whether the variable names are meaningful enough, please give suggestions if they're not good enough.

yocontra commented 9 years ago

Looks good to me. I will publish a new major release of glob-stream with this since it is a breaking change, and then a major release for vinyl-fs which depends on this new version, then gulp 4 will depend on the new vinyl-fs which will have a few other breaking changes. Thanks for the fix! Would you mind commenting on any relevant issues and link to this so I can go close them?

UltCombo commented 9 years ago

Thanks for the roadmap clarification. :)

Would you mind commenting on any relevant issues and link to this so I can go close them?

I can try, although I'm not very familiar with those repositories' issues.

yocontra commented 9 years ago

@UltCombo Search for "symlink" and that should make it easy.

If you have time can you send a PR to the vinyl-fs docs to add a note about glob ordering?

UltCombo commented 9 years ago

I've seen 3 or so issues about symlinks so far but they didn't appear to be related to this issue at all.

By the way, shouldn't we update this repository's docs as well?

yocontra commented 9 years ago

@UltCombo Oops wrong comment wrong issue, my bad. I'm adding to the vinyl-fs docs, consider this all finished. Thanks!

UltCombo commented 9 years ago

Hahah no problem, thanks for all the hard work maintaining the gulp project. :smile:

yocontra commented 9 years ago

Have you seen https://github.com/es128/anymatch ? I think this would clean up the code a lot. Not sure if it respects ordering

UltCombo commented 9 years ago

Nope, I haven't seen it before. I can try it tomorrow when I have a bit more free time.

Though, right now I'm not sure how anymatch would be really useful here -- as far as I can see, it does not read files from disk, so the ordering logic I've implemented would still have to be kept. If anything, anymatch would simplify the isMatch and filterNegatives functions.

UltCombo commented 9 years ago

Oh wait, if it does respect the globs order then we might be able to replace the logic I've implemented (little sleepy, not sure honestly). I'll check tomorrow. :D

UltCombo commented 9 years ago

@contra Doesn't look like anymatch supports negative/ordered globs. We could use multimatch instead, which is basically minimatch with added support for multiple/ordered globs. Or, we could go further and use gulp-filter which wraps multimatch in a stream.

But I'm not sure -- even if we do this refactoring, it will all go to waste when fixing #24 as it will require the negative globs logic to be implemented in the globber.

yocontra commented 9 years ago

@UltCombo I don't see node-glob adding that any time soon, it's a pretty huge undertaking considering the size of that codebase

UltCombo commented 9 years ago

@contra I was thinking about using a different globber. But that would probably bring too many breaking changes, and node-glob seems to be the ubiquitous solution nowadays.

Anyway, what do you think about using globby? It extends node-glob adding multiple ordered globs support. Although, its negative filtering implementation is very different from glob-stream -- while glob-stream handles exclusion through minimatch, globby globs the files that would be matched by the negative globs and then excludes them from the matched files array. We'd need some perf tests.

yocontra commented 9 years ago

@UltCombo I'm totally open to changing it. I don't think there would be any breaking changes actually, our use of node-glob is really basic and abstracted by glob-stream. If you send a PR to switch globbers and all of the tests still pass I'd merge for perf increases

UltCombo commented 9 years ago

Thing is, I believe many gulp users rely on node-glob features (or rather, minimatch features) on their gulp.src() globs, that's why I'd consider changing the globber a breaking change.

globby just wraps node-glob, so it preserves the same set of features from node-glob. I'm just not sure about perf, as globby matches the negative globs against the file system instead of negating through minimatch as glob-stream does. That's why we need to do perf tests first. Anyway, if there's a perf problem with globby, I believe @sindresorhus can help us here. ;)

yocontra commented 9 years ago

@UltCombo If we are going to switch globbers it should be now since we are about to push 4 which has a bunch of breaking changes anyways

UltCombo commented 9 years ago

Yes, I'm just saying if we switch to globby then there is no breaking change other than the ordered globs feature which we already landed in.

yocontra commented 9 years ago

@UltCombo Want to play around with both routes and see which performs better?

UltCombo commented 9 years ago

Sure, I'll prepare some test cases based on #24's use case. ;)

UltCombo commented 9 years ago

Here's a quick and dirty gist. Although glob-stream has the overhead of creating streams for the matched files (which is not really significant as there is only 1 matched file in my test), glob-stream is still over 2 times faster than globby.

Not sure what to do here. I believe the correct would be to propose an improvement to globby's negative globs handling (that is, to use the same logic as glob-stream currently uses), I'm sure @sindresorhus is open to improvements in his packages. After that, we can change the globber here and remove the multiple glob handling logic (which is then handled at globby).

I'm just not sure as this is a decent amount of work for technically no perf gain and no significant changes, just cleaner code here.

UltCombo commented 9 years ago

Taking a second look at globby, it uses async.reduce which runs in series, that is another perf loss regarding multiple positive globs.

I'll raise all these issues in the globby repository, but for now it seems better to stay with glob-stream's current multiple globs handling logic.

UltCombo commented 9 years ago

Alright, I've reported these issues (Negative globs handling is very slow and Multiple positive globs handling is slow). Fixing these would mean pretty much a complete rewrite of globby, letting alone switching to globby would have one more perf issue: globby only returns the matched paths after it finishes all the globing work, while glob-stream can emit files as soon as it matches them. This perf issue can't be patched in globby unless an events or streaming API is added to globby.

sindresorhus commented 9 years ago

I would be happy to improve globby to make it workable for gulp, but I'm short on time atm as I'm traveling, so probably won't happen in the short run unless I get some help.

https://github.com/sindresorhus/globby/issues/5

yocontra commented 9 years ago

What is globby doing that has any advantage perf-wise over what glob-stream is doing currently?

UltCombo commented 9 years ago

@contra erm, none I guess. As far as I can see, both packages have the same functionality (except that glob-stream has streaming added) and my initial suggestion to use globby was just to not reinvent the wheel, but I was unaware of the globby perf issues until then.

It seems like it would be possible to move 90% of glob-stream to globby thus solving its perf issues, but at that point it might as well be possible to add streaming support to globby and there would be no reason to have this package then. That is, we would just be moving things around without any real benefit.

yocontra commented 9 years ago

@UltCombo Yeah it seems like we're just moving everything to globby and then turning it into an identical module. I don't see the purpose in that, I think we need to get back to the core of the problem which was a globber that supports multiple globs and checks them while crawling, not another system that grabs all the results and then checks them.

yocontra commented 9 years ago

I'll put a $500 dollar bounty on somebody adding support for multiple globs to node-glob

UltCombo commented 9 years ago

Agreed, the idea to use globby was just to clean up code, that is, to not duplicate functionality that already exists in another package. But given the perf issues, and that even after fixing the globby perf issues we'd still have the same perf as now, it doesn't really help much.

Thing is, I've looking around the npm registry and other resources and can't seem to find a suitable glob library -- looks like node-glob is the ubiqutuous globber.

I believe @sindresorhus has a better hold of the Node.js environment than me, but it seems a little controversial to ask him for a package that does exactly the same thing as his globby package does but better.

yocontra commented 9 years ago

I just commented here https://github.com/isaacs/node-glob/issues/84

Maybe some $$$ will get somebody motivated to fix this. I'm locked up on time right now and I'm sure @isaacs is busy running his company too

UltCombo commented 9 years ago

I'll put a $500 dollar bounty on somebody adding support for multiple globs to node-glob

Woah, looks like our issues are about to be solved instantly. :laughing:

sindresorhus commented 9 years ago

I too would prefer to have node-glob support this.

@contra Create a bountysource and tweet it, and I'll make sure to put in some $$ too and get it RTed ;)

yocontra commented 9 years ago

@sindresorhus tweeted

yocontra commented 9 years ago

This and the file watching are like the last bottlenecks/bug generators in gulp

doowb commented 9 years ago

@jonschlinkert and I have run into the a lot of the same issues with node-glob on our own projects and started working on a replacement. @jonschlinkert already created an awesome project called micromatch that solves some of these issues and is 10x-20x faster than minimatch

The 3 areas that we find important for this new project are:

Since we share a lot of the same goals, we'd like to get your input on sensible defaults (like excluding node_modules unless specified in a pattern) and feedback when the new project is published.

Also, @contra mentioned in #24 looking for a potential replacement to node-glob and I'm confident this new project will suit your needs.

jonschlinkert commented 9 years ago

hey thanks @doowb. yeah, to second what Brian's saying we went down this same path and came to the realization that the problems are systemic. e.g. we can't speed up node-glob by adding another method or by changing the negation strategy in a dependent lib. it just needs to be re-written from the ground up with a different filtering strategy. since minimatch and node-glob were initially created we've learned a lot about what a globbing lib should do for node.js, so the goal is to focus on features that make sense for node.js (versus Bash 4.3 parity). e.g. you'll still use patterns the same way, 99% of end users won't know the difference, but we'll make it easier to define negation patterns, we're considering excluding node_modules by default and only globbing those files if node_modules is explicitly defined (rather than the other way around). etc. these are just example, the goal is to create a glob lib that does what we need.

anyway, we'll be releasing the new globbing lib in the next week or so, it would be great to get your feedback/wishlist regarding the features that would be most important to gulp. obviously no expectations that you guys will use it or whatever

yocontra commented 9 years ago

@jonschlinkert Why not develop it out in the open? The cathedral approach is too bad :(

jonschlinkert commented 9 years ago

yeah, sure thing, sorry my language was misleading:

we'll be releasing the new globbing lib in the next week

meant we'll be "creating" a new lib. e.g:

it would be great to get your feedback/wishlist regarding the features that would be most important to gulp

we've been discussing it, but no actual code yet. I'd love to collaborate on features, everyone wins

UltCombo commented 9 years ago

@jonschlinkert perhaps it would be nice to have a repository where we can open issues for such discussions?

jonschlinkert commented 9 years ago

done https://github.com/jonschlinkert/glob-fs/issues. a wishlist would be a great starting point. I'm about to share some cookies with a 1yr old and 4yr old :) so I'll try to start a list later this week. but feel free to get something started

jonschlinkert commented 9 years ago

Okay, I started a wishlist here: https://github.com/jonschlinkert/glob-fs/issues/1 to get some discussion started