gulpjs / gulp

A toolkit to automate & enhance your workflow
https://gulpjs.com
MIT License

Support for multiple cores/processors #317

Closed — alexwhitman closed this issue 9 years ago

alexwhitman commented 10 years ago

As tasks are run concurrently, it would be useful if gulp could use multiple cores/processors to speed up large builds. This could potentially be done using the cluster API.

yocontra commented 10 years ago

Streams don't work this way, but I'm leaving this open because I think it would be cool to make a version of through2 that spun up child_processes or used something like threads-a-go-go.
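
Something along these lines is what I'm picturing — purely a sketch, nothing like this exists yet (the worker script and its message protocol here are made up):

```js
// Sketch only: a through2 transform that ships each file's contents to a
// forked worker and re-emits the transformed result. worker.js and its
// message format are hypothetical. Note: as written, only one file is in
// flight at a time; a real version would pool several workers.
var through = require('through2');
var fork = require('child_process').fork;

function parallelTransform(workerScript) {
  var worker = fork(workerScript);
  var pending = [];

  worker.on('message', function (msg) {
    var job = pending.shift();
    job.file.contents = Buffer.from(msg.contents);
    job.callback(null, job.file);
  });

  return through.obj(function (file, enc, cb) {
    // Hand the file off to the worker; cb is called when the reply arrives.
    pending.push({ file: file, callback: cb });
    worker.send({ contents: file.contents.toString() });
  }, function (cb) {
    worker.kill();
    cb();
  });
}
```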

sindresorhus commented 10 years ago

That would be rad. There's also: https://github.com/audreyt/node-webworker-threads

The problem with spinning up child processes is that it comes with some overhead, usually 100-500ms startup cost.

yocontra commented 10 years ago

@sindresorhus :+1:

terinjokes commented 10 years ago

In my previous work here, I found that sending and receiving data from child processes was noticeably slower. I used a Node C++ extension that could spin up libuv threads, which worked fine, but you lose access to a lot of "node" in that case.

yocontra commented 10 years ago

We should benchmark webworker-threads vs. libuv threads vs. threads-a-go-go vs. child_processes and see which would be best for our case.

sindresorhus commented 10 years ago

:+1:

This is where it would be useful if Node had full support for Isolates. Maybe in the future.

yocontra commented 10 years ago

@sindresorhus https://stackoverflow.com/questions/9131902/what-were-node-js-isolates-and-why-are-they-now-dead

sindresorhus commented 10 years ago

@Contra I know. See:

https://github.com/joyent/node/issues/6899

and

http://strongloop.com/developers/videos/#whats-new-in-nodejs-v012 (03:30)

mako-taco commented 10 years ago

@contra I am struggling to see a way to do this with TAGG or node-webworkers, because afaik you cannot require modules inside of the threads. What are your thoughts on how to accomplish this?

heikki commented 9 years ago

I did some experiments with child processes. https://github.com/heikki/spawn-task-experiment

Running self-contained tasks in child processes is surprisingly fast. The test task takes all the JS files from node_modules, concats them and creates sourcemaps.

Two normal tasks run parallel:

∴ spawn-task-experiment git:(master) ./node_modules/.bin/gulp normal
[22:09:16] Using gulpfile ~/Desktop/spawn-task-experiment/gulpfile.js
[22:09:16] Starting 'normal'...
[22:09:16] Starting 'clean'...
[22:09:16] Finished 'clean' after 9.08 ms
[22:09:16] Starting 'parallel'...
[22:09:16] Starting 'normal-task'...
[22:09:16] Starting 'normal-task'...
[22:09:45] Finished 'normal-task' after 29 s
[22:09:45] Finished 'normal-task' after 29 s
[22:09:45] Finished 'parallel' after 29 s
[22:09:45] Finished 'normal' after 29 s

Two spawned tasks run parallel:

∴ spawn-task-experiment git:(master) ./node_modules/.bin/gulp spawn
[22:09:51] Using gulpfile ~/Desktop/spawn-task-experiment/gulpfile.js
[22:09:51] Starting 'spawn'...
[22:09:51] Starting 'clean'...
[22:09:51] Finished 'clean' after 9.29 ms
[22:09:51] Starting 'parallel'...
[22:09:51] Starting 'spawn-task'...
[22:09:51] Starting 'spawn-task'...
[22:10:07] Finished 'spawn-task' after 16 s
[22:10:07] Finished 'spawn-task' after 16 s
[22:10:07] Finished 'parallel' after 16 s
[22:10:07] Finished 'spawn' after 16 s

Has anyone else explored this stuff?

--edit

Code looks too simple and results too good. Please point out any flaws :smile_cat:
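
For context, the test task is roughly along these lines (plugin choices here are assumed; the exact code is in the repo linked above):

```js
// Concat all JS under node_modules and write sourcemaps.
var gulp = require('gulp');
var concat = require('gulp-concat');
var sourcemaps = require('gulp-sourcemaps');

gulp.task('normal-task', function () {
  return gulp.src('node_modules/**/*.js')
    .pipe(sourcemaps.init())
    .pipe(concat('all.js'))
    .pipe(sourcemaps.write('.'))
    .pipe(gulp.dest('build'));
});
```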

heikki commented 9 years ago

Ping @noahgrant ^

yocontra commented 9 years ago

@heikki Try using worker pools instead of spinning a new one up each time you run the task. Startup times on new child_processes can be around 30ms; using a pool eliminates that.

heikki commented 9 years ago

I tried that already but decided to show the simpler way first. Leaving child processes alive has the side effect that the parent doesn't exit. So maybe use a worker pool only when running watch.

--edit

The exit problem happened in a different setup where messaging was done via an IPC channel. If the child's IPC channel was left open, the parent didn't exit. I didn't find a way to reopen it after closing, so it was a dead end.
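
For anyone hitting the same thing: explicitly closing the channel once the work is done lets the parent exit. A minimal sketch with plain child_process.fork (worker.js is hypothetical):

```js
var fork = require('child_process').fork;

var child = fork(require.resolve('./worker'));

child.on('message', function (result) {
  // ...use the result...
  child.disconnect(); // close the IPC channel so the parent can exit cleanly
});

child.send({ task: 'build' });
```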

heikki commented 9 years ago

Added worker pool example using worker-farm.
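
Roughly, it looks like this (paths and the worker module's signature are assumptions on the editor's part; see the repo for the actual example):

```js
// The heavy work lives in a separate module and is called through the pool.
var gulp = require('gulp');
var workerFarm = require('worker-farm');

// worker.js is assumed to export: module.exports = function (glob, callback) { ... }
var workers = workerFarm(require.resolve('./worker'));

gulp.task('farm-task', function (done) {
  workers('node_modules/**/*.js', function (err) {
    workerFarm.end(workers); // shut the pool down so gulp can exit
    done(err);
  });
});
```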

yocontra commented 9 years ago

@heikki Is there a module yet for just wrapping a function in a child_process pool? If not, you should publish one, and we can play around with just

var makeItFaster = require('your-module');

gulp.task('whatever', makeItFaster(function(){

}));

before thinking about adding anything to core
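
For the record, a naive implementation of such a wrapper might look like the sketch below. runner.js is made up (it would eval the function source it receives and message back when done), and the fact that closures don't survive the toString()/eval round trip is part of why this is hard:

```js
var fork = require('child_process').fork;

function makeItFaster(taskFn) {
  return function (done) {
    var child = fork(require.resolve('./runner')); // hypothetical runner script
    child.on('message', function (msg) {
      child.disconnect(); // close the IPC channel so the parent can exit
      done(msg.error ? new Error(msg.error) : null);
    });
    // Send the task body's source to be evaluated in the child process.
    child.send({ source: taskFn.toString() });
  };
}
```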

heikki commented 9 years ago

Published with the same name -> spawn-task-experiment

yocontra commented 9 years ago

@heikki Cool stuff - I'll let people play around with it for a bit and you can build off their feedback :+1:

insidewhy commented 9 years ago

I implemented support for multiple processes in the sigh asset pipeline and wrote a library that makes dealing with process pools easy using a promise-based API.

AlekseyMartynov commented 9 years ago

Synthetic example of paralleling dependent tasks with worker-farm: https://github.com/AlekseyMartynov/misc/tree/master/gulp-with-worker-farm

csvan commented 9 years ago

+1

yocontra commented 9 years ago

Is anyone using any of these solutions in their gulpfiles yet? Curious what people have working so far

bcherny commented 9 years ago

any progress on making this the default behavior? one of gulp's biggest benefits is making parallel task execution the default, and it would be nice to run with this idea to make gulp even faster!

phated commented 9 years ago

@bcherny no one is working on this for gulp core. Please notice the "probably user land" label.

dbkaplun commented 9 years ago

I agree this should be implemented in userland, but I also think this should be implemented!

bcherny commented 9 years ago

what's the rationale for keeping this out of gulp core? gulp is responsible for executing tasks in parallel, and it makes assumptions about how exactly to do that (in a single process, on a single core). why would it do it that way by default, rather than on multiple threads by default?

phated commented 9 years ago

@bcherny in addition to my comment in https://github.com/gulpjs/gulp/issues/1308#issuecomment-145987688:

Again, do it by hand or in userland

bcherny commented 9 years ago

thanks for the explanation! it's a good argument. i've seen builds of at minimum 20-30s for any large app i've worked on, so the time to spool up a new thread is negligible for my use case. i bump into this problem often, and am surprised that there are so few existing solutions.

yocontra commented 9 years ago

@bcherny Just to be clear: we did not make a choice to run things in a single thread. This is how node and javascript work. It is a constraint of the language and environment, not a choice that we made.

bcherny commented 9 years ago

@contra not sure i understand. node offers a few apis to help orchestrate async tasks. among them are promises and setTimeout, but also child_process. i'm not sure that any one of those is the way "javascript works".

yocontra commented 9 years ago

@bcherny promises and setTimeout are still running on the same thread. I'm saying this: in JS all work happens on the main thread, gulp is a javascript library, therefore gulp has one thread. Your earlier posts made it seem like I made a choice that gulp should only be one thread. Just trying to clear it up for any future people who look at this post, so nobody gets confused about where the limitation comes from.

bcherny commented 9 years ago

this is a technicality, but to rephrase: node offers a bunch of apis for dealing with async tasks. among them are apis that run on a single thread, and apis that run on multiple threads.

javascript itself runs on a single thread, but both node and the browser offer async apis that work in multi-thread contexts.

gulp uses async apis to orchestrate async tasks. specifically, it uses orchestrator, which uses callbacks, promises and streams. the latter 2 are async apis that happen to run in a single thread. orchestrator could just as well have used child_process.

node is not biased toward promises, streams, or child_process.exec. they are all just node apis, and orchestrator made the choice to use only those apis that run in a single thread.

yocontra commented 9 years ago

@bcherny The only API that node provides to solve this problem is child_process, which is not a lightweight thread - it is a full node process running completely separately.

No, orchestrator (which we don't use anymore btw) could not have "just as well have used child_process":

Running every function in a child_process would be a horrible default - the idea that gulp is idiomatic javascript would go out the window, since it would require people to write code fundamentally different from what they normally write.

tl;dr Running every task in a child_process is never going to happen in core since child_process is not the right abstraction (reasons listed above) - if you understand the tradeoffs and want to run your task in a child_process, use this tiny module https://github.com/heikki/spawn-task-experiment

This would have all been solved if node had finished the Isolate API (essentially lightweight threads w/ shared V8 memory) but they abandoned it in 0.9.x

yocontra commented 9 years ago

I'm going to close this since I think we've reached a solution (https://github.com/gulpjs/gulp/issues/317#issuecomment-149437415)

If node ever revisits the Isolate API or somebody makes a lightweight threading module that supports shared requires I will reopen this as a possibility, until then it will have to stay in userland.

sindresorhus commented 9 years ago

> Spinning up a child_process takes 30ms

Usually more if you have a lot of heavy requires, and even slower on Windows.

Support for workers (lightweight processes backed by OS threads) might land in Node.js 5. That would be a better way to deal with it. https://github.com/nodejs/node/pull/2133
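
For a rough idea of the API shape being discussed there, here's a sketch based on what eventually shipped (much later) as Node's worker_threads module — not something available at the time of this thread:

```js
const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  // Re-run this same file as a worker thread and exchange messages with it.
  const worker = new Worker(__filename);
  worker.on('message', (msg) => {
    console.log('from worker:', msg);
    worker.terminate(); // done, let the process exit
  });
  worker.postMessage('minify all the things');
} else {
  parentPort.on('message', (msg) => parentPort.postMessage(msg.toUpperCase()));
}
```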

zhoujianlin8 commented 8 years ago

I have tried to use cluster in gulp-ctool-browserify but it doesn't make things faster. It seems limited by memory and CPU. Who can help me?

inikulin commented 8 years ago

Hi guys. I've made a gulp plugin for multiprocessing. In general it's faster than regular builds, but it requires some manual tuning and may not give any performance gain at all in some projects, for the reasons described by @contra.

kevin-smets commented 8 years ago

@inikulin, thanks for that plugin, works like a charm! Using it now in a massive project, which brought the total build time down from 3 minutes to 2 minutes by running jade, sass and coffee compilation as parallel tasks (with gulp-ll obviously).

Currently my Jade is taking the longest at 2 minutes. Splitting this into jade / jade1 / jade2 tasks, each with their own glob, brought the total build down to 50 seconds... (we're coming from 5m30s total with a previous setup, so that's just awesome). But that's a "living" config of globs which is not very maintainable.

Plus I use UUID generators in my Jade mixins, which now, obviously, are no longer guaranteed to be unique, because these parallel tasks do not know of one another. Splitting tasks into workers is easy peasy using gulp-ll, but splitting a single task into multiple processes - I haven't found a solution for that yet... Unless someone has any pointers? I'm guessing spawning child processes inside a single task will pretty much be the only way?

Currently the build uses 3 of 8 cores (jade takes one, the others take one physical core + one virtual); there is still so much untapped potential.
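
To illustrate what spawning child processes inside a single task could look like, here is a rough sketch — render-jade.js is a hypothetical worker script that compiles whatever glob it is handed and then exits:

```js
var fork = require('child_process').fork;
var gulp = require('gulp');

// Split the templates into buckets and compile each bucket in its own process.
var buckets = [
  'src/templates/[a-h]*.jade',
  'src/templates/[i-p]*.jade',
  'src/templates/[q-z]*.jade'
];

gulp.task('jade-parallel', function (done) {
  var remaining = buckets.length;
  var failed = false;

  buckets.forEach(function (glob) {
    var child = fork(require.resolve('./render-jade'), [glob]);
    child.on('exit', function (code) {
      if (code !== 0) failed = true;
      if (--remaining === 0) {
        done(failed ? new Error('a jade worker failed') : null);
      }
    });
  });
});
```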

cghamburg commented 8 years ago

@inikulin This works great. Just by running stylus and es6 compile tasks in parallel using gulp-ll we went down from 1m42s to 47s!

import gulp from 'gulp';
import runSequence from 'run-sequence';
import ll from 'gulp-ll';

ll.tasks(['css', 'js']);

gulp.task('default', done => {
    runSequence(['css', 'js'], 'bundle', done);
});

kevincaradant commented 8 years ago

Is it possible to multithread a single task, like JS uglification (i.e. the same pipe but run in parallel), instead of maxing out only one core at 100%? With gulp 4.0 I can multithread across two different tasks, with one task on Sass files and another on JS files (using 2 cores). I then saw your plugin @inikulin, which seems great, but I didn't find anything about parallelizing within the same task. Maybe I'm asking too much :/ I'm a little bit confused about this...

vangorra commented 8 years ago

Nobody seems to have mentioned: https://www.npmjs.com/package/gulp-multi-process

We're using it to parallelize 4 separate build tasks using webpack, typescript, babel and uglify. Works great.
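
The wiring is roughly as below — treat the exact multiProcess signature as approximate and check the package README:

```js
var gulp = require('gulp');
var multiProcess = require('gulp-multi-process');

gulp.task('build', function (cb) {
  // Each named task runs in its own child process.
  return multiProcess(['webpack', 'typescript', 'babel', 'uglify'], cb);
});
```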

yocontra commented 8 years ago

@vangorra Seeing how webpack is 100% synchronous and blocks the event loop for ~30 seconds sometimes, that sounds super useful. Thanks for the link.

strarsis commented 7 years ago

It would also be great if a stream (of vinyl files) could be distributed across multiple cores/processes within a single node task. gulp-multi-process is great, but it can only be used to distribute work at the task level.

xkr47 commented 7 years ago

@strarsis I'm not exactly sure if this is what you asked for, but I worked on a vinyl-parallel transform that transports vinyls (including content) to a different process, does whatever processing is needed there, and then transports the result back. See the URLs below for an example; the example gulpfile contains tasks demonstrating traditional use (tasks named sync-*) and use together with vinyl-parallel (tasks named parallel-*). In the latter case the parallel work to be done is implemented in the gulpslave.js file.

The project is mostly finished, but I ended up personally using gulp-ll instead in our project because we happened to have enough tasks to get decent build times that way. Feel free to report any bugs/improvement ideas you find.

https://github.com/NitorCreations/vinyl-parallel/blob/master/example/gulpfile.js
https://github.com/NitorCreations/vinyl-parallel/blob/master/example/gulpslave.js