Debouncing watcher tasks with Gulp 4

ide commented 9 years ago

With Chokidar there's no debounce option from what I've seen and instead it's up to the user of Chokidar to pass in a debounced function. So currently I do this:

gulp.watch('**/*', _.debounce(compile, 200));

Because lodash doesn't copy the provided function's name, my Gulp log says, "[03:06:06] Starting 'debounced'..." instead of "compile". The more serious issue is that the logs never say "Finished 'debounced'..." because the done callback that gulp.watch passed to the first invocation of the debounced function is ignored.

So what I'm looking for is an easy and reliable way to debounce gulp.watch callbacks, possibly by adding debounce support to gulp.watch itself via the options arg.

mmoss commented 9 years ago

:+1: This is something driving me crazy that my team members have run into as well... If your editor saves a file too many times... ugh.

TrySound commented 9 years ago

/cc @es128

es128 commented 9 years ago

Are we referring to the debounceDelay option in gaze? I think that was a workaround for duplicate events (caused by editor or OS wonkiness). Chokidar tries to detect and suppress duplicate events without asking users to fiddle with settings, and considering it provides only an EventEmitter interface, it doesn't make sense to me to add denounce functionality there.

The callback interface is provided directly by gulp now, so it's up to the gulp maintainers to decide whether they want to add debounce as a convenience option for running tasks.

es128 commented 9 years ago

What I'd expect to be much more useful than a time based debounce, however, is something that would know how to abort/restart a task upon subsequent events, or something that queues it up to run one more time after finished if more triggering events have been received while it was running.

ide commented 9 years ago

@es128 Agreed that this option likely doesn't belong in Chokidar. It can be implemented in Gulp or a helper library.

My use case is that I have three jobs that Gulp runs: Babel, Webpack, and a Koa server. When there's a change in my source files, gulp.watch tells Babel to run (Webpack also independently has its own watcher), which produces new files in my build directory. Another gulp.watch command notices the newly built files and restarts the Koa server. The issue is that Babel and/or Webpack sometimes output multiple files within a short time span and the Koa server restarts several times.

So I see two solutions for my project:

Debounce the gulp.watch handler that restarts Koa. This needs to debounce across multiple files (e.g. if Webpack writes a.js, b.js, and c.js, Koa should restart just once or twice depending on the policy for debouncing.)
Remove the second gulp.watch call that restarts Koa. Instead the first gulp.watch call that looks at my source directory would invoke Babel, wait for Babel and/or Webpack to complete, and then restart Koa. This is actually somewhat inconvenient to write because of how Webpack works -- it uses its own watcher.

Here's a small proposal for a Gulp-aware debounce function:

function gulpDebounce(fn) {
  let timeout;
  let debounced = function(done) {
    let context = this;
    if (timeout != null) {
      clearTimeout(timeout);
      done();  // Need to tell Gulp not to wait
    }
    timeout = setTimeout(function() {
      timeout = null;
      fn.call(context, done);
    }, wait);
  };
  debounced.displayName = `debounced(${fn.displayName || fn.name || 'anonymous'})`;
}

(Don't actually use this... it doesn't work with async functions or promises.)

something that queues it up to run one more time after finished if more triggering events have been received while it was running.

This sounds very reasonable and more thoughtful than a time-based debounce. Maybe we should try that.

phated commented 9 years ago

@es128 We don't really have the ability to stop things on a whim, as gulp 4 is purely utilizing function composition. What constitutes a duplicate event for chokidar?

I'd prefer not to build a debounce into core, as we are supposed to be a slim facade over chokidar

es128 commented 9 years ago

We don't really have the ability to stop things on a whim

Understood, but I meant that from a within reason or best effort type of perspective. If there are steps remaining in a chain, there's probably some opportunity to set a flag that switches out the rest to noops or something. Idk, it's a bit of a half-baked idea.

What constitutes a duplicate event for chokidar?

Matching change events within 50ms, or matching unlink events within 100ms. By "matching" I mean on the exact same path, but something on a per-listener basis (in case of events from different paths within a glob, for instance) would better meet the need described in this issue.

phated commented 9 years ago

@es128 I had a thought about this last week and I just remembered it now. I really like the idea of chokidar handling duplicate events implicitly, but I think it should be expanded now that you have glob support. A duplicate event is if the path matches the glob that produced it, this would handle both use cases. What do you think?

es128 commented 9 years ago

@phated Chokidar's events are designed around the idea that you care about which file changed, not just that there was a change somewhere in the paths you're watching.

I think what you're proposing makes sense for a callback interface, which is what gulp is converting chokidar's EventEmitter interface into. So imo it makes sense to either add a simple throttle option that's handled here in gulp, or to extend chokidar's API to start providing the callback interface along with the configurable option (and thus simplify gulp's integration even further by exposing the new API style directly).

I'll also reiterate that I think it would be a much better feature if it understood the state of the task that's being run rather than a time-based throttle (i.e. queue up another run if an event is received while a task is in progress). But I suppose that doesn't have much impact on this decision; nothing stopping that from happening eventually even if the callback interface and throttle options end up being done in chokidar.

callumacrae commented 8 years ago

:frowning: :frowning: :frowning:

Whether it's done in gulp or chokidar, this would be a really nice change to have!

phated commented 8 years ago

I really don't think we should do this in gulp. I understand the reasoning behind chokidar not wanting to do this but it seems like it belongs in there (since we just defer to it). We could also ship a userland module that makes debouncing nicer/easier. Idk...

ide commented 8 years ago

We could also ship a userland module that makes debouncing nicer/easier.

I believe this is the best choice. @es128 made a good point about having the debounce logic be aware of tasks -- if we received filesystem events while running a task that cares about those files, we could buffer up those filesystem events and handle them later by re-running the task. Chokidar just doesn't have that information. Maybe there's some pitfall here that I don't see right now, but my current thinking is that the debouncing logic could be enhanced by making it aware of tasks.

Aside: I don't think this is a blocker for Gulp 4 in case I'm giving anyone that impression, especially if this can be done in user space.

es128 commented 8 years ago

I've been thinking about creating an interface that offers the callback-style API with options for both time-based debounce and something like a done() callback - either directly in chokidar or a new wrapper module.

phated commented 8 years ago

@es128 that would be awesome

waterfoul commented 8 years ago

What about just adding a gulp.debounced method with the same interface as parallel/series? The watchers already require you to use gulp.parallel or gulp.series anyway. That way users can just use that method and it doesn't even have to get implemented within watch itself.

waterfoul commented 8 years ago

I tinkered with it a bit, this is not an ideal solution by any means but it's a start. I wrapped gulp.series and added both a short delay (0.1s) and a block which only lets the job run once. If you guys integrated this the delay should probably be an option. I would also like to exclude the debounced function from the output as it displays many times. It looks like something needs to be done near line 14 of log/events.js to clean up the log

    function debounced () {
        var rerun = false;
        var running = false;
        var seriesFn = gulp.series(arguments);
        return function debounced (cb) {
            if (!running) {
                running = true;
                rerun = false;
                setTimeout(function () {
                    return seriesFn(function debounceCallback () {
                        running = false;
                        if (rerun) {
                            seriesFn();
                        }
                        cb();
                    });
                }, 100);
            } else {
                rerun = true;
                cb();
            }
        };
    }

phated commented 8 years ago

@waterfoul this probably isn't going in core. Everything in gulp 4 is about function composition and a debounced function is just that. People are already used to bringing in dependencies for gulp plugins, so it isn't any different by bringing in a debounce method.

waterfoul commented 8 years ago

would you be open to the idea of somehow hiding the "started" log for a particular function? Unfortunately that log is in core and there is no way to hide it. Currently the "Starting debounced" lines make the logs mostly useless in certain situations

elynnaie commented 8 years ago

@waterfoul I modified your idea slightly. I believe there are two improvements here:

You can pass in the arguments to the debounced function for greater reusability.
The started flag prevents it from rerunning unnecessarily if multiple files get changed in rapid succession (such as git checkout a wildly different branch).

    function debounced (args, delay) {
        var rerun = false;
        var running = false;
        var started = false;
        if (delay == null) {
            delay = 100;
        }
        var seriesFn = gulp.series(args);
        return function debounced (cb) {
            if (!running) {
                started = false;
                running = true;
                rerun = false;
                setTimeout(function () {
                    started = true;
                    return seriesFn(function debounceCallback () {
                        running = false;
                        if (rerun) {
                            seriesFn();
                        }
                        cb();
                    });
                }, delay);
            } else {
                if (started) {
                    rerun = true;
                    cb();
                }
            }
        };
    }

Edit: Might as well add delay as an optional parameter.

krnlde commented 8 years ago

David Walsh has a really nice debounce implementation, where you are allowed to set whether the debounce should be executed immediatelly and then wait, or the other way around.

function debounce(func, wait, immediate) {
    var timeout;
    return function() {
        var context = this, args = arguments;
        var later = function() {
            timeout = null;
            if (!immediate) func.apply(context, args);
        };
        var callNow = immediate && !timeout;
        clearTimeout(timeout);
        timeout = setTimeout(later, wait);
        if (callNow) func.apply(context, args);
    };
};

Lodash achieves the same via the options parameter {leading: true | false}

es128 commented 8 years ago

In this context why wouldn't you always want to start immediately?

krnlde commented 8 years ago

In case of browserSync you would miss the last refresh trigger, which is the most important one when compiling stuff like coffeescript, less or sass.

metalim commented 8 years ago

~~Regarding David Walsh implementation: isn't immediately executed debounce is called throttle?~~ Nevermind, I see the difference from visual representation here

callumacrae commented 8 years ago

I checked out an old branch, ran git rebase origin/master, and got 2632 lines of output over the next four minutes: https://gist.github.com/callumacrae/46c4990fa37ae00c51f8

I'm not sure where the errors came from, at no point were there ever any errors in master.

This feels like something where gulp-plumber, where it'll turn into a plugin that everyone uses and there will be hundreds of support requests about. It's just not a nice experience.

joerideg commented 8 years ago

Hi all, Just tried https://www.npmjs.com/package/debounce And it seems to work just fine including logging the function name!

var debounce = require('debounce');

function watch() {
    gulp.watch([
      cfg.src.scripts,
      cfg.src.templates,
      cfg.src.unitTests
    ], debounce(gulp.series(scripts, unitTests), 200));
  }

It outputs the following in the terminal if I save a html js and test file at the same time:

[15:44:10] Starting 'debounced'...
[15:44:10] Starting 'scripts'...
[15:44:10] Starting 'lint'...
[15:44:10] Finished 'lint' after 75 ms
[15:44:10] Starting 'transpile'...
[15:44:10] Finished 'transpile' after 268 ms
[15:44:10] Starting 'annotate'...
[15:44:10] Finished 'annotate' after 42 ms
[15:44:10] Starting 'html2js'...
[15:44:10] Finished 'html2js' after 19 ms
[15:44:10] Finished 'scripts' after 405 ms
[15:44:10] Starting 'unitTests'...
[I'll spare you the details of the unit tests]
[15:44:11] Finished 'unitTests' after 990 ms
[15:44:11] Finished 'debounced' after 1.68 s

Another example of how awesome gulp works together with node modules.

krnlde commented 8 years ago

This works because this library invokes the callback on the trailing edge of the timeout. As in the implementation from David Walsh and lodash (by default).

joerideg commented 8 years ago

Sure, my point was more that it just works, no need to implement it in gulp or a gulp-plugin then right?

krnlde commented 8 years ago

No need, but convinient. But arguably one might prefer his own implementation of debounce.

mmoss commented 8 years ago

This used to be part of the old Gulp API. I'm hoping folks update the docs accordingly to point that out, since this means there's not parity in functionality (built-in). I don't mind the idea of this being in userland, but it was a very useful feature of the previous watch, even if it was originally a workaround for an issue.

yocontra commented 8 years ago

@es128 Are you open to adding this to chokidar?

es128 commented 8 years ago

More likely as a new wrapper module than directly in chokidar. It could handle the callback interface that is currently being done within gulp's code, further simplifying the footprint of the integration here, in addition to providing a debounce utility option and perhaps other convenience features over time. I would want to explore the idea of the done() callback as I think it would provide a superior solution to the problem discussed here than time-based debounce.

I could put this together over the next couple weeks if nobody else wants to jump in to do it quicker than that.

phated commented 8 years ago

@es128 do you think that should be the next evolution of glob-watcher? Funny that we might have come full-circle on that module.

yocontra commented 8 years ago

@phated I think we should just do it in gulp, it's such a small amount of code breaking it into a module seems like overkill just to add one option

adamreisnz commented 8 years ago

Whether it's done in Chokidar or Gulp, or whether we have to use the debounce package manually, it would be great if an agreed path forward can be decided on, so we can prevent this horribleness:

Using debounce is a little better, but still ugly:

:stuck_out_tongue:

I agree with @contra, +1 for doing it in gulp. It seems small enough and it's something you would expect to be handled out of the box by gulp.watch, given how the use case for it in 99% of the time is watching your code for changes. Doing a project wide find & replace or pulling a lot of changes in from a repo are quite common use cases.

adamreisnz commented 8 years ago

In addition I might add that a shortcoming of using debounce to wrap your functions with leads to obfuscated console output, and you don't see the actual task that is being performed:

I guess you could write a wrapper function to wrap your task and the debounce function in, but that feels a bit hacky.

phated commented 8 years ago

@es128 how close is https://github.com/gulpjs/glob-watcher/blob/chokidar/index.js to your thoughts on this? I added async-done around the watch task and track running and queued to know when to run something. There's still a lot of work to do but I wanted to get some thoughts.

es128 commented 8 years ago

@phated haven't tried running it, but LGTM just reading the code. It doesn't actually implement the debounce folks are asking for, but as I've said I think this is a far better approach for the use case they've described.

So there's still a choice to make regarding implementing a more literal debounce option - for the sake of backward compatibility or meeting some users' expectations or whatever - in case this simple one-at-a-time approach doesn't cut it for some people. Perhaps just wait and see whether the demand is still there after this solution goes in.

Also, should there be an option to disable this behavior in case someone wants to allow concurrent instances of the same task?

adamreisnz commented 8 years ago

@phated the problem that I see with this approach, is that it still doesn't address the (common) use case where you want a certain task to run after all the file changes have been completed. Unless I'm misreading the code, it seems that this will start running the task on the first change, and then hold off running tasks on further changes. That's still not going to cut it for a work flow where it's common that many files change at once, and you just want to relint/retest/rebuild your project after all of the changes have been processed.

BrowserSync has a similar problem that I'm trying to get them to address; they also start the refresh on the first file change event, and debounce subsequent events only. However, in the end this still results in BS having to refresh at least twice: first on the first file change event, and then later again when the last file change is done.

So ideally, I think gulp.watch needs a debounce mechanism which we can turn on (or off if we don't want to use it) which will wait for the debounce period after receiving the first file change event before doing anything, and only if it doesn't receive any subsequent file changes, then run the task. If it receives subsequent file changes within the debounce period, it will keep on waiting. In other words, trigger the task on the trailing edge of the wait interval, as also described here https://www.npmjs.com/package/debounce#debouncefn-wait--immediate--false-

Hope this helps.

es128 commented 8 years ago

will start running the task on the first change, and then hold off running tasks on further changes

It will run it again (once) if more changes were detected during the initial run. So the only UX inefficiency is the delta between the last event and the end of the prior run of the task. Agreed that this can be problematic in the case of very long-running tasks, but should work out fine otherwise.

phated commented 8 years ago

@adambuczynski that is a pretty impossible thing to handle transparently and I'd really like to avoid people having to specify a bunch of options to make this work. I think the queueing mechanism will be fine to handle this use case (the last file will trigger a build which will either run or be queued)

adamreisnz commented 8 years ago

It will run it again (once) if more changes were detected during the initial run. So the only UX inefficiency is the delta between the last event and the end of the prior run of the task. Agreed that this can be problematic in the case of very long-running tasks, but should work out fine otherwise.

Yes that's correct. The problem is propagated however when you have a build task that generates build files, and then have another process (like browser sync) trying to detect those changes in order to do the refresh. That means that if gulp detects changes twice and runs a particular build task twice, browser sync will also refresh twice. This is more than just a minor inefficiency in my opinion, as the browser refresh can take a long time if you're loading in a lot of assets (for example 200 non concatenated JS scripts in a large app).

For other tasks that trigger a deployment this could be even more problematic.

that is a pretty impossible thing to handle transparently and I'd really like to avoid people having to specify a bunch of options to make this work. I think the queueing mechanism will be fine to handle this use case (the last file will trigger a build which will either run or be queued)

There don't have to be many options to make this work, all it takes is two settings: the wait interval, and whether to start immediately or not (e.g. leading edge or trailing edge of the interval).

But I understand you are trying to make this into a transparent and easy to use behaviour, and I appreciate that. There is also no need to reinvent the wheel, and right now, wrapping my task functions in the debounce function gives me the desired results, but the problem is that gulp as a result is naming the task "debounced", see previous comment https://github.com/gulpjs/gulp/issues/1304#issuecomment-175987459

If there is anything that can be done about that (without having to create a layer of function wrappers), then I am happy to just keep using the debounce package on top of gulp.watch.

phated commented 8 years ago

@adambuczynski if you set the displayName property on the returned debounce function, it should use that as the task name. However, if we implement the linked/suggested transparent queueing mechanism, does that screw up your usage of debounce?

adamreisnz commented 8 years ago

@phated Right, I will try setting that property.

It might actually mess it up. The first file change event will trigger the debounced function, but then Gulp would queue the subsequent events until the task is done. So the debounced function will wait, not get any further events, and then eventually run it's task. Once that's done, Gulp will carry on with the queue and run the debounced function again as a result. This would lead in the task running at least twice as well.

I think it's an inherent problem of starting the task immediately, rather than at the end of the wait interval.

So is the conclusion that we should just use the external debounce library and not handle debouncing or queueing in gulp? If so, I think that it should be properly documented, because many people would run into this issue if they are saving or changing multiple files.

phated commented 8 years ago

@adambuczynski I think we could set a wait interval at an extremely low number and trigger this flow with it. Instant changes across a variety of files would be batched at something like 100ms and that could be exposed as an option to someone that has a flow like you where things might not happen that quickly. I need to think on this some more.

adamreisnz commented 8 years ago

@phated Yes, as long as the wait interval is smaller than my debounce interval it should still work, provided you don't wait for the first task to complete.

I have tested setting the displayName property and that works well. Will play around with it some more and see if there's still any issues around the debouncing.

adamreisnz commented 8 years ago

Ok I've done some more testing, and the only problems I ran into were the aforementioned logging of the debounced task, which is still seeing the multiple triggers (perhaps that can be solved with your queue solution):

In addition, when I directly wrap a single task in the debounce function, for some reason it never seems to signal task completion back to gulp, and as such I never see a message that the task has finished, resulting in:

However, that can be solved by wrapping the task in a gulp.series (which also fixes the function naming problem):

So to summarize, my current working debounce implementation for watch tasks is as follows:

module.exports = function watchAppCss() {
  gulp.watch(APP_CSS_SRC, debounce(
    gulp.series(buildAppCss), WATCH_DEBOUNCE_DELAY
  ));
};

kingmotley commented 8 years ago

This probably needs to be updated for gulp 4 (and many other scenarios), but I've been using https://www.npmjs.com/package/gulp-debounce-stream which I wrote to solve most of my issues in gulp 3. It has options for triggering on the leading or trailing events (or both), and the debounce time. But that was more for doing incremental type builds rather than a single large build that gets called.

hiroqn commented 8 years ago

so if we need debounce function or some sensitive task operating, should i use chokidar directly without gulp.watch?

phated commented 8 years ago

Switched this to "help wanted" as more work needs to be done on my glob-watcher branch. Any help is very much welcome and appreciated.

phated commented 8 years ago

@adambuczynski or anyone else, could you test out https://github.com/gulpjs/gulp/tree/glob-watcher-2 without any userland debounce (I've added one internal to glob-watcher) in your projects and let me know the results.

I'd also love some real-world examples to be testing myself during development, if anyone has something that is open source.

gulpjs / gulp

Debouncing watcher tasks with Gulp 4 #1304