jupyterlab / lumino

Lumino is a library for building interactive web applications
https://lumino.readthedocs.io/

Cannot invoke throttle recursively #47

Closed: jasongrout closed this issue 4 years ago

jasongrout commented 4 years ago

In the throttler, we have a special check: https://github.com/jupyterlab/lumino/blob/5e8427c9769e8ba3c961ea62e22cae5e696f52ea/packages/polling/src/ratelimiter.ts#L130-L132

This (I think) means that the throttled callback cannot schedule a new call by invoking the throttler recursively. For example, I might have a function that does a lot of work (say, processing 100 items in an array) by breaking it up: process a small batch (say, 10 items), schedule a future call by invoking the throttler recursively, and yield back to the browser. The current logic of not scheduling a future call while we are being invoked means I cannot invoke recursively.

Of course, I can do a setTimeout(() => { x.invoke() }), but that seems like a weird workaround. Why do we have this check?
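To make the pattern concrete, here is a minimal sketch of what I would like to write (the Throttler import is real; the item counts and the processing step are invented for illustration):

import { Throttler } from '@lumino/polling';

const items: number[] = new Array(100).fill(0);

const worker = new Throttler(async () => {
  // Process a small batch, then yield back to the browser.
  items.splice(0, 10);
  if (items.length > 0) {
    // With the current check, this recursive invoke is silently
    // dropped because the throttler is still in the 'invoked' phase.
    void worker.invoke();
    // Workaround mentioned above:
    // setTimeout(() => { void worker.invoke(); });
  }
}, 100);

void worker.invoke();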

We ran into this in https://github.com/jupyter-widgets/ipywidgets/pull/2732#discussion_r376296757

afshin commented 4 years ago

@jasongrout:

When @blink1073 and I were originally reading up on rate limiting strategies, we decided to implement only the two strategies (throttling and debouncing) that did not involve keeping an internal queue.

For your circumstance, would this pattern work?

import { Throttler } from '@lumino/polling';

const queue = [/*...*/];
const throttler = new Throttler(async () => {
  // Do something with x items in queue and shorten the queue.
});

// Drive the throttler from an external loop until the queue drains.
while (queue.length) {
  await throttler.invoke();
}

cc: @wolfv

wolfv commented 4 years ago

When I looked into this, the debouncer basically implemented the "fix" that I was envisioning (it does not check the current invocation state). So I believe debounce would have been appropriate from the beginning, and indeed it seems to work well.

jasongrout commented 4 years ago

@wolfv - like @afshin mentions, the debouncer can effectively suffer a denial of service if it is invoked regularly: each new call resets the timer, so the underlying function may never run. That is not the behavior we want in our case, right? We want things to be regularly retried, even if new requests are made.
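A minimal sketch of that starvation behavior (Debouncer is the real class from @lumino/polling; the intervals are arbitrary):

import { Debouncer } from '@lumino/polling';

const debouncer = new Debouncer(async () => {
  console.log('ran'); // never logs while calls keep arriving
}, 100);

// Invoking more often than the 100ms limit keeps resetting the
// timer, so the underlying function is starved indefinitely.
setInterval(() => { void debouncer.invoke(); }, 50);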

@afshin - I was trying to simplify our case for the sake of an example. Your solution works well if we have a single queue that does not change size. In our real example, the queue can be extended at arbitrary times, and we're implementing a timeout on items, so they stay in the queue until their timeout is reached (meaning we don't know a priori when the queue size will decrease). Basically, the behavior we would like is to continuously retry a queue of requests until each request succeeds or times out. The throttler is there to batch up the retries, so we retry multiple times before the timeouts expire.

For example, we might have a queue of 5 requests, each with a timeout of 500ms, started at different times but all within 300ms of each other. We'd like to retry any existing requests at most once every 100ms, and the queue shrinks as each individual request succeeds or exceeds its timeout. One way to implement this is to make the batched retry a throttled function; within each retry invocation we know whether we need to retry again, so we recursively invoke the throttler.

Driving the invocations from an external loop, like in your suggestion, gets messier when the queue dynamically grows. When we add an item to the queue, we need to kick off that async loop to do the queue work. Either we kick it off blindly, in which case we issue a lot of superfluous requests (which, fair enough, will be consolidated by the throttler, but it still seems messy), or we keep track of whether the async driver loop is currently running. I thought it would be simpler to invoke a throttled retry function (letting the throttler take care of the consolidation I was going for) which recursively schedules more retries if needed, but the recursive part doesn't work with the current implementation. A sketch of the shape I have in mind follows.
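A rough sketch of that pattern, assuming the Throttler allowed recursive invocation (the PendingRequest shape, retry semantics, and intervals are all invented for illustration):

import { Throttler } from '@lumino/polling';

interface PendingRequest {
  retry: () => Promise<boolean>; // resolves true on success (hypothetical)
  expires: number;               // deadline in epoch milliseconds
}

const pending: PendingRequest[] = [];

// Retry the whole batch at most once every 100ms.
const retrier = new Throttler(async () => {
  const now = Date.now();
  for (let i = pending.length - 1; i >= 0; i--) {
    if (now > pending[i].expires || (await pending[i].retry())) {
      pending.splice(i, 1); // timed out or succeeded
    }
  }
  if (pending.length > 0) {
    // The recursive invocation that the current implementation drops.
    void retrier.invoke();
  }
}, 100);

// Adding a request just invokes the throttler; no external driver
// loop or "is the loop running?" bookkeeping is needed.
function addRequest(request: PendingRequest): void {
  pending.push(request);
  void retrier.invoke();
}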

> The Throttler basically discards superfluous calls to guarantee no more than one invocation of your underlying function happens per limit, and it always keeps the first invocation.

I think I see the logic now. Consider a throttled network request (a simplified case with no recursive call): if someone invokes the function while it is in the middle of a network request (i.e., the throttler is in the 'invoked' state, awaiting the result of an existing request), the current implementation assumes the existing invocation is sufficient and ignores the new one. That assumption holds when the interesting work happens after the network request, so the in-flight invocation is equivalent to the newly requested one.

However, consider a case where the interesting work happens in the function before the network request is made. For example, suppose the request goes to a dynamically determined URL. When the request URL changes, we want to make a new request to the new URL, so we invoke the function. If we happen to invoke while an existing request to the old URL is pending (i.e., the status is 'invoked'), no invocation to the new URL gets scheduled; instead, the invocation that would pick up the new URL is dropped. I think this is unexpected in this case. I would have expected the logic to be "if there is a scheduled invocation that starts in the future, drop the request" rather than "if there is a scheduled invocation that ends in the future, drop the request". I suppose it's sort of a leading-edge vs. trailing-edge decision criterion.
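A sketch of that second case (the URLs and the fetch call are illustrative only):

import { Throttler } from '@lumino/polling';

let currentUrl = 'https://example.com/a'; // determined dynamically

const fetcher = new Throttler(async () => {
  // The interesting work (reading currentUrl) happens *before*
  // the network request is issued.
  const response = await fetch(currentUrl);
  return response.status;
}, 100);

void fetcher.invoke();                 // starts a request to .../a
currentUrl = 'https://example.com/b';
void fetcher.invoke();                 // dropped while the first request
                                       // is pending, so .../b is never fetched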

jasongrout commented 4 years ago

In the JLab dev meeting today, we discussed this. A couple of notes:

  1. @afshin pointed out that we can access the state of the throttler, so we can work around this issue by checking whether the state is currently 'invoked', and if it is, doing throttler.invoke().then(() => throttler.invoke()) or something.
  2. @saulshanabrook pointed out the rxjs throttling API. There was some confusion about what the leading and trailing parameters do (they aren't documented in rxjs, at least not that we could see). The commit adding them to rxjs pointed to lodash, which does document them. Finally, the lodash docs point to an excellent article discussing the parameters (a sketch of lodash's options follows this list). It's not clear to me how these parameters translate to our situation, where the function invocation can be asynchronous, delaying the state changes and waits.
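For reference, lodash exposes the two edges as options on its throttle (this is lodash's API, not Lumino's; the wait time is arbitrary):

import throttle from 'lodash/throttle';

const fn = () => console.log('tick');

// Fire only on the leading edge of the 100ms window:
const leading = throttle(fn, 100, { leading: true, trailing: false });

// Fire only on the trailing edge of the 100ms window:
const trailing = throttle(fn, 100, { leading: false, trailing: true });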

I think for now we'll use the workaround @afshin suggested, so I'm closing this issue. If it comes up again, we can take another look at extending the throttler.

jasongrout commented 4 years ago

@afshin - you mentioned that we should be able to do that workaround because we could access the underlying poll. Except we can't, because the poll is protected: https://github.com/jupyterlab/lumino/blob/3b93496b80e1e681af112e6184ef5791c5ad06fc/packages/polling/src/ratelimiter.ts#L95

So I'll need to create a subclass for this. Not a huge deal, but it's a bit more pain than we thought.
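For reference, the subclass is short. A sketch only: it just widens access to the protected poll, and the class name is invented:

import { IPoll, Throttler } from '@lumino/polling';

// Hypothetical subclass exposing the poll phase so callers can detect
// the 'invoked' state and chain a follow-up invocation.
class InspectableThrottler<T = any, U = any> extends Throttler<T, U> {
  get phase(): IPoll.Phase<'invoked'> {
    return this.poll.state.phase;
  }
}

const throttler = new InspectableThrottler(async () => { /* ... */ });
if (throttler.phase === 'invoked') {
  void throttler.invoke().then(() => throttler.invoke());
} else {
  void throttler.invoke();
}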

afshin commented 4 years ago

@jasongrout Ah, that is not so great. I don't know that the poll should be exposed, but the phase maybe should be. What do you think of this?

diff --git a/packages/polling/src/index.ts b/packages/polling/src/index.ts
index 68b1357d..ac0760e3 100644
--- a/packages/polling/src/index.ts
+++ b/packages/polling/src/index.ts
@@ -166,6 +166,11 @@ export interface IRateLimiter<T = any, U = any> extends IDisposable {
    */
   readonly limit: number;

+  /**
+   * The current phase of the underlying rate limiter poll.
+   */
+  readonly phase: IPoll.Phase<'invoked'>;
+
   /**
    * Invoke the rate limited function.
    */
diff --git a/packages/polling/src/ratelimiter.ts b/packages/polling/src/ratelimiter.ts
index 0e7ea20c..f121a2f4 100644
--- a/packages/polling/src/ratelimiter.ts
+++ b/packages/polling/src/ratelimiter.ts
@@ -3,7 +3,7 @@

 import { PromiseDelegate } from '@lumino/coreutils';

-import { IRateLimiter } from './index';
+import { IPoll, IRateLimiter } from './index';

 import { Poll } from './poll';

@@ -56,6 +56,13 @@ export abstract class RateLimiter<T, U> implements IRateLimiter<T, U> {
     return this.payload === null;
   }

+  /**
+   * The current phase of the underlying rate limiter poll.
+   */
+  get phase(): IPoll.Phase<'invoked'> {
+    return this.poll.state.phase;
+  }
+
   /**
    * Disposes the rate limiter.
    */

jasongrout commented 4 years ago

> What do you think of this?

Let me think about it. For now I just made a subclass, but overriding it made me want to think about the logic of the throttler a little more too.