scratchfoundation / scratch-gui

Graphical User Interface for creating and running Scratch 3.0 projects.
https://scratchfoundation.github.io/scratch-gui/develop/
BSD 3-Clause "New" or "Revised" License
4.43k stars, 3.51k forks

Cloud message throttling prevents two consecutive cloud var updates #4243

Open ArnoHue opened 5 years ago

ArnoHue commented 5 years ago

Expected Behavior

According to what I read in https://github.com/LLK/scratch-gui/pull/3916, cloud var updates should be throttled to a max of 10 per second. The actual implementation uses https://lodash.com/docs/4.17.11#throttle, which discards every update within 100ms of the previous one. I would expect two consecutive cloud var updates to work, as long as there are not 10 within 1 second. A suitable approach might be a queue for cloud var updates, with 100ms delays between each dequeue.
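As a concrete illustration of the queue idea, here is a minimal sketch (the `UpdateQueue` name, `send` callback, and the gap length are assumptions for illustration, not scratch-gui code): updates are never dropped, just spaced out.

```javascript
// Sketch: a queue that never drops updates, but leaves a fixed gap
// between sends. The first message in an idle queue goes out
// immediately; later ones are drained one per gap interval.
class UpdateQueue {
    constructor (send, gapMs = 100) {
        this.send = send;     // callback that actually transmits one message
        this.gapMs = gapMs;   // minimum spacing between sends
        this.queue = [];
        this.timer = null;    // non-null while a drain is scheduled
    }

    push (msg) {
        this.queue.push(msg);
        if (this.timer === null) this._drain();
    }

    _drain () {
        if (this.queue.length === 0) {
            this.timer = null; // queue idle; next push sends immediately
            return;
        }
        this.send(this.queue.shift());
        this.timer = setTimeout(() => this._drain(), this.gapMs);
    }
}
```

With this, two consecutive updates both arrive: the second is merely delayed by the gap rather than discarded.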

Actual Behavior

Two of my projects (https://scratch.mit.edu/projects/97137611/, https://scratch.mit.edu/projects/209130998/) broke, as two consecutive cloud updates could not be sent unless I introduced a wait(100) block in between. But wait()s may not be suitable in many scenarios, e.g. when they affect timing. That is the case for the 2nd project (Chess). (Besides affecting timing, there is another problem here: with 10 cloud variables in use, the project can no longer be saved in the Scratch 3.0 editor now that the limit is 8 vars -- but this is just a sidenote.)

Note that both projects send far fewer than 10 cloud updates per second on average: PacMan sends just 5 updates after many minutes of gameplay when a highscore is reached, and Chess sends around 15-20 updates per minute, depending on move frequency.

Steps to Reproduce

Project https://scratch.mit.edu/projects/97137611/, sprite Highscore, block SetHighscoreFromLocal. Highscore data is split to 5 cloud variables, highscores are only stored with wait() blocks between each update.

Operating System and Browser

Windows 10, Chrome 71.0.3578.98

ArnoHue commented 5 years ago

Maybe an even better option than a queue with delayed dequeue might be to simply keep track of the timestamps of previous cloud var updates, and only discard new updates if the previous 10 have occurred within 1 second or less.
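This timestamp-tracking idea could look roughly like the following sketch (the class name and the injectable clock are illustrative, not existing code): an update is only dropped when 10 sends have already happened within the last second.

```javascript
// Sketch: sliding-window rate limiting based on recorded timestamps.
// Two consecutive updates are fine; only the 11th update inside a
// one-second window gets rejected.
class SlidingWindowLimiter {
    constructor (maxPerWindow = 10, windowMs = 1000, now = Date.now) {
        this.maxPerWindow = maxPerWindow;
        this.windowMs = windowMs;
        this.now = now;          // injectable clock for testability
        this.timestamps = [];    // send times within the current window
    }

    // Returns true if the update may be sent right now.
    tryAcquire () {
        const t = this.now();
        // Forget sends that have aged out of the window.
        this.timestamps = this.timestamps.filter(ts => t - ts < this.windowMs);
        if (this.timestamps.length >= this.maxPerWindow) return false;
        this.timestamps.push(t);
        return true;
    }
}
```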

McSinyx commented 5 years ago

As mentioned on the forum, this undocumented so-called feature has caused confusion for many Scratchers (including me).

benjiwheeler commented 5 years ago

I also ran into this limitation. I agree that ideally we would have an implementation that better balances the need to limit constant updating of variables against the need to support moderate overall use that is prone to bursts of updates.

If a token bucket would not add too much overhead, that might address this problem, while allowing us to limit the update rate across multiple seconds more precisely than _.throttle does.

colbygk commented 5 years ago

I agree that limiting updates to no more than once every 100ms causes a complication in how code is written to avoid closely spaced updates from the client side. On the server side we use a token bucket based throttler and perhaps we could use it for the client side, see: https://www.npmjs.com/package/limiter .

I would advocate that we fuzz the request timings, though: adding some random timing noise could help avoid pulses of requests piling into the backend service at particular intervals of a particular second, given how many hosts are now slaved to within a few ms of GPS time these days.

McSinyx commented 5 years ago

Could we, on the client side, send updates as a package of values instead, at a fixed interval (only when new values have been set, of course)?

colbygk commented 5 years ago

I would push back and say that:

1) We have to be very careful about how data flows into the cloud data server, to avoid causing too much congestion or interfering with the backend functions.

2) We do not guarantee that updates arrive at the cloud data server; we only make a best effort.

Even the suggested solution of using a token based limiter might cause more confusion; given a simple loop in scratch:

token bucket: 10 updates could be sent within a 10ms window of time, while every update after that is dropped until at least another 990ms have passed.

current throttler: new updates are guaranteed to be sent every 100ms.

I may backtrack on my token bucket suggestion, as it might be least surprising to support best-effort delivery of data every 100ms vs. fast updates in a burst and then nothing for almost one full second.

benjiwheeler commented 5 years ago

@colbygk I think with token smoothing (one package called it "spread"), we could make sure that users never have to wait more than a certain amount of time (e.g., 100ms) before they have at least one token, and thus the ability to send another update. Or, we could just configure the token replenishment rate to be that minimum time, e.g. 100ms.

One consideration is that projects that make heavy use of cloud vars may have been configured to send updates precisely at the limit of what's supported; our changing what's supported might make them behave unexpectedly.

griffpatch commented 5 years ago

Does anyone have an answer to whether batching of cloud var updates is at all possible? I would expect most users to set cloud vars x, y and direction all in one go... Would these now be spread over 3 consecutive updates 100ms apart, or batched under one update token?


benjiwheeler commented 5 years ago

@griffpatch Batching updates seems like it would be possible -- we'll have to look at a few different ways to implement it.

Generally, we're looking at a few possibilities with the intention of making the most common cases of cloud var use work well, without the total volume of requests being too high.

Hope that helps!

McSinyx commented 5 years ago

If we're willing to change how the server receives data so that multiple vars can be updated at once -- I suspect we currently cannot because of this piece of code (ironic that it's supposed to guarantee the server receives important methods, yet the throttle prevents exactly that from happening; I wonder how #3916 got approved in the first place, as it breaks an existing feature):

https://github.com/LLK/scratch-gui/blob/d63a7ad6250d235b5f9fb2a3a8ed66ab1ba9ebbc/src/lib/cloud-provider.js#L81-L83

then IMHO we could do something like this:

diff --git a/src/lib/cloud-provider.js b/src/lib/cloud-provider.js
index ce26082f..7ed39c01 100644
--- a/src/lib/cloud-provider.js
+++ b/src/lib/cloud-provider.js
@@ -1,5 +1,5 @@
 import log from './log.js';
-import throttle from 'lodash.throttle';
+import debounce from 'lodash.debounce';

 class CloudProvider {
@@ -24,12 +24,10 @@ class CloudProvider {
         // A queue of messages to send which were received before the
         // connection was ready
         this.queuedData = [];
+        this.lastUpdate = new Date().getTime();
+        this.isUpdating = false;

         this.openConnection();
-
-        // Send a message to the cloud server at a rate of no more
-        // than 10 messages/sec.
-        this.sendCloudData = throttle(this._sendCloudData, 100);
     }

     /**
@@ -76,11 +74,8 @@ class CloudProvider {
         this.writeToServer('handshake');
         log.info(`Successfully connected to clouddata server.`);

-        // Go through the queued data and send off messages that we weren't
-        // ready to send before
-        this.queuedData.forEach(data => {
-            this.sendCloudData(data);
-        });
+        // Send off queued messages that we weren't ready to send before
+        if (this.queuedData.length) this.sendCloudData();
         // Reset the queue
         this.queuedData = [];
     }
@@ -136,17 +131,12 @@ class CloudProvider {
         if (dataNewName) msg.new_name = dataNewName;

         // Optional number params need different undefined check
-        if (typeof dataValue !== 'undefined' && dataValue !== null) msg.value = dataValue;
-
-        const dataToWrite = JSON.stringify(msg);
-        if (this.connection && this.connection.readyState === WebSocket.OPEN) {
-            this.sendCloudData(dataToWrite);
-        } else if (msg.method === 'create' || msg.method === 'delete' || msg.method === 'rename') {
-            // Save data for sending when connection is open, iff the data
-            // is a create, rename, or  delete
-            this.queuedData.push(dataToWrite);
-        }
+        if (typeof dataValue !== 'undefined' && dataValue !== null)
+            msg.value = dataValue;

+        this.queuedData.push(JSON.stringify(msg));
+        if (this.connection && this.connection.readyState === WebSocket.OPEN)
+            this.sendCloudData();
     }

     /**
@@ -157,6 +147,23 @@ class CloudProvider {
         this.connection.send(`${data}\n`);
     }

+    /**
+     * Send queued data to the cloud server
+     * at a rate of no more than 10 messages/sec.
+     */
+    sendCloudData () {
+        if (this.isUpdating) return;
+        this.isUpdating = true;
+
+        const now = new Date().getTime();
+        debounce(() => {
+            this._sendCloudData(this.queuedData.join('\n'));
+            this.queuedData = [];
+            this.lastUpdate = now;
+            this.isUpdating = false;
+        }, this.lastUpdate + 100 - now)();
+    }
+
     /**
      * Provides an API for the VM's cloud IO device to create
      * a new cloud variable on the server.
benjiwheeler commented 5 years ago

@McSinyx Thanks for that suggestion.

Would that use of debounce effectively wait until a flurry of update requests stops before sending the request? If so, that might mean a tight loop in Scratch would keep the vars from ever being updated... like this:

[image: timing chart of debounced calls]

(this is from the codepen examples on https://css-tricks.com/debouncing-throttling-explained-examples/ )
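The starvation concern can be shown with a tiny fake-clock simulation of trailing-edge debouncing (hand-rolled here, not lodash, but with the same basic semantics as lodash.debounce without maxWait): while calls keep arriving faster than the wait, the wrapped function never runs.

```javascript
// Minimal trailing-edge debounce on a fake clock: each call pushes the
// firing deadline out by the full wait, so a steady flurry of calls
// starves the wrapped function entirely.
function makeDebounce (fn, waitMs) {
    let deadline = Infinity;
    return {
        call (now) { deadline = now + waitMs; }, // every call resets the wait
        tick (now) {                             // advance the fake clock
            if (now >= deadline) {
                deadline = Infinity;
                fn();
            }
        }
    };
}

let fired = 0;
const d = makeDebounce(() => { fired += 1; }, 100);

// Calls every 50ms from t=0 to t=500: the deadline keeps moving ahead,
// so `fired` stays 0 for the whole flurry.
for (let t = 0; t <= 500; t += 50) {
    d.call(t);
    d.tick(t);
}
// Only once the calls stop does the function finally fire (at t=600,
// 100ms after the last call).
d.tick(600);
```

In a Scratch project, "calls every 50ms" corresponds to a tight loop setting a cloud variable, which is exactly the case where debounce would postpone the update indefinitely.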

ArnoHue commented 5 years ago

But debounce(), just like throttle(), would unnecessarily discard cloud var updates, just as is the case now, wouldn't it? Why not simply use a queue with a timeout between dequeue()s, as suggested in the original post? Just as in https://stackoverflow.com/a/23073020 or http://jsfiddle.net/dandv/47cbj/ ?

colbygk commented 5 years ago

The issue with not discarding is that quickly-aging values could be based on previous results read from the output of the cloudvars system. This would lead to individual Scratch users overwriting each other with data that is "old", which is probably not what we want either.

We've discussed locally that we could use two token buckets, one that works over short time scales, allowing a bursty behavior that would make a series of updates in quick succession viable. The other would work over longer time scales, so that if the longer term rate of updates exceeds a limit, the client throttles those.

limiter provides a burst and fill rate:

const BURST_RATE = 15; // 15/second burst
const FILL_RATE = 10; // 10/sec sustained
const TokenBucket = require('limiter').TokenBucket;
const bucket = new TokenBucket(BURST_RATE, FILL_RATE, 'second', null);

bucket.removeTokens(1, function() {
  sendMyData(myData);
});
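For comparison, here is a self-contained fake-clock sketch of the two-bucket idea (not using the limiter package; the class names and rates are illustrative): an update goes through only when both the short-term burst bucket and the long-term sustained bucket can supply a token.

```javascript
// Sketch: a small fast-refilling bucket permits short bursts, while a
// larger slow-refilling bucket caps the sustained rate over longer
// time scales. The clock is injectable so the logic is testable.
class Bucket {
    constructor (capacity, tokensPerMs, now = Date.now) {
        this.capacity = capacity;
        this.tokensPerMs = tokensPerMs;
        this.now = now;
        this.tokens = capacity; // start full
        this.last = now();
    }
    _refill () {
        const t = this.now();
        this.tokens = Math.min(
            this.capacity,
            this.tokens + (t - this.last) * this.tokensPerMs
        );
        this.last = t;
    }
    canTake () { this._refill(); return this.tokens >= 1; }
    take () { this.tokens -= 1; }
}

class DualThrottle {
    constructor (now = Date.now) {
        this.burst = new Bucket(15, 15 / 1000, now);     // short-term: 15/sec
        this.sustained = new Bucket(60, 10 / 1000, now); // long-term: 10/sec average
    }
    // An update is allowed only when both buckets have a token.
    tryAcquire () {
        if (!this.burst.canTake() || !this.sustained.canTake()) return false;
        this.burst.take();
        this.sustained.take();
        return true;
    }
}
```

A quick burst of updates drains the small bucket first; a client that keeps bursting eventually drains the large bucket too, throttling its long-term average.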
benjiwheeler commented 5 years ago

Another thought: a common pattern is to update several vars in a row, but if the code is trying this too often, you could have a pattern where only the first var or two get updated (and very frequently), while the others get updated essentially never.

One solution might be to refill the bucket discretely (all at once), rather than continuously (what at least one package calls "spread").

Another solution might be to provide a separate bucket for each variable, and fill those separately. (There could even be a general purpose bucket as well, with an update going through if the variable's own bucket has a token, and falling back to the general purpose bucket if not.)
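The per-variable idea might look like the following sketch (the class name, rates, and shared fallback bucket are illustrative assumptions): a tight loop on one variable drains only its own bucket plus the shared one, leaving other variables able to update.

```javascript
// Sketch: each cloud variable gets its own small token bucket, with a
// shared general-purpose bucket as fallback, so one hot variable
// cannot starve updates to the others.
class PerVarThrottle {
    constructor (now = Date.now) {
        this.now = now;                               // injectable clock
        this.perVar = new Map();                      // varName -> bucket state
        this.shared = this._newBucket(5, 5 / 1000);   // shared fallback: 5/sec
    }
    _newBucket (capacity, tokensPerMs) {
        return {capacity, tokensPerMs, tokens: capacity, last: this.now()};
    }
    _tryTake (b) {
        const t = this.now();
        b.tokens = Math.min(b.capacity, b.tokens + (t - b.last) * b.tokensPerMs);
        b.last = t;
        if (b.tokens < 1) return false;
        b.tokens -= 1;
        return true;
    }
    tryAcquire (varName) {
        if (!this.perVar.has(varName)) {
            this.perVar.set(varName, this._newBucket(2, 2 / 1000)); // 2/sec per var
        }
        // Use the variable's own bucket first, then fall back to the shared one.
        return this._tryTake(this.perVar.get(varName)) || this._tryTake(this.shared);
    }
}
```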

TheLogFather commented 5 years ago

I can't say I see much of an issue with simply lumping cloudvar updates on the client, limited to sending only the latest values every 0.1s or so for those that have changed over that time.

And have the server just disconnect any client that sends more than, say, 35 requests within a couple of seconds (and perhaps also even when the average over a longer period of time exceeds 15 or so?)

Yes, sending only latest values means there will be missed values when a project is changing a cloudvar in a tight loop (which is what the majority of Scratchers rather naively tend to do). –But I think the important point here is that the projects receiving the changes don't have a reliable way to capture rapid changes anyway (at least, not currently), so trying to send each and every change seems pretty pointless, really...

(And, yes, it's possible to write a non-Scratch client that could capture all updates sent from cloud – but I don't see that there's any requirement to have to support such a client...)

Similarly, clients can receive lumped updates (rather than receiving one for each changed cloudvar).

EDIT: Maybe worth referencing as a bit of a blast from the past: https://github.com/LLK/scratch-flash/issues/973 (Hard to believe it's well over 3 years ago...)

griffpatch commented 5 years ago

I agree. So to make things clear in pseudocode:

var CHANGED = []

Scratch Main Loop:
  Execute Scratch threads for current frame
  When a user changes a cloud variable:
    Record the changed variable name in hash set: CHANGED[varName] = 'Y'
  End of Frame:
    IF CHANGED.length > 0 AND time_since_last_cloud_post > threshold (0.1 seconds) THEN
      Post batch of most recent changes to cloud vars (ignore multiple updates)
      Set time_since_last_cloud_post to now
      Clear CHANGED
    END
Loop

So this ensures the latest changes are always posted, and that they are batched where possible. It also doesn't delay posting when changes are made at regular intervals; delays are only introduced when the Scratch user tries to send data faster than every 0.1 seconds.
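The pseudocode above can be sketched in JavaScript like this (the `postBatch` callback, class name, and 100ms interval are illustrative assumptions; the clock is injectable to keep the logic testable):

```javascript
// Sketch: coalesce changed variables per frame and post a batch of
// latest values at most once per interval. Repeated writes to the same
// variable within a batch overwrite each other; an idle sender posts
// its first batch immediately.
class BatchingSender {
    constructor (postBatch, intervalMs = 100, now = Date.now) {
        this.postBatch = postBatch;
        this.intervalMs = intervalMs;
        this.now = now;
        this.changed = new Map();   // varName -> latest value
        this.lastPost = -Infinity;  // so the first batch posts without delay
    }

    // Called whenever a cloud variable changes.
    set (varName, value) {
        this.changed.set(varName, value);
    }

    // Called once at the end of each frame.
    endOfFrame () {
        if (this.changed.size === 0) return;
        if (this.now() - this.lastPost < this.intervalMs) return;
        this.postBatch(Object.fromEntries(this.changed));
        this.changed.clear();
        this.lastPost = this.now();
    }
}
```

Pending changes that miss one frame's post are simply carried to a later frame, so the latest value of every variable eventually goes out.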


benjiwheeler commented 2 years ago

cc @AlexisGoodfellow @colbygk This is one of the issues relating to cloud vars that points to some options for making the client-side throttling a better balance between capability and spam prevention.