Implement rate limiting to prevent a DOS situation when multiple robots are participants on a wavelet.

If multiple robots are participants on a wave, then it is possible for them to 
generate deltas at the maximum rate achievable by the system. An example would 
be if two echoy style robots where placed on the same wave.

The rate at which robots may generate deltas needs to be limited to a 
reasonable burst rate and sustained rate that allows normal operation but 
prevents a robot from flooding a wave or overloading the server.

For the passive api, robots are responding to deltas and could conceivable 
respond to every delta generated. The rate at which changes are fed to a robot 
should probably be limited to once every 500 milliseconds. When a user is 
typing, the delta rate would likely exceed that rate during bursts of typed 
charters. In order to limit the response rate, the event rate must also be 
limited. To do this when deltas are converted to events individual deltas 
should be merged when possible (e.g. merge several typed characters into a 
single DOCUMENT_CHANGED event).

Limiting robots to one event every 500 milliseconds will reduce the load on the 
server but won't prevent two robots from responding to each other endlessly. 
Detecting this specific situation may be problematic, especially when one of 
the robots is connected to a remote server. In order to mitigate this 
situation, the response rate (i.e. the percent of events to which the robot 
generates a response) needs to be monitored. If the robots response rate 
exceeds a threshold for a configured duration then the robot should be removed 
from the wavelet and the wavelet participants should be notified (perhaps with 
a blip). For example, if the robot responds to more than 90% of events within 2 
minutes or 70% of events withing 5 minutes, it would be dropped.

It is conceivable that spelly would exceed these limits for a fast typist that 
is also a bad speller, so it should also be possible to configure the limits on 
a per robot basis. Robots should also be able to ascertain the limits placed on 
them so they can self-censor/self-limit.

For the active and data api's, similar limits should be implemented. The burst 
rate can be managed by not sending the response sooner than 500 milliseconds 
after the last response. For managing sustained rates, if the request rate 
exceeds X per 2 minutes or Y per 5 minutes the robot should be cut off, where X 
and Y might be 216 and 420.

In all three API's an event/response should be added to the API that allows the 
server to notify the robot when it is about to exceed the speed limit.

The sustained rate threshold algorithm described is a simplistic one used for 
example purposes and something more robust should probably be considered.

Original issue reported on code.google.com by Tad.Glines@gmail.com on 3 Nov 2010 at 9:52

The passive API implementation is already somewhat self limiting. The robot 
gateway makes only one request to a robot per wavelet at a time, waiting for a 
response before sending more events. It batches up deltas and applies them all 
to generate a single event bundle for the next call. Human typing speed does 
not affect the robot-generated load at the high end. Network latency will 
likely reduce a robot's throughput to a sane level already. Throttling should 
be as simple as just waiting a bit longer before sending the next event set.

I agree this kind of mechanism is necessary in the long run, but in my opinion 
we needn't add this kind of complexity until the need is demonstrated. As far 
as I know, Google Wave never had a problem requiring this kind of throttling 
(but could of course support pretty high load).

Original comment by ano...@google.com on 4 Nov 2010 at 3:02

latos / wave-protocol

Implement rate limiting to prevent a DOS situation when multiple robots are participants on a wavelet. #140