youtube / doorman

Doorman: Global Distributed Client Side Rate Limiting.
Apache License 2.0
1.91k stars 108 forks source link

Doorman

Build Status

Doorman is a solution for Global Distributed Client Side Rate Limiting. Clients that talk to a shared resource (such as a database, a gRPC service, a RESTful API, or whatever) can use Doorman to voluntarily limit their use (usually in requests per second) of the resource. Doorman is written in Go and uses gRPC as its communication protocol. For some high-availability features it needs a distributed lock manager. We currently support etcd, but it should be relatively simple to make it use Zookeeper instead.

Getting started

The purpose of Doorman is to apportion and distribute capacity to clients based on some definition of fairness. The capacity a client gets for a resource depends on four things:

The Doorman master server remembers all clients that currently have capacity and whenever a client asks for capacity it inserts the clients request into its memory and runs the algorithm to figure out what this client should get.

Lease length and refresh interval

Doorman only gives out capacity for a limited amount of time, in the form of leases. Each capacity grant comes with a lease length: The client is guaranteed that amount of capacity for the duration of the lease. A typical lease length is five minutes. On top of the lease length the Doorman server also returns a refresh interval. This is the interval after which the client is expected to check back in to get a new lease. A typical refresh interval is five seconds.

Note: The Doorman system is cooperative. The clients are expected to honour the capacity grant, the lease length, and the refresh interval. The system provides no protection against misbehaving clients.

In the normal operation of the system the clients all check in regularly with the server to refresh their capacity. The server knows of all clients and their resource needs, and on every request makes the best possible apportionment of the capacity. For optimization purposes (reduce qps on the Doorman server) the client code does bulk refreshes for all resources whenever it sends out a request to the Doorman server. This means that under specific circumstances (for instance when registering a new resource) a resource might get its capacity refreshed a bit sooner than expected.

The Doorman configuration specifies which algorithm should be used to distribute capacity among all clients. The page on algorithms explains which algorithms currently are available and how they apportion capacity to each client.

The two parameters lease_length and refresh_interval optimize a number of different behaviors of the system:

When the Doorman server goes unreachable and comes back

When a client cannot reach the Doorman server the following happens:

Note: Doorman uses multiple servers in different clusters and a master election procedure to determine the current master.

Doorman does not share or store its internal database. That means that when a Doorman server becomes the master it starts with an empty repository of clients and outstanding leases. However this is not as problematic as it seems, because once the server is available all clients will start calling it to refresh their leases. Since the Doorman server knows that it does not have enough information to run its algorithms it will simply return the currently assigned capacity, or zero if it is a request from a client which currently does not have capacity for the resource. The server knows the currently assigned capacity because clients helpfully include it in every GetCapacity RPC. This phase of the server is known as learning mode.

During the learning mode of a resource every request will be answered with a new lease for the same capacity the client currently has. Practically speaking after a couple of refresh intervals the server can be reasonably sure that it has been contacted by every existing client out there. However for reasons of safety the default learning mode duration is the same as the lease length. This decision ensures that when the learning mode duration expires we can be sure that there are no leases out there that we don't know about (because these would have expired by then). If you want the system to converge faster after a Doorman master election you can explicitly configure a learning_mode_duration in the resource template (see the page on the Configuration of the system for more information).

Who wants what?

Doorman requires the clients to inform it of the desired capacity (the so-called wants). If you are using the low-level Doorman client you need to figure out your capacity need and call the appropriate methods to make sure that the client library requests that amount of capacity during its refresh cycle. However if you use the rate limiter objects provided by the Doorman clients the desired capacity is determined automatically by observing the behavior of the threads that want to access the resource. This automatic wants determination uses a moving average to smoothen out any spikes.

Next steps

Status and Plans

Doorman should be currently considered Alpha quality software. The server and Go client received a decent amount of testing at Google (both functional and load testing), so we are pretty confident they do what they are supposed to do. However, in the process of open-sourcing the code we switched from internal Google technologies to their Open Source equivalents – and this needs more testing. Finally, there's no proper versioning scheme at the moment.

Short term plans:

Longer term plans:

Installation

First, you need to have Go installed. You can either follow the official installation instructions or, on OS X, just do

brew install go

As part of the initial setup, you have to set GOPATH, wihch is the location where Go keeps all its sources and binary artifacts.

export GOPATH=...

With this out of the way, Doorman is just one go get away:

go get github.com/youtube/doorman/go/cmd/doorman

If you are interested in a checkout of Doorman that you can modify, you can do:

mkdir -p $GOPATH/src/github.com/youtube
git clone git@github.com:youtube/doorman.git

Go version <= 1.5

If you are using a version of Go earlier than 1.6, you will need to set an environment variable to enable vendoring (see https://golang.org/s/go15vendor):

export GO15VENDOREXPERIMENT=1

Contributing

See CONTRIBUTING for details on submitting patches and the contribution workflow.

License

Doorman is under the Apache 2.0 license. See the LICENSE file for details.

Note

This is not an official Google product.