satmihir / fair

A Go library for serving resources fairly

Regarding the usage for distributed API / job system #2

Closed · wheatdog closed 2 months ago

wheatdog commented 2 months ago

Thanks for providing this great library!

I have a question regarding the usage of this library. I have a set of API servers that receive job requests and send them to a queue. A few workers then pick up the jobs from the queue and process them asynchronously.

Should I apply fair to the API servers or the workers? I believe it should be applied to the API servers.

Do we need to share fair's internal state between the API servers? Also, is there any concern if we apply auto-scaling to the API servers as well?

satmihir commented 2 months ago

Thanks for reaching out and for the great question. It touches on the core of the fairness problem itself, not just the FAIR library.

This is how I'd think about the problem based on the given details:

  1. In most systems, it's a good idea to reject work as early as possible. Keeping requests in a queue for a while and then rejecting them asynchronously is a bad experience, so I agree that the API server is probably the way to go.
  2. However, this assumes you have a good signal about resource exhaustion at the API server. To get one, figure out which resource you're actually trying to protect. For example, if queue length or overflow is the signal, that's easy to capture: when an enqueue is rejected or the queue is too long, report that as a failure to the library. If it's a deeper metric in your stack, you'll have to figure out how to expose it reliably to the API server.
  3. Finally, you'd register every incoming request to check whether it should be throttled, keyed by whatever "client ID" you want to establish fairness across (see the sketch after this list).
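
Concretely, points 2 and 3 could look roughly like this on the API server. This is an untested sketch: the tracker calls follow the README-level API (`NewFairnessTrackerBuilder`, `RegisterRequest`/`ShouldThrottle`, `ReportOutcome` with the outcome constants under `pkg/request`), so check the current docs for exact names, and `enqueue` is a placeholder for your queue client.

```go
package main

import (
	"context"
	"log"

	"github.com/satmihir/fair"
	"github.com/satmihir/fair/pkg/request"
)

// enqueue stands in for your queue client; have it fail fast when the
// queue rejects the job or is past your length threshold.
func enqueue(ctx context.Context, job []byte) error {
	// ... talk to your queue here ...
	return nil
}

func main() {
	// Build a tracker with the default config (names per the README).
	trk, err := fair.NewFairnessTrackerBuilder().BuildWithDefaultConfig()
	if err != nil {
		log.Fatal(err)
	}
	defer trk.Close()

	ctx := context.Background()
	clientID := []byte("tenant-42") // the identity you want fairness across

	// Check whether this client should be throttled before doing any work.
	res, err := trk.RegisterRequest(ctx, clientID)
	if err != nil {
		log.Fatal(err)
	}
	if res.ShouldThrottle {
		log.Println("reject early, e.g. with HTTP 429")
		return
	}

	// Use queue rejection/overflow as the resource-exhaustion signal and
	// feed it back to the tracker.
	if err := enqueue(ctx, []byte("job payload")); err != nil {
		trk.ReportOutcome(ctx, clientID, request.OutcomeFailure)
		return
	}
	trk.ReportOutcome(ctx, clientID, request.OutcomeSuccess)
}
```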

Your questions about distributed state management and auto-scaling are very interesting. Note that the library as it exists today does not support importing or exporting its state. That said, this is what I'd say about your question:

  1. If your API servers sit behind a load balancer that distributes incoming requests randomly, you are most likely fine running this library locally on every server. It's always good to start simple and see how far you get. In most cases you'll see some under-throttling rather than over-throttling, which is usually the better failure mode.
  2. If you really want state sharing, one way to do it is to run a throttling microservice built on this library that your API servers talk to (roughly sketched after this list). Since running a Bloom filter is quite cheap, this service needs a much smaller fleet than your API fleet, which gives you better-fidelity metrics. It does add a bit of overhead, but if you can stand a per-AZ instance running a lightweight gRPC service, it should be manageable. I am actually considering adding that capability to this project out of the box at some point.
  3. The impact of auto-scaling depends on how aggressively you scale. If it's regular resource-based auto-scaling, I'd guess (again, without knowing much about your traffic patterns) that you'll probably be fine with local throttling. You'll see a short period of under-throttling before things settle down as your new instances accumulate data.
  4. The separate microservice approach really shines under auto-scaling, since you won't have to worry about that under-throttling period at all.
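
To make the microservice idea in point 2 concrete, here's a rough, untested sketch of its shape. I'm using plain HTTP+JSON rather than gRPC just to keep the example self-contained, and the tracker calls are the same README-level API as above, so treat the exact identifiers as approximate.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"

	"github.com/satmihir/fair"
	"github.com/satmihir/fair/pkg/request"
)

type checkReq struct {
	ClientID string `json:"client_id"`
}

type checkResp struct {
	Throttle bool `json:"throttle"`
}

type reportReq struct {
	ClientID string `json:"client_id"`
	Failure  bool   `json:"failure"`
}

func main() {
	trk, err := fair.NewFairnessTrackerBuilder().BuildWithDefaultConfig()
	if err != nil {
		log.Fatal(err)
	}
	defer trk.Close()

	// API servers call /check before accepting a request...
	http.HandleFunc("/check", func(w http.ResponseWriter, r *http.Request) {
		var in checkReq
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		res, err := trk.RegisterRequest(r.Context(), []byte(in.ClientID))
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		json.NewEncoder(w).Encode(checkResp{Throttle: res.ShouldThrottle})
	})

	// ...and /report after observing the outcome (e.g. a queue rejection).
	http.HandleFunc("/report", func(w http.ResponseWriter, r *http.Request) {
		var in reportReq
		if err := json.NewDecoder(r.Body).Decode(&in); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		outcome := request.OutcomeSuccess
		if in.Failure {
			outcome = request.OutcomeFailure
		}
		trk.ReportOutcome(r.Context(), []byte(in.ClientID), outcome)
		w.WriteHeader(http.StatusNoContent)
	})

	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Since a single instance (or a small per-AZ set) of this service sees requests from the whole API fleet, its state reflects aggregate per-client behavior, which is exactly what gets diluted when every API server tracks locally.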

Let me know if you have additional questions. Thanks again!