Time-windows for Remaining?

ucarion commented 3 years ago

Some systems rely on rate-limit headers to estimate how long it will take for them to complete some unit of work, where that unit of work requires some amount of quota against an API. Knowing this time-to-completion is useful because:

It lets clients give the user an estimated time remaining, and
Clients can use time-to-completion to determine if a job can be accepted. A client can reject jobs in order to implement load-shedding or to implement rate-limiting against their own clients, for instance.

Ideally, rate-limit headers allow a stateless API client to get the answer to:

How long is it going to take for me to get N units of quota, assuming I'm using quota as quickly as possible?

In the simple world where there's only one time-window for rate-limits, that answer is pretty easy to calculate. It's basically something like this:

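from math import ceil

def wait_time(headers, n):
    # Enough quota is left in the current window: no need to wait.
    if n <= headers.remaining:
        return 0

    # Quota still needed once the current window and one fresh window are used up.
    n_after_reset = max(0, n - (headers.remaining + headers.limit))
    return headers.reset + headers.window * ceil(n_after_reset / headers.limit)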

(With this in hand, the client knows it can put off doing any work for a job for wait_time(headers, 1), and it can expect to complete the job within wait_time(headers, N), where N is the quota required to do the job.)
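
As a worked example of the single-window calculation, here is a minimal, self-contained sketch. The `Headers` container and its field names are assumptions made for illustration (in particular, `window` is assumed to be known to the client, for example from `w`), and the sketch treats a job that needs exactly `remaining` units as runnable immediately.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Headers:
    # Hypothetical container; field names are assumptions for illustration.
    limit: int      # quota units granted per window
    remaining: int  # quota units left in the current window
    reset: int      # seconds until the current window resets
    window: int     # window length in seconds


def wait_time(headers: Headers, n: int) -> int:
    """Seconds until n units of quota could have been consumed (single window)."""
    if n <= headers.remaining:
        return 0
    # Units still needed after the first reset hands out a fresh `limit`.
    n_after_reset = max(0, n - (headers.remaining + headers.limit))
    return headers.reset + headers.window * ceil(n_after_reset / headers.limit)


h = Headers(limit=100, remaining=20, reset=30, window=60)
print(wait_time(h, 10))   # 0: fits in the remaining quota
print(wait_time(h, 120))  # 30: needs the first reset only
print(wait_time(h, 250))  # 150: the first reset plus two more full windows
```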

But when there are multiple windows in play, I don't think the I-D's current definitions of the headers give the client enough information to know how long it needs to wait. For the very simplest case, where N is always 1, the answer can only ever be zero or Reset, but for all other cases the client needs to know not just its Remaining for the quota-policy with the least remaining quota, but also its Remaining for all other quota policies.
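
To illustrate the ambiguity, here is a rough sketch that simulates a greedy client under several policies at once. It assumes fixed (non-sliding) windows and full knowledge of every policy's state, which is exactly what the headers do not expose; `PolicyState` and `time_to_quota` are hypothetical names, not part of the I-D.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PolicyState:
    # Hypothetical per-policy state; the headers as drafted expose only one of these.
    limit: int      # units granted per window
    remaining: int  # units left in the current window
    reset: int      # seconds until the window resets
    window: int     # window length in seconds


def time_to_quota(policies: List[PolicyState], n: int) -> int:
    """Seconds until n units are consumed, spending greedily under every policy.

    Assumes fixed windows, positive limits and positive window lengths.
    """
    remaining = [p.remaining for p in policies]
    next_reset = [p.reset for p in policies]
    now = 0
    while True:
        used = min(min(remaining), n)  # every policy must have quota left
        n -= used
        if n == 0:
            return now
        remaining = [r - used for r in remaining]
        now = min(next_reset)  # jump to the earliest upcoming reset
        for i, p in enumerate(policies):
            if next_reset[i] == now:
                remaining[i] = p.limit
                next_reset[i] = now + p.window


short = PolicyState(limit=10, remaining=4, reset=5, window=10)
roomy_long = PolicyState(limit=100, remaining=90, reset=1800, window=3600)
tight_long = PolicyState(limit=100, remaining=6, reset=1800, window=3600)

# Both states would advertise Remaining: 4 and Reset: 5 (the short policy).
print(time_to_quota([short, roomy_long], 12))  # 5
print(time_to_quota([short, tight_long], 12))  # 1800
```

Both server states produce the same Remaining and Reset, yet the time to obtain 12 units differs by a factor of several hundred, while for N=1 both give zero.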

From this, I think it'd be good if the I-D explicitly took one of two stances:

1. Explicitly saying that the rate-limit headers don't give you enough information to calculate the "time to quota" when N > 1 and there is more than one time window. FWIW, I don't think it's unreasonable to have such a stance; I'm guessing most clients only care about the N=1 case. Calling out what we aren't doing can help make things clearer in readers' minds.
2. Letting servers provide Remaining with a syntax similar to that of Limit, where servers can optionally tell the client how much quota it has eaten out of each quota-policy. I know this would be a novelty, but as far as I can tell, so is the w= syntax in Limit.

Or maybe I'm missing something here? Is there already language that implies stance (1)? I may simply not be reading carefully enough.

ioggstream commented 3 years ago

@unleashed I think you're the expert here...

unleashed commented 3 years ago

for all other cases the client needs to know not just its Remaining for the quota-policy with the least remaining quota, but also its Remaining for all other quota policies.

This is something the service you consume should be able to convey via the quota policy comments. We decided that this is a complex scenario that is better left for services to specify however they see fit via the comments, and for clients to interpret with that service-specific knowledge. It does not even need to happen that way. For example, a service could compute what you need and apply a specific quota policy for the remaining field if you perform a particular request. There is no requirement that the headers respond with related contents (i.e. same policy, window, coherent remaining units) to two instances of even the same request.

ucarion commented 3 years ago

This is something the service you consume should be able to convey via the quota policy comments. We decided that this is a complex scenario that is better left for services to specify however they see fit via the comments, and for clients to interpret with that service-specific knowledge.

That's a fair stance. I suppose I have two questions, then:

Why is w privileged at all, then? What is a client supposed to do with the w parameter on its own?
Are you open to a PR that would make this explicit? Explicitly calling out that giving a per-policy "remaining" is something to be expressed via a comment? It's obvious that you could do that, but it's not obvious that that's the way you're "supposed" to do it.

unleashed commented 3 years ago

[edit: answered the second point]

Why is w privileged at all, then? What is a client supposed to do with the w parameter on its own?

w is informational and helps generic user agents and intermediaries who have no service specific knowledge cover the main use case that motivates these headers, which is simple, instantaneous rate limiting of service resources (in addition to standardising the headers around that use case, which have proliferated in the wild with different semantics).

This information is useful because a client can figure out what is the current expected rate of consumption regardless of the remaining time and remaining quota units in the window, which can help avoid spikes of requests even if those would still be served according to the headers.

For example, having 10000 remaining units with 10 seconds to go before a reset (i.e. think about absolute windows) would convey a very different average rate of consumption than the same information accompanied by a 10000;w=1000 policy, which is especially useful if your client does not really need to consume those 10000 units before the reset happens. In this case, if a client tries to consume a portion of the quota in the remaining time, say 1000 units, at a rate of 100 units per second that isn't sustainable for the service, the service might choose to return more restrictive headers in subsequent responses, even to the point where the apparent rate becomes lower than the extra informational data would suggest.

That is, after 100 requests during the first second the service might suddenly return a remaining quota of 0 for the remaining 9 seconds just to try to protect itself and recover from the spike in requests. A client with the extra knowledge of the time window can approximate a rate of 10 units/sec and decide whether to push for a higher rate if needed or just stick to that rate if it does not really need to consume the 1000 units before the reset.
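
The arithmetic behind this example can be sketched as follows. The function names are hypothetical, and the "sustained" rate is only the average implied by a quota policy, not a guarantee made by the headers.

```python
def burst_rate(units: int, reset: int) -> float:
    """Units per second needed to consume `units` before the reset."""
    return units / reset


def sustained_rate(limit: int, window: int) -> float:
    """Average units per second implied by a policy such as 10000;w=1000."""
    return limit / window


print(burst_rate(10000, 10))            # 1000.0: draining everything before the reset
print(burst_rate(1000, 10))             # 100.0: the 1000-unit push from the example
print(sustained_rate(10000, 1000))      # 10.0: the rate suggested by w
print(1 / sustained_rate(10000, 1000))  # 0.1: seconds between requests at that rate
```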

So the answer is that it provides useful information for efficiency in the main use case, and it is usually found in pre-existing implementations, so you might think it warrants this special consideration. Smarter clients with service specific knowledge that can work with more complex policies will always be more efficient, but having implementers of generic user agents and intermediaries know about this rate limiting concept can have benefits across the board.

Are you open to a PR that would make this explicit? Explicitly calling out that giving a per-policy "remaining" is something to be expressed via a comment? It's obvious that you could do that, but it's not obvious that that's the way you're "supposed" to do it.

I for one welcome all contributions. Please open PRs for anything you'd think useful and we will discuss there the specifics. Thanks!

ucarion commented 3 years ago

Let me try to rephrase your example, to make sure I understand you. I would have thought that w is "just" an enhancement to Reset, which tells you what Reset will be after it hits zero, so clients can know the long-term rate of traffic they're entitled to. It sounds like my understanding of what w is meant to convey is different from what you're saying.

If a client saw these headers:

RateLimit-Limit: 10000
RateLimit-Remaining: 10000
RateLimit-Reset: 10

Then the client could probably assume this means it can send 10k requests in the next 10 seconds, which works out to a rate of 1000 requests per second.

But if instead the document includes w and the server sent:

RateLimit-Limit: 10000, 10000;w=1000
RateLimit-Remaining: 10000
RateLimit-Reset: 10

Then the client could, as you say, "figure out what is the current expected rate of consumption"? By which you mean it could infer that it should instead be sending its traffic at a rate of 10 requests per second (10,000 requests divided by 1000 seconds)?
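
For concreteness, here is a rough sketch of how a generic client might read such a Limit value and derive the two rates. It assumes the first list item is the limit for the current window, that every following item carries a `w` parameter, and it uses naive splitting that would not cope with quoted comments containing commas or semicolons. `parse_limit` is a hypothetical helper, not an API defined by the I-D.

```python
from typing import Dict, List, Tuple


def parse_limit(value: str) -> Tuple[int, List[Dict[str, int]]]:
    """Split a RateLimit-Limit value into the current limit and its quota policies."""
    items = [item.strip() for item in value.split(",")]
    current_limit = int(items[0])
    policies = []
    for item in items[1:]:
        quota, *params = [part.strip() for part in item.split(";")]
        parsed = dict(p.split("=", 1) for p in params)
        # Assumption: each quota policy has a `w` (window, in seconds) parameter.
        policies.append({"limit": int(quota), "w": int(parsed["w"])})
    return current_limit, policies


limit, policies = parse_limit("10000, 10000;w=1000")
reset = 10  # from RateLimit-Reset in the example

print(limit / reset)                            # 1000.0 requests/s without w
print(policies[0]["limit"] / policies[0]["w"])  # 10.0 requests/s implied by w
```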

And if the client decides to go above this 10 requests per second rate, for instance sending requests at 100 requests per second, then the server may cut the client off early, for instance by prematurely setting Remaining to 0, so that, as you say, "the apparent rate becomes lower than the extra informational data would suggest"?

If I got all of that right, what I don't understand is why the server is reporting 10000;w=1000 when it could be reporting 100;w=10, or 10;w=1. Is the common case not that when a server says a client has N seconds to consume M quota, clients can assume the server really means it, because the server chooses N and M so that it can usually keep its promises? Most Limit/Remaining/Reset middlewares implement exactly what they advertise to the client.

ioggstream commented 3 years ago

@ucarion the RateLimit fields do not necessarily convey the allowed distribution function of the requests.

The following case seems to describe a client that has not consumed its quota, on a service that is capable of serving the given number of requests in the interval below.

RateLimit-Limit: 10000, 10000;w=1000
RateLimit-Remaining: 10000
RateLimit-Reset: 10

If the service cannot manage the given throughput, it is free to implement a mechanism where Remaining/Reset is capped. @unleashed, I think the problem posed by @ucarion must be mentioned in the implementation notes.
Created #23
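
One possible shape for such a cap, purely as an illustration: the server advertises no more quota than it could actually serve before the reset. The `max_rate` parameter and the `capped_remaining` helper are assumptions for this sketch, not a mechanism described in the I-D.

```python
def capped_remaining(remaining: int, reset: int, max_rate: float) -> int:
    """Advertise at most what the service can serve in the `reset` seconds left.

    `max_rate` is a hypothetical sustainable throughput in units per second.
    """
    return min(remaining, int(max_rate * reset))


# 10000 units left, 10 seconds to the reset, service sustains 100 units/s.
print(capped_remaining(10000, 10, 100))  # 1000
# When the true remaining quota is already below the cap, it is reported as-is.
print(capped_remaining(500, 10, 100))    # 500
```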

unleashed commented 3 years ago

If I got all of that right

Yes, you did.

@ucarion: what I don't understand is why the server is reporting 10000;w=1000 when it could be reporting 100;w=10, or 10;w=1.

The main reason is the point made by @ioggstream. Many factors could affect the effective distribution function, and the usual case (but we can come up with others, just think about scaling infrastructure up/down) is just what he described: someone shows up at the very late stage of a natural period and goes crazy with the inferred rate.

must be mentioned in the implementation notes

Let's keep this issue open as a reminder until we fix the text?

ioggstream commented 3 years ago

@unleashed and I had a recent mail exchange with @ucarion which is worth reporting on this thread. @ucarion, please correct me if I misinterpreted your words.

@ucarion > Q1: the intention of the quota policies in Limit is more or less limited to being documentative; are you open to changing your views on this?

A1: not for now:

the important thing is to keep adoption easy and simple, and to make migration from non-standard headers straightforward
standardizing a more complex structure would break compatibility with existing implementations, and that compatibility is a major goal for wide adoption; we expect Limit to be documentative;
this enables services to provide more details through comments for specific clients to interpret, while still providing the basic information about what to expect in a standardized way. This allows generic clients and intermediaries to make useful decisions based on it, and existing use cases can address their immediate needs

@ucarion > Q2: Limit conveys quota policies with structured information; why don't Remaining and Reset?

A2: There is much more information that could be supplied (e.g. the request scope, further criteria, ...): this path would quickly become very complex, and it's not obvious whether there is consensus on which specific information should be included and which bits should be left out of a standard.

@ucarion > Q3: Limit conveys multiple time windows; why don't Remaining and Reset?

A3: The current model allows the client to make short-term decisions by parsing just two integers. This model is backward-compatible with pre-existing headers. We added quota-policy to Limit because its value is not used to compute short-term decisions and can safely be ignored by pre-existing implementations ;)

While it's only documentative, returning Limit covers at least the use cases where servers are not capable of tracking Remaining. In these cases they only share fixed values of Limit and Reset to convey the average request rate they tolerate.

ioggstream commented 2 years ago

@unleashed my opinion is that we can close this issue, possibly adding an FAQ. WDYT?

unleashed commented 2 years ago

Agreed, but maybe @ucarion as the issue author might want to check?

ioggstream commented 2 years ago

@ucarion PTAL can we close? I'll close in 7 days in case of no-reply :P

darrelmiller commented 2 years ago

@ioggstream 7 days are up :-)

ietf-wg-httpapi / ratelimit-headers

Time-windows for Remaining? #19