ietf-wg-httpapi / ratelimit-headers

Repository for IETF WG draft ratelimit-headers

Correlating limit with corresponding policy #127

Closed by darrelmiller 10 months ago

darrelmiller commented 1 year ago

It is my understanding that a server can return multiple policies. Given this example, where a server has defined a new policy parameter that indicates the kind of throttling being enforced:

RateLimit-Policy: 100;w=10;kind=bandwidth-mb
RateLimit-Policy: 100;w=10;kind=requests
RateLimit: limit=100, remaining=2; reset=5

how does a client know if they have 2 more requests remaining or 2 more megabytes?
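
To make the ambiguity concrete, here is a minimal Python sketch that parses the example headers above. The helper names (`parse_policy`, `parse_ratelimit`) are made up for illustration, the parsing is deliberately naive rather than a full Structured Fields parser, and `kind` is the hypothetical parameter from the example, not something the draft defines.

```python
def parse_policy(value):
    """Parse '100;w=10;kind=requests' into a dict (naive, illustration only)."""
    quota, *params = [part.strip() for part in value.split(";")]
    parsed = dict(p.split("=", 1) for p in params)
    return {"limit": int(quota), **parsed}


def parse_ratelimit(value):
    """Parse 'limit=100, remaining=2; reset=5' into a dict of ints.

    Both ',' and ';' are accepted as separators because the example mixes them.
    """
    fields = value.replace(";", ",").split(",")
    return {k.strip(): int(v) for k, v in (f.split("=", 1) for f in fields)}


policies = [
    parse_policy("100;w=10;kind=bandwidth-mb"),
    parse_policy("100;w=10;kind=requests"),
]
current = parse_ratelimit("limit=100, remaining=2; reset=5")

# Try to work out which policy the RateLimit header refers to.
# The limit is the only value the two headers share.
candidates = [p["kind"] for p in policies if p["limit"] == current["limit"]]
print(candidates)
```

Both policies match on `limit`, so the client is left with two candidate units for `remaining=2` and nothing in the response to pick between them.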

guzi99 commented 1 year ago

The client won't be able to tell. It is probably too complicated for clients to reason over multiple policies. In my experience, just getting one right is already hard. In my opinion, the RateLimit header is the only actionable information per request, while policies are not necessary for every request and are more informative than actionable for clients.

ioggstream commented 1 year ago

how does a client know if they have 2 more requests remaining or 2 more megabytes? The client won't be able to tell.

Exactly. The way the server exposes limits should be coherent, since according to the specs, the only information that is actionable in an interoperable way is limit/remaining/reset.

while policies are not necessary for every request and are more informative than actionable for clients

Exactly. Policies were introduced to allow passing domain-specific or custom information on processing policies (e.g. product specific, ...). RateLimit, by contrast, has the goal of providing interoperable and easily consumable information.

darrelmiller commented 1 year ago

But regardless of whether the response contains the RateLimit-Policy or not, if a server has two policies with different quota units then the RateLimit response header is not actionable without knowing which policy is being applied.

How can a client know what the quota unit of "remaining" is? Either the spec needs to say that quota unit is always requests or the quota unit needs to be communicated in some way with the RateLimit header. Otherwise the RateLimit header is useless.

ioggstream commented 1 year ago

if a server has two policies with different quota units

Do you mean that the server uses different units of measure for different policies (e.g. megabytes and requests)?

garethj-msft commented 1 year ago

This really seems key to solve. Limiting on various metrics (e.g. Mb or requests) seems like a base requirement.

So having the supposedly interoperable, actionable header omit that key information does appear to render it effectively non-actionable.

ioggstream commented 1 year ago

The reason we left this point out is that there is no interoperable taxonomy of measurement units for APIs, whereas CSPs usually use (weighted) requests.

Should we define an interoperable taxonomy here (e.g. Mb? MB? MiB? Uncompressed MiB? Requests? Connection count? ...)?

garethj-msft commented 1 year ago

I think as Darrel said, it's necessary to go one way or the other. Either define the header in terms of a single unit or define the taxonomy.

ioggstream commented 1 year ago

The measurement unit probably deserves a separate topic, and the WG could help address it.

bgervin commented 1 year ago

"kind" is vague. As a UoM other than requests, how actionable is it for a client to know when to retry if they don't know how much their next request is going to be?

ioggstream commented 1 year ago

I have split that discussion out here: https://github.com/ietf-wg-httpapi/ratelimit-headers/issues/128 @bgervin @garethj-msft @guzi99

In general, I think that different measurement units in the same batch of quota policies cannot be conveyed in an interoperable way, and this should probably not be addressed now.

Once we resolve the issue for a single measurement unit, we could try to extend it.

My 2¢, R.

darrelmiller commented 1 year ago

@ioggstream The current draft says:

Two quota policies MUST NOT be associated with the same quota units value.

I see two potential paths forward:

I don't think it is necessary to standardize a taxonomy of units. As long as a consumer knows that it can consume 100 gummies and it has 5 gummies left, or it can consume 1000 shizzles and it has 500 shizzles left, the actual meaning of gummy and shizzle can be communicated out of band.
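
As a rough sketch of what "communicated out of band" could look like on the client side, the Python below treats the unit as an opaque label and looks up its cost in a table the client got from documentation. The `Quota` type, the `can_send` helper and the cost table are all hypothetical; neither the draft nor this thread defines a `unit` field.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    unit: str        # opaque label reported by the server, e.g. "gummies"
    limit: int
    remaining: int


def can_send(quota, cost_by_unit):
    """Return True/False if the next request fits, or None if the unit is unknown.

    cost_by_unit is the client's out-of-band knowledge: how many units it
    expects its next request to consume.
    """
    cost = cost_by_unit.get(quota.unit)
    if cost is None:
        return None  # the client has no idea what this unit means
    return cost <= quota.remaining


# Assumed out-of-band knowledge: the next request costs about 3 gummies.
known_costs = {"gummies": 3}

print(can_send(Quota("gummies", limit=100, remaining=5), known_costs))
print(can_send(Quota("shizzles", limit=1000, remaining=500), known_costs))
```

The second call returns `None` because the client has no out-of-band definition of a shizzle, which is the case where the header stops being actionable.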

Having the ability to define multiple units does make the following constraint challenging:

The expiring-limit MUST be set to the service limit that is closest to reaching its limit

It may be difficult for a server to guess whether a client is going to run out of shizzles or gummies first.
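
One naive way a server might pick the "closest to its limit" quota across different units is to compare the fraction remaining. This is only an illustrative heuristic, not something the draft prescribes, and the `closest_to_limit` helper and the data layout are made up for the example.

```python
def closest_to_limit(quotas):
    """Return the unit with the smallest remaining/limit fraction."""
    return min(
        quotas,
        key=lambda unit: quotas[unit]["remaining"] / quotas[unit]["limit"],
    )


# Numbers taken from the gummies/shizzles example above.
quotas = {
    "gummies": {"limit": 100, "remaining": 5},      # 5% left
    "shizzles": {"limit": 1000, "remaining": 500},  # 50% left
}

print(closest_to_limit(quotas))
```

This ignores how fast each unit is being consumed, so a client making shizzle-heavy requests could still run out of shizzles first. That is the guess the server cannot reliably make.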

darrelmiller commented 1 year ago

"kind" is vague. As a UoM other than requests, how actionable is it for a client to know when to retry if they don't know how much their next request is going to be?

@bgervin A client would need to have an out of band understanding of what the different quota units mean. If it is bytes uploaded it will be easy, if it is "server CPU cycles" then it's going to be tricky for a client to guess.

bgervin commented 1 year ago

"kind" is vague. As a UoM other than requests, how actionable is it for a client to know when to retry if they don't know how much their next request is going to be?

@bgervin A client would need to have an out of band understanding of what the different quota units mean. If it is bytes uploaded it will be easy, if it is "server CPU cycles" then it's going to be tricky for a client to guess.

@darrelmiller Are we inviting the 429 response code to be used for things other than API rate limiting ("you sent too many requests"), extending it to "you sent too many widgets"? Instead of "kind", maybe there could just be a well-known set of possibilities: requests, ingress bytes, egress bytes, etc.

darrelmiller commented 1 year ago

@bgervin I've always interpreted the 429 status code fairly liberally. The specification that defines 429 says:

Note that this specification does not define how the origin server identifies the user, nor how it counts requests.

I feel that if we constrain all requests for a particular resource to be equal from the perspective of rate limiting, then 429 becomes fairly low value.

We absolutely could define a "well known" set of kinds, but I'm not sure it is necessary at this point. The key thing is making sure the server can tell the developer what kind of thing they are running out of because they issued too many requests that consume too many of those things.

darrelmiller commented 10 months ago

Addressed by PR #130