ietf-wg-httpapi / ratelimit-headers

Repository for IETF WG draft ratelimit-headers
Other
45 stars 5 forks source link

Field names and combination #65

Closed mnot closed 1 year ago

mnot commented 3 years ago

RateLimit-Limit just seems... wrong. Have you considered RateLimit?

Otherwise, if you stick with a prefix, perhaps it should be shorter -- e.g., RL-*?

ioggstream commented 3 years ago

Me and @unleashed had this discussion and considered that it was important for adoption to use familiar names, so we just stripped out the X- from the most widely used ratelimit field names.

In time, these names have been implemented by Kong, 3scale and more recently by envoy-proxy too...

unleashed commented 3 years ago

IIRC the important factor on the naming was popular existing usages of this prefixed (and redundant) naming scheme - we wanted to make migrating to these fields as direct and straightforward as possible for most simple cases. OTOH, I also agree that the additional cognitive load for alternatives is probably not that much.

ioggstream commented 2 years ago

During the latest IETF-113 there was a discussion not only on field names, but even on field syntax, and an alternative proposal was made, e.g.

RateLimit: limit=10, remaining=5, reset=40  # base
RateLimit: limit=5, 
                 remaining=5, 
                 reset=5, 
                 policy=(10;w=1;burst=1000 50;w=60;comment="ciao" 1000;w=3600 5000;w=86400)

mapping to this structure

limit: 10
remaining: 5
reset: 5
policy:
- [ 10, { w: 1, burst: 1000 } ]
- [ 50, { w: 60, comment: ciao} ]
- [ 1000, {w: 3600 }]
- [ 5000, {w: 86400}]
mnot commented 2 years ago

There are other ways to spell this. Considering the impact of header compression, it may be better to do something like:


RateLimit: limit=10, remaining=5, reset=40
RateLimit-Policy: 10; w=1; burst=1000,
     50; w=60; comment="ciao"
     1000; w=3600
     5000; w=86400
ioggstream commented 2 years ago

Thanks Mark. I think we can for now merge Policy as a separate field. I will continue the experiments.

unleashed commented 2 years ago

Having a separate policy header looks good to me.

While we are at it, @mnot's blog post mentions trying to avoid varying contents in a header to benefit header compression, and in this case the limit is very likely to stay the same for long compared to remaining and reset, so it seems to me the case could be made that we might want to have the limit in a separate header (maybe the policy header itself?) and the two continuously varying pieces of information in a different one.

Since we will be changing quite a bit here, we could also review which fields would be mandatory if we reorganized them this way.

ioggstream commented 2 years ago

@unleashed conflating Limit and Policy has the following drawbacks:

  1. policy is informative, limit is actionable
  2. policy is optional, limit is required: in #25 the majority of providers only uses limit, required, reset
  3. there's no guarantee that limit will match one of the policies (e.g. when infrastructures implement dynamic limits)
unleashed commented 2 years ago

@ioggstream right, I opened up this possibility for consideration in light of becoming friendlier to header compression, which would likely change which fields are required. Indeed having limit required but policy optional would not work, so the question would be whether we'd consider either making the former optional or the latter required. If that's out of the table, then I support going forward with #81 as is.

ioggstream commented 2 years ago

@unleashed Just to stress further the topic before merging #81 we could replace RateLimit-Policy with RateLimit-Info capable of supporting further dictionary member-keys parameters together with quota-policy parameters, but I don't know if there's some value in that.

RateLimit-Info: \
  policy=(10;w=1;burst=1000
               50;w=60;comment="ciao" 
           1000;w=3600 
          5000;w=86400),
  amz-scope-url=/my/path,
  amz-user=8b5a7e6d
unleashed commented 2 years ago

To me this option looks slightly more complex due to the dictionary but the benefit would be that this information is likely going to stay identical for some time, with less variance than the other pieces of information, as presumably, given similar requests, changing policy parameters dynamically would be done only under specific, hopefully not too frequent conditions.

This however is based on the assumption that we want to try and benefit header compression in a scenario in which we could potentially end up with lots of policies, especially considering intermediaries and HTTP/2 and QUIC.

@mnot I think we would benefit from your experience in this space wrt this discussion.

ioggstream commented 2 years ago

changing policy parameters dynamically would be done only under specific, hopefully not too frequent conditions

yes

based on the assumption that we want to try and benefit header compression

The greatest benefit will derive from avoiding returning X-RateLimit-Minute, X-RateLimit-Hour, X-RateLimit-Day and instead just returning the lowest expiring limit :)

dret commented 2 years ago

so it seems this issue now proposed RateLimit and RateLimit-Policy? that sounds like a good resolution to me, it keeps the "branding" of what's out there already.

darrelmiller commented 2 years ago

@ioggstream What is your current thinking around merging the headers as @mnot suggested? e.g.

RateLimit: limit=10, remaining=5, reset=40

This seems like something that should be addressed before WGLC.

ioggstream commented 2 years ago

Hi @darrelmiller

Me and @unleashed discussed the topic and we tried: we spent time on this trying to accommodate this request, and re-checked the landscape.

Our findings are that merging the information in a single field has clear drawbacks in terms of adoption and interoperability, and weak or unclear operational benefits.

As explained in the email:

mnot commented 2 years ago

It's good that you've surveyed existing folks, but what would be very helpful to know is why they decided to use different fields -- is there a significant reason, or did they just do it that way arbitrarily?

I'm not surprised that you don't find any software supporting a particular pattern that they don't need to yet. What we should look at is whether there's a significant barrier to them doing so in the future (i.e., adoption).

Given that they'll have to write parsers for this no matter how it's done, and that they'll have to creates tests, etc., I don't see how there's a material difference between using two header fields and one (especially since there are off-the-shelf libraries for SF in many languages now).

Again, I'm not stuck on spelling this out in a particular way, but I do want to see the reasoning for why we choose it. Following what others do without understanding why it was done is not good practice.

dret commented 2 years ago

On 2022-07-18 03:53, Mark Nottingham wrote:

Given that they'll have to write parsers for this no matter how it's done, and that they'll have to creates tests, etc., I don't see how there's a material difference between using two header fields and one (especially since there are off-the-shelf libraries for SF in many languages now).

the point is that many applications in the API space are not writing code from the ground up where they can freely use libraries. they are writing code (or simply configure things) in the context of their API management tooling. in these environments, writing a parser is impractical or impossible. processing fields that can be interpreted with regexes on the other hand works well in most of these environments.

Again, I'm not stuck on spelling this out in a particular way, but I do want to see the reasoning for why we choose it. Following what others do without understanding why it was done is not good practice.

we've had this exact discussion for various header fields now. what would be the most effective way to capture the arguments so that we don't have to repeat it in every case?

mnot commented 2 years ago

Erik, it'd be very helpful if you explain why it's the case that they can't use off-the-shelf libraries, but can write regex.

If we want to avoid going in circles, actual reasons rather than just "we see it being done this way" would be immensely helpful. The Web is a big place, we can't observe the whole thing. Also, it changes -- often, in response to things being standardised. Both are really important things to internalise when writing specifications.

dret commented 2 years ago

On 2022-07-18 09:48, Mark Nottingham wrote:

Erik, it'd be very helpful if you explain why it's the case that they can't use off-the-shelf libraries, but can write regex.

because regex processing is part of the platform that they use. think of API management today as a lot of COTS tooling that has certain capabilities built into it (regex is something you'll frequently find) but isn't programmatically extensible, at least not easily and for most users.

i am not quite sure what else to say here. it certainly looks different for parties building their tooling from the ground up (like browsers, CMS servers, or CDN intermediaries), but that's not how the API space works in many cases.

If we want to avoid going in circles, actual reasons rather than just "we see it being done this way" would be immensely helpful. The Web is a big place, we can't observe the whole thing. Also, it changes -- often, in response to things being standardised. Both are really important things to internalise when writing specifications.

it just seems that the API web is different from the browser web in some interesting and relevant ways. of course API tooling will evolve but as we all know, enterprise tooling tends to move at a different pace than some other parts of IT.

darrelmiller commented 2 years ago

@dret I think it is worth separating the conversation about one vs multiple headers and the SF headers conversation. Obviously, support for SF in existing infrastructure is going to take time and the reason why SF headers were not used before is clear.

However, the question remains, why was a decision made to have separate headers for x-ratelimit-remaining, x-ratelimit-reset and x-ratelimit-limit. My guess is that separate headers was chosen because it was the easiest solution. It would avoid having to define ABNF syntax for the parameters and consumers would not have to have to write any special parsing code.

The landscape is different today. SF headers does provide a standard approach to dealing with parameter values. It is true that it will take time for infrastructure to catch up, but in the meanwhile, is there a reason why someone couldn't use Regex to parse RateLimit parameter values?

@ioggstream My apologies, I must have missed that email.

Speaking as representative of a company that build a lot of APIs with "significant operational knowledge", I still have to fight to get us not to use things like "x-". My experience is that large organizations are focused on delivery of features, rather than the long term health of the industry. That is why efforts like IETF HTTPAPI are so important to drive the right behavior rather than be influenced by "that's what everyone else does".

It seems to me that this is a reaction the challenges of serializing dates over the wire rather than an explicit choice to have separate headers.

Historically, HTTP consumers have been provided with little assistance in libraries to parse values out of header values. This influences people's perspectives. SF is an opportunity to resolve that, however, if we continue to standardize headers that don't leverage the value of SF we are going to lose the momentum to address the issue.

Why would an API gateway convert headers into a new format that isn't currently standardized?

And we will continue to see a lack of adoption while we standardize poor alternatives because that's how it was done in the past.

Are there technical reasons why separate headers are better than a single header with parameters?

ioggstream commented 2 years ago

My experience is that large organizations are focused on delivery of features, rather than the long term health of the industry

Still, different organizations in this case aligned to similar behaviors.

efforts like IETF HTTPAPI are so important to drive the right behavior rather than be influenced by "that's what everyone else does"

If I had data in favor of swimming against the tide, I'd do it, but I haven't. While at the beginning of this conversation I favored the "3 fields" position mainly because of the adoption / lack of COTS support, I am now not even convinced that a combined field is the best design choice since I have no real world data to say that folks doing that since 10 years are "doing it wrong".

if we continue to standardize headers that don't leverage the value of SF we are going to lose the momentum to address the issue

IMHO this is true for completely new fields, or for mandatory ones. Optional fields have really zero "commercial power" to impose the use of a library to implementers implementing a consolidated practice for more than 10 years.

we will continue to see a lack of adoption while we standardize poor alternatives because that's how it was done in the past

I disagree: for example last year in Italy we recommend to agencies using SF for custom/internal fields (e.g. as a more secure alternative to encoding stuff in JSON) yet there are only a few of libraries https://github.com/search?q=rfc+8941 This means that even if you push for SF - and our is a regulatory push - the lack of COTS / FLOSS components hinders adoption.

Are there technical reasons why separate headers are better than a single header with parameter? is there a reason why someone couldn't use Regex to parse RateLimit parameter values?

but please see https://mailarchive.ietf.org/arch/msg/httpapi/si7ug4KiZ0NVA5DXxEsVxC9_1NM/

Note: do you all prefer to discuss the topic here or in ML? ;)

robertmclaws commented 2 years ago

Hey folks! While not a developer at Microsoft/Google/Adobe, etc, I have shipped dozens of APIs for governments and the private sector that power award-winning applications. Here's my $0.02:

Complex server header fields are like complex CSS classes. Every single time I want to mess with CSS, I use the individual properties to do the work, unless I absolutely have to. Why? Using background-position is easier than having to spend 10 minutes searching the web for the exact magical string syntax to get the result I'm looking for.

The same goes for rate-limiting. I know that from a specification perspective, it may seem awesome and optimal to combine all the data into a single field. But:

You folks have an opportunity to KISS here, and you should take it.

RateLimit-Interval: (seconds/minutes/hours/days) RateLimit-Max: integer RateLimit-Remaining: integer RateLimit-ResetsIn: integer, should be in the same interval increment as RateLimit-Interval

These 4 headers tell me:

It's simple, straightforward, and easy to understand. The other proposals seem to forget that servers and API clients are not the only ones consuming this data, that humans actually have to read it and understand what is going on without wasting their time every time they use it.

davidhopebc commented 2 years ago

I like RateLimit as a prefix and then the object being represented as described by @ioggstream in one of the more recent comments. Not always a fan of making things too verbose, but for this set of related properties it seems to work.

I'm on the fence about the RateLimit.remaining though just because it could either tie a service provider to a specific gateway vendor or having to manage that data for themselves.

Kong seems to be providing exactly this kind of data and as was previously mentioned, so are several other providers.

peppelinux commented 2 years ago

Hey folks! While not a developer at ...

Yes! Three specialized headers are a simple and clean solution for both humans and robots, these latter don't have to split, strip and evaluate the map but Simply look for an arbitrary http header with its well defined typed value.

darrelmiller commented 2 years ago

I understand the desire for simplicity, but there is a lot of value in moving to a more standardized way of expressing headers in an efficient way. There is a lot of talk about the pain of parsing multi-value headers and this is exactly what Structured Field Values are designed to solve. Creating a new HTTP header for every scalar value is not a long term scalable solution.

Allow me to present something concrete to evaluate how much complexity we are really talking about.

Taking the specification with its current constraints, and data elements as they exist today, but refactoring to use structured headers could result in a few different structures:

Using a sfDictionary it would look like:

RateLimit: limit=10, remaining=5, reset=40

In structured fields, a dictionary is a comma delimited list of sfItem. Each sfItem can have parameters, but for this header additional parameters are not needed. A simpler solution might be just to use a single integer item with some parameters.

Using a sfInteger it could look like:

RateLimit: 10; remaining=5; reset=40

This would make the RateLimit header similar to the RateLimit-Limit header but with some additional parameters.

Parsing the sfDictionary would look something like:

#r "nuget: StructuredFieldValues, 0.5.2"
using StructuredFieldValues;
using System.Collections.Generic;

SfvParser.ParseDictionary("limit=10,remaining=5,reset=40",out IReadOnlyDictionary<string,ParsedItem> rateLimits);

Console.WriteLine(rateLimits["limit"].Value);
Console.WriteLine(rateLimits["remaining"].Value);
Console.WriteLine(rateLimits["reset"].Value);

Parsing the sfInteger would look something like:

#r "nuget: StructuredFieldValues, 0.5.2"
using StructuredFieldValues;

SfvParser.ParseItem("10;remaining=5;reset=40",out ParsedItem rateLimits);

Console.WriteLine(rateLimits.Value);
Console.WriteLine(rateLimits.Parameters["remaining"]);
Console.WriteLine(rateLimits.Parameters["reset"]);

For scenarios where it is not possible to take a dependency on structured field library implementations, it is possible to do an interim solution using Regex. Here is an example,

using System.Text.RegularExpressions;

var regex = new Regex(@"(\d+)(;(\w+)=(\d+))*");
var matches = regex.Match("10;remaining=40;reset=5");

Console.WriteLine(matches.Groups[1].ToString());
Console.WriteLine(matches.Groups[4].Captures[0].ToString());
Console.WriteLine(matches.Groups[4].Captures[1].ToString());

For gateway providers to translate the old header into new headers using the sfInteger format it would require code that looks something like:

var headerMap = new Dictionary<string,string>() {
    {"x-ratelimit-limit", "10"},
    {"x-ratelimit-remaining", "5"},
    {"x-ratelimit-reset", "40"}
};

if (headerMap.TryGetValue("x-ratelimit-limit", out var limit))
{
    var RateLimit = new StringBuilder();
    RateLimit.Append("rate-limit: ");
    RateLimit.Append(limit);
    if (headerMap.TryGetValue("x-ratelimit-remaining",out var remaining)) RateLimit.Append(";remaining=" + remaining);
    if (headerMap.TryGetValue("x-ratelimit-reset",out var reset)) RateLimit.Append(";reset=" + reset);

    Console.WriteLine(RateLimit.ToString());
}

It is important to remember that acting on these header values to enable a client to back off requests before hitting rate limits is not trivial client code. Parsing a multi-value header should not be a significant roadblock. Gateways that want to translate these header to RFC compatible header do not need to take a dependency on a Structured Field parsing library because they are creating the headers themselves. They just need to do some simple string manipulation.

ioggstream commented 2 years ago

For gateway providers to translate the old header into new headers

The issue for gateway users is that to modify a single field they need at least to do both:

  1. parse a SF using regexp (with all the regexp issues)
  2. iterate all keys, whether you need to modify them or not
using System.Text.RegularExpressions;

var field_name = "RateLimit";
var field_value = "limit=10;remaining=40;reset=5";
var regex = new Regex(@"(\d+)(;(\w+)=(\d+))*");
var matches = regex.Match(field_value);
var i = 0;

var RateLimit = new StringBuilder();
RateLimit.Append("rate-limit: ");

// regexp are not a free ride imho
for (group in matches.Groups) {
  var k,v = group.captures[0], group.captures[1];
  if (k == "limit") {
    // check v is an integer
    v = int(v) * 0.75;
  }
  if (i > 0) {
    RateLimit.Append(";");
    i += 1;
  }
  RateLimit.Append( k + "=" + v);
}
Console.WriteLine(RateLimit.ToString());

while now we just do

var field_name = "RateLimit-Limit";
var field_value = "10";

// check field_value is an integer
field_value = 0.75 * int(field_value)
Console.WriteLine(field_name + ": " + field_value);
darrelmiller commented 2 years ago

@ioggstream Thanks for that scenario. I had not considered a case where an origin server returns a rate limit header and an intermediary changes that rate limit header. Under what circumstances would an intermediary want to do that?

darrelmiller commented 1 year ago

Closing this issue as I believe the outcome of topics addressed here included in PR #130