jakartaee / rest

Jakarta RESTful Web Services
Other
354 stars 114 forks source link

Convenience method for checking header value lists #1066

Closed mkarg closed 1 year ago

mkarg commented 2 years ago

For release 4.0 I'd like to propose a convencience method for response/request contexts which I missed a lot in the past years and which I implemented again and again in multiple ways -- most of them lacking either performance or perfection.

The problem it solves is that it is rather hard to get it perfectly correct checking for items in comma-separated headers - without losing valuable performance.

Example: How to check if there is a Cache-Control: no-store value set in case the actual sent headers contains Cache-Control multiple times and each of the findings is possibly a comma-separated list? RegEx over getHeaderString() comes into mind, but this is rather slow, as it first concatenates everything just to split it up afterwards. Iterating all findings comes into mind next, which is everything but convenient nor particularly readable.

Solution: An implementation of this new method would be provided by the runtime vendor, hence could provide not only most convenient usage to the caller, but also perform best possible as it is optimized according to the actual implementation's internal map design

WDYT?

jansupol commented 2 years ago

Error: Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.3.0:jar (attach-javadocs) on project jakarta.ws.rs-api: MavenReportException: Error while generating Javadoc: Error: Exit code: 1 - /home/runner/work/jaxrs-api/jaxrs-api/jaxrs-api/src/main/java/jakarta/ws/rs/container/ContainerResponseContext.java:129: error: reference not found Error: * @see #getHeaderString()

mkarg commented 2 years ago

Error: Failed to execute goal org.apache.maven.plugins:maven-javadoc-plugin:3.3.0:jar (attach-javadocs) on project jakarta.ws.rs-api: MavenReportException: Error while generating Javadoc: Error: Exit code: 1 - /home/runner/work/jaxrs-api/jaxrs-api/jaxrs-api/src/main/java/jakarta/ws/rs/container/ContainerResponseContext.java:129: error: reference not found Error: * @see #getHeaderString()

Thanks. Fixed.

mkarg commented 2 years ago

If nobody objects to adopting this small feature to JAX-RS 4.0, then I would invest some more time and add a TCK test in this PR and remove the draft state flag. Any objects? Please veto NOW so I spare the time for writing the tests. Thanks.

chkal commented 2 years ago

A few thoughts about this proposal.

First of all: Personally, I never had a situation where this method would have been helpful. However, I see use cases for it, so I'm not against adding it to the API.

One thing I'm wondering about is how to deal with case-insensitivity. For the header name, the situation is clear, as HTTP header names are case-insensitive, which should be handled correctly by the implementation. But what about the header value? My first feeling was that it should do a case-sensitive match. However, in case of Cache-Control mentioned above, directives are case-insensitive. So I'm wondering how we should deal with such cases.

arjantijms commented 2 years ago

Could Servlet not need the exact same utility?

mkarg commented 1 year ago

One thing I'm wondering about is how to deal with case-insensitivity. For the header name, the situation is clear, as HTTP header names are case-insensitive, which should be handled correctly by the implementation. But what about the header value? My first feeling was that it should do a case-sensitive match. However, in case of Cache-Control mentioned above, directives are case-insensitive. So I'm wondering how we should deal with such cases.

We could provide two methods, one case-sensitive and one case-insensitive, just like the JRE does it https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/lang/String.html#equalsIgnoreCase(java.lang.String).

mkarg commented 1 year ago

Could Servlet not need the exact same utility?

OT: Frankly spoken, I personally do not see any use of Servlets anymore at all these days. JAX-RS is superior from the user's view in most cases other than performance wrt to HTTP use. I have not used neither Servlets, nor Servlet-based JAX-RS since years (this is why I lead the Java SE Bootstrap project). If you ask me, the core idea of Servlet API is deprecated, as it was invented as a pure Java replacement for NSAPI / ISAPI, hence to enable Java code run on popular web servers like IIS or Apache (which simply tried to prevent starting native processes using (Fast) CGI). Those times are over since decades. Since most people meanwhile migrated to "real" Java based application servers (hence: products able to natively run JAX-RS), and since mainstream users are migrating to cloud / servlerss architectures, there is not much use left over for Servlets. If you ask me, Servlet API should be dropped in favor of a "Streamlet API" which just provides the interface between a JAX-RS implementation and an NIO engine like Netty or Grizzly. The HTTP part should be completely hidden in the JAX-RS engine; if Grizzly / Netty want to process HTTP, they could natively support JAX-Rs instead of Servlet API instead. Having said that: If somebody likes, he could copy my work provided here for JAX-RS, but I personally will not spend a minute into Servlet API anymore.

mkarg commented 1 year ago

One thing I'm wondering about is how to deal with case-insensitivity. For the header name, the situation is clear, as HTTP header names are case-insensitive, which should be handled correctly by the implementation. But what about the header value? My first feeling was that it should do a case-sensitive match. However, in case of Cache-Control mentioned above, directives are case-insensitive. So I'm wondering how we should deal with such cases.

We could provide two methods, one case-sensitive and one case-insensitive, just like the JRE does it https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/lang/String.html#equalsIgnoreCase(java.lang.String).

Thinking deeper about it, I do not see the need for different methods just to prevent a single "if". I modified the proposal accordingly. WDYT?

mkarg commented 1 year ago

One thing I'm wondering about is how to deal with case-insensitivity. For the header name, the situation is clear, as HTTP header names are case-insensitive, which should be handled correctly by the implementation. But what about the header value? My first feeling was that it should do a case-sensitive match. However, in case of Cache-Control mentioned above, directives are case-insensitive. So I'm wondering how we should deal with such cases.

We could provide two methods, one case-sensitive and one case-insensitive, just like the JRE does it https://docs.oracle.com/en/java/javase/18/docs/api/java.base/java/lang/String.html#equalsIgnoreCase(java.lang.String).

Thinking deeper about it, I do not see the need for different methods just to prevent a single "if". I modified the proposal accordingly. WDYT?

Or should we go with a super-flexible solution instead, like providing a predicate? One could then pass even precompiled regex, a lambda expression, or method handles like String::equalsIgnoreCase. 😃 To me, this sounds more powerful. @chkal WDYT?

boolean containsHeaderString(String name, Predicate<String> valuePredicate)
jansupol commented 1 year ago

Now, this can possibly have a default implementation, nicht wahr?

mkarg commented 1 year ago

Now, this can possibly have a default implementation, nicht wahr?

Well, yes and no. Technically yes, it certainly could, and I would be happy to sponsor one once a majority agreed upon one of the proposal made so far (case-insensitive always? customizable? separate methods? boolean flag? predicate?). But the original idea was that perfect performance can only be provided with the knowledge of the underlying map implementation: Is it a map? Is it a list of entries? Is it a string array? etc. So all vendors should implment it. Given the fact that I would be happy to provide a PR for Jersey, and that only few competitors do exist, I do not see the benefit of a default implementation, frankly spoken.

jamezp commented 1 year ago

I would think the default should be case-insensitive since in HTTP/1 header names should be case-insensitive. AIUI in HTTP/2 they must be lowercase.

mkarg commented 1 year ago

I would think the default should be case-insensitive since in HTTP/1 header names should be case-insensitive. AIUI in HTTP/2 they must be lowercase.

Please note that we are solely talking about header values here, not header names, as @chkal outlined in the original question, so whatever HTTP/1 or /2 tell us here is irrelevant.

jamezp commented 1 year ago

I would think the default should be case-insensitive since in HTTP/1 header names should be case-insensitive. AIUI in HTTP/2 they must be lowercase.

Please note that we are solely talking about header values here, not header names, as @chkal outlined in the original question, so whatever HTTP/1 or /2 tell us here is irrelevant.

My apologies. For some reason I thought we were talking about headers :)

mkarg commented 1 year ago

Please note that we are solely talking about header values here, not header names, as @chkal outlined in the original question, so whatever HTTP/1 or /2 tell us here is irrelevant.

My apologies. For some reason I thought we were talking about headers :)

Now that we clarified, what is you opinion: Shall we go with predicates, with booleans, or with separate methods?

jamezp commented 1 year ago

Let me make sure I understand this correct. The idea is that the predicate would be tested against the value(s) of a header. For example a header like:

Cache-Control: max-age=0, must-revalidate

The predicate would be tested against "max-age=0" and "must-revalidate"?

Sorry, If I'm not quite understanding.

jansupol commented 1 year ago

I meant a default implementation such as

    public boolean containsHeaderString(String name, Predicate<String> valuePredicate) {
        String property = getHeaderString(name);
        return property != null ? valuePredicate.test(property) : false;
    }

Should the Predicate handle the case sensitivity?

jamezp commented 1 year ago

I meant a default implementation such as

    public boolean containsHeaderString(String name, Predicate<String> valuePredicate) {
        String property = getHeaderString(name);
        return property != null ? valuePredicate.test(property) : false;
    }

Should the Predicate handle the case sensitivity?

+1 This is what I envisioned for the default method.

mkarg commented 1 year ago

Let me make sure I understand this correct. The idea is that the predicate would be tested against the value(s) of a header. For example a header like:

Cache-Control: max-age=0, must-revalidate

The predicate would be tested against "max-age=0" and "must-revalidate"?

Sorry, If I'm not quite understanding.

Correct. But as a header can exist multiple times, the actual algorithm would be more complex.

mkarg commented 1 year ago

I meant a default implementation such as

    public boolean containsHeaderString(String name, Predicate<String> valuePredicate) {
        String property = getHeaderString(name);
        return property != null ? valuePredicate.test(property) : false;
    }

That would be wrong. The predicate is to be tested per single value, not per comma-separated list.

Should the Predicate handle the case sensitivity?

Yes, this is the idea of the predicate. So the caller can provide an arbitrarily complex predicate, testing much more than that.

jansupol commented 1 year ago

That would be wrong. The predicate is to be tested per single value, not per comma-separated list.

Hm....then perhaps it would make sense to provide a method that actually returns a list of comma-separated values when these methods are using it?

jamezp commented 1 year ago

One option might be to use a BiPredicate,String, String> where the first argument is the key/name and the second argument is the optional value. For example in:

Cache-Control: max-age=0, must-revalidate

It would be something like predicate.test("max-age", "0") and predicate.test("mcust-revalidate" null).

jamezp commented 1 year ago

Thinking about my above suggestion though, that doesn't make the code real clean on the user side so maybe that's not a great idea.

mkarg commented 1 year ago

That would be wrong. The predicate is to be tested per single value, not per comma-separated list.

Hm....then perhaps it would make sense to provide a method that actually returns a list of comma-separated values when these methods are using it?

This is exactly what I do want to prevent. First, it implies that all implementations must concatenate the values first just to let the application tokenize it afterwards, which is squandering performance, particularly if multiple filters need to check header values, so it would run repeatedly. Second, it makes the application (or to be more precise: filter) code much more cluttered with string processing details while the filter author just wants to deal with his own business and offload most technological questions to JAX-RS.

The actual intention of this PR is that filters are totally clean to write, e. g. like this, while providing maximum performance by preventing string operations as much as possible:

if (context.containsHeaderValue("cache", "no-store"::equalsIgnoreCase))
    return;

In this example, no useless string concatenation occurs at all. The runtime can implement containsHeaderValue in the fastest possible way fitting its particular kind of collection. The filter code itself is clean to read: "If there is no-store found somewhere in some cache header, then skip this filter completely".

As I had to write lots of common-purpose filters in the past that have to run in "unkown" combinations (the filter author is not the application author / deployer), this is exactly what I needed in rather each of them: A clean solution that provides high performance and does neither repeat itself nor clutter my code.

mkarg commented 1 year ago

One option might be to use a BiPredicate,String, String> where the first argument is the key/name and the second argument is the optional value. For example in:

I do not see the benefit of pushing the name into the predicate, actually, and it makes writing typical requests more complex. For example, a mono-predicate method is perfectly simple, but try to write the same with your proposed bi-predicate method:

context.containsHeaderValue("cache", "no-store"::equalsIgnoreCase)

Also, performance of the mono-predicate method is way better as the runtime can use the name as the key in a map, while it must linearly iterate over all entries in the bi-predicate method.

Edit: @jamezp Just noticed that you do not want to push the name but the value's key. Anyways, I think we both agree that this is not the way we want to go.

mkarg commented 1 year ago

I meant a default implementation such as

    public boolean containsHeaderString(String name, Predicate<String> valuePredicate) {
        String property = getHeaderString(name);
        return property != null ? valuePredicate.test(property) : false;
    }

That would be wrong. The predicate is to be tested per single value, not per comma-separated list.

A correct implementation with some first performance optimization could be this, but again, there is much room left open to improve performance, as it still forces string concatenation just to tokenize it afterwards, and it still does not respect the actual kind of collection used by the runtime:

default public boolean containsHeaderValue(String name, Predicate<String> valuePredicate) {
    var values = getHeaderString(name);
    if (values == null)
        return false;
    var tokens = new StringTokenizer(values, ",");
    while (tokens.hasMoreTokens()) {
        var value = tokens.nextToken().strip();
        if (valuePredicate.test(value))
            return true;
    }
    return false;
}

Optimizing the code further implies prevention of the initial concatention, hence dealing with raw header objects, hence dealing with HeaderDelegate::toString -- which we simply cannot, because a default method cannot obtain a header delegate! This fact, and the length and complexity of the resulting algorithm are the drivers behind this PR!

jamezp commented 1 year ago

Given the above example I think it does make sense to keep the method abstract. Implementations are likely going to have to override this anyway to be efficient. It also allows the spec not to have to set a rule on the case of header values, which could widely differ.

jelemux commented 1 year ago
context.containsHeaderValue("cache", "no-store"::equalsIgnoreCase)

@mkarg I really like this solution and would like to express my support. It's simple, readable and it leaves the users the freedom to do what they want with it. Imho a lot more software should be written like that.

mkarg commented 1 year ago

Ok it seems we reached an agreement how the new method should look like, so I think it is time end the draft state now.

mkarg commented 1 year ago

Committers, please review and vote. Thanks. :-)

jansupol commented 1 year ago

I am not convinced about this method. It could be good for some headers, such as Cache-Control but not for others, that use the comma separator for not separating the header values. HTTP RFC, Section 4.2 says:

Multiple message-header fields with the same field-name MAY be present in a message if and only if the entire field-value for that header field is defined as a comma-separated list [i.e., #(values)].

So there is plenty of HTTP headers that use commas for other purposes than separator. All dates containing headers is such an example (with syntax Date: day-name, day month year hour:minute:second GMT). Set-Cookie, or Cookie are also the exmples. Cookie specifies value as

A list of name-value pairs in the form of \<cookie-name>=\<cookie-value>. Pairs in the list are separated by a semicolon and a space ('; ').

For the generic usage, I'd suggest a separator argument (nullable for no split) to separate the header values.

spericas commented 1 year ago

@jansupol Seems like a reasonable suggestion, and we can still use commas by default.

mkarg commented 1 year ago

For the generic usage, I'd suggest a separator argument (nullable for no split) to separate the header values.

Good idea, I will add this in the next iteration of this PR, stay tuned. :-)

mkarg commented 1 year ago

For the generic usage, I'd suggest a separator argument (nullable for no split) to separate the header values.

Good idea, I will add this in the next iteration of this PR, stay tuned. :-)

On the other hand, we are using a comma (and exactly a comma but nothing else) in #getHeaderString. Maybe we should extend that one, too with an optional concatenation character in analogy?

jansupol commented 1 year ago

Maybe we should extend that one, too with an optional concatenation character in analogy?

Sounds good.

mkarg commented 1 year ago

For the generic usage, I'd suggest a separator argument (nullable for no split) to separate the header values.

Good idea, I will add this in the next iteration of this PR, stay tuned. :-)

Implemented by https://github.com/jakartaee/rest/pull/1066/commits/b5f0a9ee051c7fd46cfdf03688095663cd6b324b.

mkarg commented 1 year ago

@jansupol Seems like a reasonable suggestion, and we can still use commas by default.

How do you envision such a default? By an additional variant of the same method having just two parameters?

mkarg commented 1 year ago

Maybe we should extend that one, too with an optional concatenation character in analogy?

Sounds good.

What would you like more, breaking backwards compatibility by adding a new argument to the existing method, or adding a second variant of the same method?

spericas commented 1 year ago

Maybe we should extend that one, too with an optional concatenation character in analogy?

Sounds good.

What would you like more, breaking backwards compatibility by adding a new argument to the existing method, or adding a second variant of the same method?

@jansupol Seems like a reasonable suggestion, and we can still use commas by default.

How do you envision such a default? By an additional variant of the same method having just two parameters?

Yes, if we think using commas is a sufficiently common case --as suggested by your original PR.

mkarg commented 1 year ago

How do you envision such a default? By an additional variant of the same method having just two parameters?

Yes, if we think using commas is a sufficiently common case --as suggested by your original PR.

Done. :-)

mkarg commented 1 year ago

@jamezp @jansupol @chkal Kindly requesting approval of this PR. :-)