jetty / jetty.project

Eclipse Jetty® - Web Container & Clients - supports HTTP/2, HTTP/1.1, HTTP/1.0, websocket, servlets, and more
https://eclipse.dev/jetty

org.eclipse.jetty.http converts incoming content type "application/json; charset=utf-8" to uppercase charset=UTF-8 #12267

Open gjoshi86 opened 1 month ago

gjoshi86 commented 1 month ago

Jetty 9.4.50.v20221201

OpenJDK 8u292 (1.8.0_292-b10)

When a client sends a POST request with Content-Type "application/json; charset=utf-8", the request reaches our application, which uses Jetty 9.4.50.v20221201, and the header value is converted to "application/json; charset=UTF-8" (uppercase charset).

I debugged the jetty-http project and found that the org.eclipse.jetty.http.HttpParser class has a CACHE field. While parsing Content-Type, it uses the getBest() method to find the best match, which returns charset=UTF-8 in uppercase.

I know I am using an older version of Jetty that is past end of support. I just need your input on the following questions.

  1. Why does it return uppercase UTF-8 even though the client sent lowercase utf-8?
  2. What are the implications of setting org.eclipse.jetty.http.HttpParser.STRICT to true, which corresponds to compliance mode = LEGACY?
  3. Is there any other way to get the charset in the same format in which the client sent it?
olamy commented 1 month ago

Hi, just to let you know, Jetty 9.x is EOL (see https://github.com/jetty/jetty.project/issues/7958). Jetty 10/11 is EOL as well (see https://github.com/jetty/jetty.project/issues/10485).

Can you try to reproduce your issue with Jetty 12?

For commercial support of Jetty, see the issues listed above.

gregw commented 1 month ago

I think you have answered your own question. It is a case-insensitive cache of common header values. There are compliance modes that you can use to bypass the cache and keep the case, but you should not need to, as charset names should be case insensitive.

Note there are fine-grained compliance mode controls, so you don't need to go all the way to full Legacy mode.

That's about all we can say for an end-of-life release.

joakime commented 1 month ago

Also note that the MIME type application/json defines no charset parameter, so using a charset on it has no meaning. It is always UTF-8, 100% of the time, in all cases.
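Since charset names are case insensitive, application code that inspects the header can compare charsets by identity instead of by spelling. The following is a minimal sketch using only the JDK (Java 8 compatible); the class and method names (CharsetCompare, charsetParam, isUtf8) are hypothetical and are not part of Jetty.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not a Jetty API: compares charsets without caring about case.
class CharsetCompare {

    // Returns the raw charset parameter of a Content-Type value, or null if there is none.
    static String charsetParam(String contentType) {
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value;
            }
        }
        return null;
    }

    // True if the Content-Type names UTF-8 in any letter case.
    // Charset.forName resolves names case-insensitively; it throws an unchecked
    // exception for illegal or unsupported names, which this sketch does not handle.
    static boolean isUtf8(String contentType) {
        String name = charsetParam(contentType);
        return name != null && Charset.forName(name).equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(isUtf8("application/json; charset=utf-8")); // true
        System.out.println(isUtf8("application/json; charset=UTF-8")); // true
    }
}
```

With a comparison like this, it no longer matters whether the container hands the application "utf-8" or "UTF-8".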

gjoshi86 commented 1 month ago

@gregw @joakime Thank you for your response! This is helpful.

I have a couple of questions before I close this ticket.

  1. I just need confirmation that the CACHE implementation in org.eclipse.jetty.http.HttpParser is a performance optimization. Is that right?
  2. Regarding "Note there are fine grained compliance mode controls, so you don't need to go all the way to full Legacy mode": I tried different compliance modes such as RFC7230 and RFC2616, but it works only with the LEGACY compliance mode. I think the property (org.eclipse.jetty.http.HttpParser.STRICT = true) kicks in only in LEGACY compliance mode. Also, is it possible to set LEGACY mode only for a specific header such as Content-Type?
gregw commented 1 month ago

@gjoshi86 The cache is indeed an optimization to avoid many copies of the same string being created and also to allow fast lookup of the actual semantics.
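To illustrate why a case-insensitive cache hit changes the letter case seen by the application, here is a minimal sketch. It does not reproduce Jetty's HttpParser.CACHE; the class HeaderValueCache and its resolve method are hypothetical, and a plain TreeMap stands in for whatever lookup structure the parser actually uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only, not Jetty code.
class HeaderValueCache {

    // Keys compare case-insensitively; each value is the canonical spelling.
    private final Map<String, String> cache = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    HeaderValueCache(String... canonicalValues) {
        for (String v : canonicalValues) {
            cache.put(v, v);
        }
    }

    // On a hit, returns the shared canonical String (no new allocation per request),
    // which may differ in case from what the client sent.
    // On a miss, returns the incoming value unchanged.
    String resolve(String incoming) {
        String hit = cache.get(incoming);
        return hit != null ? hit : incoming;
    }

    public static void main(String[] args) {
        HeaderValueCache c = new HeaderValueCache("application/json; charset=UTF-8");
        System.out.println(c.resolve("application/json; charset=utf-8")); // application/json; charset=UTF-8
        System.out.println(c.resolve("text/x-custom; charset=utf-8"));    // text/x-custom; charset=utf-8
    }
}
```

In this sketch the first lookup returns the pre-stored uppercase spelling, while the second value is not cached and keeps the client's original case.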

For fine-grained compliance in Jetty 9, you will need to use one of the CUSTOM modes configured with a system property. See the HttpCompliance class for more detail.

seemasjoshi commented 1 month ago

@gregw We have a similar situation where we need LEGACY behavior only for HttpComplianceSection.CASE_INSENSITIVE_FIELD_VALUE_CACHE. How can we use a CUSTOM mode to support this? If possible, please share an example.

Also, just for my understanding, could you please share the reason for choosing uppercase rather than lowercase when storing content types in the cache? I have observed that most older APIs use lowercase for content types, hence I am looking for the reason, if any.

joakime commented 4 weeks ago

@seemasjoshi See https://jetty.org/docs/jetty/12/programming-guide/server/compliance.html for an example.

See https://javadoc.jetty.org/jetty-12/org/eclipse/jetty/http/UriCompliance.html#from(java.lang.String) for details about the String syntax.

seemasjoshi commented 3 weeks ago

Thank you! I will try these examples.

It would be helpful if you could also share the reasoning behind the design choice of storing uppercase values in the cache instead of lowercase. This will help us better communicate the change to our customers and ensure we align with best practices.