softwaremill / sttp-model

Simple Scala HTTP model
https://softwaremill.com/open-source
Apache License 2.0
43 stars, 28 forks

Browser compatibility rendering support for Content-Disposition header #317

Closed: sanathpagero closed this issue 7 months ago

sanathpagero commented 11 months ago

Hi,

I have noticed that the Part model currently follows RFC 2183 strictly for the Content-Disposition header. In our case, we have a filename with non-US-ASCII (UTF-8) characters, and browsers render the filename incorrectly because of the pre-encoding done in the sttp Part model.

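  def contentDispositionHeaderValue: String = {
    // Takes the UTF-8 bytes of `s` and reinterprets each byte as one ISO-8859-1 character
    def encode(s: String): String = new String(s.getBytes("utf-8"), "iso-8859-1")
    // Renders the part's disposition parameters as: form-data; k1="v1"; k2="v2"
    "form-data; " + dispositionParamsSeq.map { case (k, v) => s"""$k="${encode(v)}"""" }.mkString("; ")
  }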
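To see what `encode` does in isolation, here is a standalone sketch (the object name and `roundTripViaLatin1` are illustrative, not part of sttp-model). The trick only gives back the original name if whatever writes the header emits exactly one byte per character (ISO-8859-1) and the receiver then decodes those bytes as UTF-8:

```scala
import java.nio.charset.StandardCharsets

object PartEncodeSketch {
  // Mirrors the `encode` helper quoted above: take the UTF-8 bytes of `s`
  // and reinterpret each byte as a single ISO-8859-1 character.
  def encode(s: String): String =
    new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1)

  // Hypothetical receiver path: the header is written as ISO-8859-1 bytes
  // (one byte per char) and read back as UTF-8, recovering the original text.
  def roundTripViaLatin1(s: String): String =
    new String(encode(s).getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8)
}
```

For example, `encode("é")` yields the two characters `Ã©`, which only turn back into `é` if the transport really is byte-for-byte ISO-8859-1.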

Is there a workaround for this issue, or is there a plan to support switching encoding modes when sending multipart requests?
Apache HttpClient provides a similar feature when building multipart bodies, with the modes STRICT, BROWSER_COMPATIBLE and RFC6532.

adamw commented 10 months ago

It seems there's really no standard way to do this.

I suspect you found this method via sttp? In that case there's an additional question: what does the server you send the multipart body to support?

My first attempt at a fix would be to at least use percent-encoding. That should be better than nothing... Do you know how these three modes you mentioned (STRICT, BROWSER_COMPATIBLE, RFC6532) work, that is, what kind of encoding they apply? It might be hard for us to pass the encoding mode, unless we extend the global backend configuration.
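A minimal sketch of the percent-encoding idea, assuming the filename's UTF-8 bytes are escaped as `%XX` except for the RFC 3986 unreserved characters. `percentEncode` is a hypothetical helper, not an sttp API, and whether a given server decodes such names is exactly the open question above:

```scala
import java.nio.charset.StandardCharsets

object PercentEncodeSketch {
  // RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~".
  private def isUnreserved(b: Byte): Boolean = {
    val c = (b & 0xff).toChar
    (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') ||
      c == '-' || c == '.' || c == '_' || c == '~'
  }

  // Hypothetical helper: every other UTF-8 byte becomes %XX (uppercase hex).
  def percentEncode(s: String): String =
    s.getBytes(StandardCharsets.UTF_8).map { b =>
      if (isUnreserved(b)) (b & 0xff).toChar.toString
      else "%%%02X".format(b & 0xff)
    }.mkString
}
```

With this, `percentEncode("report é.pdf")` gives `report%20%C3%A9.pdf`, which is pure ASCII and survives any header charset unchanged.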

sanathpagero commented 10 months ago

Hi,

Yes, I found this when using sttp. I'm sending multipart MIME messages to a third-party email service provider, which then delivers our messages to email servers, so the recipient can be on any email server. People who read the emails in their email clients (e.g. Google, Yahoo, ...) complain that they cannot see the attachment names in UTF-8 (in this case, Arabic).

I didn't dig into how Apache does this in depth, but this is what I could find; I hope it helps you differentiate the cases :)

In summary, when building the MIME message and writing the MIME headers to the output stream, Apache behaves differently depending on the mode:

  1. STRICT: represents a collection of MIME multipart encoded content bodies, implementing the strict (RFC 822, RFC 2045, RFC 2046 compliant) interpretation of the spec. Headers are encoded with the default charset, US-ASCII.

  2. RFC6532: represents a collection of MIME multipart encoded content bodies, implementing the strict (RFC 822, RFC 2045, RFC 2046 compliant) interpretation of the spec, except that UTF-8 headers are allowed, as per RFC 6532. Headers are encoded as UTF-8.

  3. BROWSER_COMPATIBLE: represents a collection of MIME multipart encoded content bodies that emulates browser behaviour. Only the Content-Disposition header is written, and headers are encoded with the charset given by the user.
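A rough model of the charset choice each mode implies, as summarised above. `HeaderMode` and `headerBytes` are hypothetical names; this is a sketch of the described behaviour, not Apache's implementation:

```scala
import java.nio.charset.{Charset, StandardCharsets}

// Hypothetical model of the three modes listed above.
sealed trait HeaderMode
case object Strict extends HeaderMode
case object Rfc6532 extends HeaderMode
final case class BrowserCompatible(charset: Charset) extends HeaderMode

object HeaderModes {
  // Bytes that would be written to the stream for a header value in each mode.
  def headerBytes(value: String, mode: HeaderMode): Array[Byte] = mode match {
    // US-ASCII: String.getBytes replaces unmappable characters with '?'.
    case Strict                     => value.getBytes(StandardCharsets.US_ASCII)
    // Raw UTF-8 in the header, as RFC 6532 permits.
    case Rfc6532                    => value.getBytes(StandardCharsets.UTF_8)
    // Whatever charset the caller supplied.
    case BrowserCompatible(charset) => value.getBytes(charset)
  }
}
```

Under this model an Arabic filename degrades to `???` in STRICT mode, travels intact in RFC6532 mode, and in BROWSER_COMPATIBLE mode depends entirely on the charset the caller picks.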

sanathpagero commented 10 months ago

@adamw Do you see any possibility of supporting this?

adamw commented 10 months ago

Before we jump into implementing this in sttp itself, maybe you can try the following work-around (I don't really have a good way of testing, unless you could suggest a scenario where I can replicate this).

The work-around would be to use an encoded value as the file name (i.e. doing the encoding outside of sttp): when you call part.fileName(...), you would pass the already-encoded value.

From the information you have provided, it seems that RFC6532 encoding should work, that is, quoted-printable. There is no built-in codec for this in the JDK, but this seems promising: https://commons.apache.org/proper/commons-codec/apidocs/org/apache/commons/codec/net/QuotedPrintableCodec.html
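A sketch of what "encoding outside of sttp" could look like, using the RFC 2047 encoded-word syntax (`=?charset?encoding?text?=`) used for non-ASCII text in email headers. It swaps in the Base64 ("B") form for the quoted-printable ("Q") form only because the JDK ships `java.util.Base64`; `encodedWord` is a hypothetical helper whose result would be passed to `part.fileName(...)`:

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

object EncodedWordSketch {
  // Builds a single RFC 2047 encoded-word: =?UTF-8?B?<base64 of UTF-8 bytes>?=
  // Limitation: long values are not split into several encoded-words,
  // although RFC 2047 caps each encoded-word at 75 characters.
  def encodedWord(s: String): String =
    "=?UTF-8?B?" + Base64.getEncoder.encodeToString(s.getBytes(StandardCharsets.UTF_8)) + "?="
}
```

For instance, `encodedWord("é.pdf")` produces `=?UTF-8?B?w6kucGRm?=`; whether the receiving client decodes it is up to that client.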

Let me know if this works :)

sanathpagero commented 10 months ago

Hi Adam,

Thanks for the prompt response :) But I'm not sure your workaround will work: even if we pre-encode the filename as quoted-printable, the consumers of these filenames are regular web browsers. Won't the browsers just show the quoted-printable text as-is?

Regarding testing: we use regular HTTP request inspection sites such as RequestBin to inspect the body payload, where we can see the filenames in the parts.

adamw commented 9 months ago

@sanathpagero I think the encoding is only visible in the value of the header, so if you pass an already encoded value, chances are it will work (since the encoding only uses ASCII characters). There's an escape sequence at the beginning, so browsers should understand it. Can you try it and let us know if this works?

sanathpagero commented 9 months ago

@adamw I tested this with a quoted-printable-encoded file name, but the browser shows the encoded name as-is (see screenshot). All I did was pass the pre-encoded filename when creating the Part.

[Screenshot from 2024-01-22 15-06-17]

sanathpagero commented 8 months ago

@adamw any thoughts?

ghik commented 8 months ago

@sanathpagero Which sttp backend and which version are you using to send your requests?

sanathpagero commented 8 months ago

I'm using HttpClientFutureBackend from httpclient-backend, version 3.3.18.

ghik commented 8 months ago

OK, I have confirmed exactly what's going on in this configuration: sttp applies the encoding mentioned above (reinterpreting UTF-8 bytes as ISO-8859-1), but the backend then encodes the result again as UTF-8, which produces gibberish on the wire.
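The double encoding described above can be reproduced with the JDK alone. This sketch assumes, as stated in the comment, that the backend writes the header string as UTF-8; the object and method names are illustrative:

```scala
import java.nio.charset.StandardCharsets

object DoubleEncodingSketch {
  // Step 1 (sttp-model): UTF-8 bytes reinterpreted as ISO-8859-1 characters.
  def sttpEncode(s: String): String =
    new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1)

  // Step 2 (backend): the header string is written as UTF-8, so every
  // character above 0x7F produced in step 1 becomes two bytes.
  def onTheWire(s: String): Array[Byte] =
    sttpEncode(s).getBytes(StandardCharsets.UTF_8)

  // What a receiver decoding the header as UTF-8 ends up displaying.
  def receiverSees(s: String): String =
    new String(onTheWire(s), StandardCharsets.UTF_8)
}
```

A single `é` (bytes `C3 A9`) goes out as the four bytes `C3 83 C2 A9`, and the receiver shows `Ã©`.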

We can fix this by removing this encoding and making sure that all backends simply emit headers using UTF-8 (i.e. the equivalent of RFC6532 mode). Would that work for you, @sanathpagero?
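A sketch of the proposed fix, assuming the header value is built without the ISO-8859-1 reinterpretation and the backend then writes it as UTF-8. The names are hypothetical and this is not the code that landed in sttp; escaping of `\` and `"` is an extra assumption of the sketch so that each value remains a valid quoted-string:

```scala
import java.nio.charset.StandardCharsets

object Utf8HeaderSketch {
  // Hypothetical header builder with the `encode` step removed:
  // parameter values go into the header as plain Unicode text.
  def contentDispositionHeaderValue(params: Seq[(String, String)]): String = {
    // Escape backslashes and double quotes inside the quoted-string.
    def quote(v: String): String = v.replace("\\", "\\\\").replace("\"", "\\\"")
    "form-data; " + params.map { case (k, v) => s"""$k="${quote(v)}"""" }.mkString("; ")
  }

  // The backend side: emit the header as UTF-8 (the RFC 6532 equivalent).
  def headerBytes(value: String): Array[Byte] = value.getBytes(StandardCharsets.UTF_8)
}
```

Because the string is encoded exactly once, a receiver that decodes the header as UTF-8 sees the original Arabic filename.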

sanathpagero commented 8 months ago

@ghik Thanks for looking into this. Yes, if you do both (remove the encoding of the Content-Disposition header and make the backends emit headers as UTF-8), that will work for us :)

ghik commented 7 months ago

Reopening this issue, as #336 on its own is not enough to resolve it (my bad: I told GitHub that it did, and GitHub closed the issue automatically).