whatwg / html

HTML Standard
https://html.spec.whatwg.org/multipage/
Other
8.09k stars 2.66k forks source link

Escaping multipart/form-data #7575

Open michaelweiser opened 2 years ago

michaelweiser commented 2 years ago

https://html.spec.whatwg.org/multipage/form-control-infrastructure.html#multipart-form-data

For field names and filenames for file fields, the result of the encoding in the previous bullet point must be escaped by replacing any 0x0A (LF) bytes with the byte sequence %0A, 0x0D (CR) with %0D and 0x22 (") with %22. The user agent must not perform any other escapes.

Doesn't that mean that no file (or field) name can ever legitimately contain character sequences that encode to byte sequences %0A, %0D or %22? (For most if not all encodings it'll be these character sequences exactly.)

The sender will apply no escaping but the receiving end will unescape them into bytes 0x0A, 0x0D and 0x22 respectively. From there they'll not decode back into the original character sequences, i.e. the filename will end up corrupted. For example, 100%0Applejuice.pdf would become 100<linefeed>pplejuice.pdf.

Should escaping of the escape character (0x25, %) as %25 be added here?

annevk commented 2 years ago

cc @andreubotella

andreubotella commented 2 years ago

Chromium and WebKit have both had this behavior for many years, and they don't escape the percent sign. Given the relative rarity of such filenames, I suspect this hasn't been an issue in practice (although it seems like WebKit might consider this worth looking into).

@cdumez @mfreed7 @smaug----

smaug---- commented 2 years ago

Sounds like an edge case, but something which probably should and could be fixed.

mfreed7 commented 2 years ago

This was reported to Chromium a few days ago, based on a bug filed against PHP.

I do agree that the current spec (and behavior) seems a bit odd here. It feels like we should add just one more explicit escape for filenames, from % --> %25. That would allow folks to distinguish between ".txt and %22.txt.

I suspect the compat story wouldn't be terrible in making this change, but there are no guarantees that it won't break something. I suppose we could add a use counter for finding % in a filename.