Closed roblperry closed 1 year ago
If your boundary contains whitespace, it needs to be properly quoted (like any other parameter), otherwise the parser has no idea whether the next token is part of the previous parameter or a new parameter.
https://github.com/roblperry/busboy/commit/956eb8dd7e005994c56e2b741253ea4d4063a195 seems to work for me, but I am not that terribly familiar with the standard. So, I recognize that it would be easy for me to be missing important side effects of my changes.
For that matter, I now see that I missed this part: "If a boundary appears to end with white space, the white space must be presumed to have been added by a gateway, and should be deleted." But, I should be able to solve that if this general direction is agreeable.
Whitespace after an unquoted parameter value should be discarded. Whitespace at the beginning or in the middle of an unquoted parameter value would be a syntax error.
The easiest solution to avoid having to worry about any of these problems is to just always use quoted values.
I agree that the client library in question would be improved by modifying it to avoid placing a space in the boundary. I can't imagine the benefit of having said space. I also don't want to take up too much of your time. However, I would very much like to point the maintainers of the client library at a standard backing my case, otherwise I am just going to be stuck stating that not all server side implementations support spaces inside the boundary, which isn't a terrible argument. I'd just rather have something stronger.
Clearly, I've looked. Am I looking at the wrong spec? Do you have anything for me on that front?
Either way, thank you for busboy. I rather enjoyed crawling around in the performance optimized parser. I don't think I've played with that style of code since the 90s.
It's an HTTP header, so you would be referencing the ABNF from RFC 7231 (and RFC 7230 for some reused, general types), starting with the Content-Type
token.
Excellent. Thank you, sir.
For anyone else who happens across this RFC 7231 lead me to:
parameter = token "=" ( token / quoted-string )
where
token = <token, see [RFC7230], Section 3.2.6>
which aligns with what mscdex has been trying to tell me.
token = 1*tchar
tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*"
/ "+" / "-" / "." / "^" / "_" / "`" / "|" / "~"
/ DIGIT / ALPHA
Sending a boundary containing whitespace (e.g.
Content-Type,multipart/form-data;boundary=HyperLog -1663179571660
) will cause Busboy to error.I manually confirmed this by calling parseContentType directly.
resulting in
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Type says:
https://www.w3.org/Protocols/rfc1341/7_2_Multipart.html says:
Thus I feel that supporting spaces in boundaries would be an improvement.