Closed Benjamin-L closed 2 months ago
Seems like this is actually a bug in Go's standard library. It actually used to include a leading CRLF but was changed ~13 years ago as a workaround for a bug in a different program: https://codereview.appspot.com/4635063#msg4.
This might be a problem in Ruma, the Spec wants RFC 2046 for the boundary. (Yea, the MSC linked to RFC 1341, which may had different "rules" for boundaries)
The multipart package, which Dendrite uses, implements RFC 2046.
@S7evinK Good catch! RFC 2046 seems to have similar text stating that boundaries must be preceded by a CRLF though. On page 19:
The boundary delimiter MUST occur at the beginning of a line, i.e., following a CRLF, and the initial CRLF is considered to be attached to the boundary delimiter line rather than part of the preceding part. The boundary may be followed by zero or more characters of linear whitespace. It is then terminated by either another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part. If no Content-Type field is present it is assumed to be "message/rfc822" in a "multipart/digest" and "text/plain" otherwise.
NOTE: The CRLF preceding the boundary delimiter line is conceptually attached to the boundary so that it is possible to have a part that does not end with a CRLF (line break). Body parts that must be considered to end with line breaks, therefore, must have two CRLFs preceding the boundary delimiter line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary.
I haven't read the whole thing in detail, it's definitely possible that I'm missing something. It would be at least somewhat surprising this was a bug sitting in the go standard library for 13 years without anybody else noticing it.
Coincidentally I read Go's mime/multipart
source recently, and noticed that their parser specifically allows the CRLF to be missing at the very start of the response source. It's unclear to me though if this is leniency similar to allowing LF without CR as a terminator, or something that implementations MUST accept
on page 22 of RFC2046 the multipart body is specified as:
dash-boundary := "--" boundary
; boundary taken from the value of
; boundary parameter of the
; Content-Type field.
multipart-body := [preamble CRLF]
dash-boundary transport-padding CRLF
body-part *encapsulation
close-delimiter transport-padding
[CRLF epilogue]
Which seems to me like the leading CRLF is only required when a preamble is present, which is optional
Oh fun. I agree with your reading that the BNF clearly states the first CRLF is not required unless there is a preamble. That seems to conflict with the earlier text in the RFC to me, but either way the next step is to change ruma to be more permissive.
Background information
0.13.7+7a4ef24
go version
: unknownb4fecbc51719a33d09be1e76d55ae0eec11fb71a
(over federation API)Description
The multipart spec states that every encapsulation boundary in the body should be preceded by a CRLF. Instead, dendrite's response to the federation auth media download endpoint has the
--
of the first boundary starting directly at the beginning of the body, with no CRLF.Ruma expects a preceding CRLF here when parsing the response. As a result, homeserver implementations using ruma are unable to fetch media from dendrite servers over the authed media endpoints. This affects grapevine, and likely also affects conduwuit although I have not tested it.
relevant text from RFC 1341
> Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that that initial CRLF is considered to be part of the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain). > NOTE: The CRLF preceding the encapsulation line is considered part of the boundary so that it is possible to have a part that does not end with a CRLF (line break). Body parts that must be considered to end with line breaks, therefore, should have two CRLFs preceding the encapsulation line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary. > The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart entity must itself begin with a CRLF before the first encapsulation line -- that is, if the "preamble" area is not used, the entity headers must be followed by TWO CRLFs. This is indeed how such entities should be composed. A tolerant mail reading program, however, may interpret a body of type multipart that begins with an encapsulation line NOT initiated by a CRLF as also being an encapsulation boundary, but a compliant mail sending program must not generate such entities.Steps to reproduce
/_matrix/federation/v1/media/download/:mediaid
request to a dendrite server