matrix-org / dendrite

Dendrite is a second-generation Matrix homeserver written in Go!
https://matrix-org.github.io/dendrite/
Apache License 2.0

Federation authenticated media endpoints have invalid multipart response #3414

Closed Benjamin-L closed 2 months ago

Benjamin-L commented 2 months ago

Background information

Description

The multipart spec states that every encapsulation boundary in the body should be preceded by a CRLF. Instead, dendrite's response to the federation auth media download endpoint has the -- of the first boundary starting directly at the beginning of the body, with no CRLF.

Ruma expects a preceding CRLF here when parsing the response. As a result, homeserver implementations using ruma are unable to fetch media from dendrite servers over the authed media endpoints. This affects grapevine, and likely also affects conduwuit although I have not tested it.

Relevant text from RFC 1341:

> Note that the encapsulation boundary must occur at the beginning of a line, i.e., following a CRLF, and that that initial CRLF is considered to be part of the encapsulation boundary rather than part of the preceding part. The boundary must be followed immediately either by another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part (and it is therefore assumed to be of Content-Type text/plain).
>
> NOTE: The CRLF preceding the encapsulation line is considered part of the boundary so that it is possible to have a part that does not end with a CRLF (line break). Body parts that must be considered to end with line breaks, therefore, should have two CRLFs preceding the encapsulation line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary.
>
> The requirement that the encapsulation boundary begins with a CRLF implies that the body of a multipart entity must itself begin with a CRLF before the first encapsulation line -- that is, if the "preamble" area is not used, the entity headers must be followed by TWO CRLFs. This is indeed how such entities should be composed. A tolerant mail reading program, however, may interpret a body of type multipart that begins with an encapsulation line NOT initiated by a CRLF as also being an encapsulation boundary, but a compliant mail sending program must not generate such entities.

Steps to reproduce

CobaltCause commented 2 months ago

Seems like this is actually a bug in Go's standard library. It used to include a leading CRLF, but that was changed ~13 years ago as a workaround for a bug in a different program: https://codereview.appspot.com/4635063#msg4

S7evinK commented 2 months ago

This might be a problem in Ruma; the spec wants RFC 2046 for the boundary. (Yes, the MSC linked to RFC 1341, which may have had different "rules" for boundaries.)

The multipart package, which Dendrite uses, implements RFC 2046.

Benjamin-L commented 2 months ago

@S7evinK Good catch! RFC 2046 seems to have similar text stating that boundaries must be preceded by a CRLF though. On page 19:

> The boundary delimiter MUST occur at the beginning of a line, i.e., following a CRLF, and the initial CRLF is considered to be attached to the boundary delimiter line rather than part of the preceding part. The boundary may be followed by zero or more characters of linear whitespace. It is then terminated by either another CRLF and the header fields for the next part, or by two CRLFs, in which case there are no header fields for the next part. If no Content-Type field is present it is assumed to be "message/rfc822" in a "multipart/digest" and "text/plain" otherwise.
>
> NOTE: The CRLF preceding the boundary delimiter line is conceptually attached to the boundary so that it is possible to have a part that does not end with a CRLF (line break). Body parts that must be considered to end with line breaks, therefore, must have two CRLFs preceding the boundary delimiter line, the first of which is part of the preceding body part, and the second of which is part of the encapsulation boundary.

I haven't read the whole thing in detail, so it's definitely possible that I'm missing something. It would be at least somewhat surprising if this were a bug that sat in the Go standard library for 13 years without anybody else noticing it.

f0x52 commented 2 months ago

Coincidentally, I read Go's mime/multipart source recently and noticed that its parser specifically allows the CRLF to be missing at the very start of the body. It's unclear to me, though, whether this is leniency similar to allowing LF without CR as a terminator, or something that implementations MUST accept.
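A quick way to see that leniency from the outside is to feed the standard library reader the same body with and without a leading CRLF. This is only a sketch: `readParts`, the boundary `b`, and the sample body are invented for the test, and it says nothing about which behaviour the RFC requires.

```go
package main

import (
	"fmt"
	"io"
	"mime/multipart"
	"strings"
)

// readParts is a hypothetical helper: it parses body as a multipart
// message with the given boundary and returns the content of each part.
func readParts(body, boundary string) ([]string, error) {
	r := multipart.NewReader(strings.NewReader(body), boundary)
	var parts []string
	for {
		p, err := r.NextPart()
		if err == io.EOF {
			return parts, nil // no more parts
		}
		if err != nil {
			return nil, err
		}
		data, err := io.ReadAll(p)
		if err != nil {
			return nil, err
		}
		parts = append(parts, string(data))
	}
}

func main() {
	// Same message twice: once starting directly with the dash-boundary,
	// once with a CRLF in front of it.
	const msg = "--b\r\nContent-Type: text/plain\r\n\r\nhello\r\n--b--\r\n"
	for _, body := range []string{msg, "\r\n" + msg} {
		parts, err := readParts(body, "b")
		fmt.Printf("leading CRLF: %v, parts: %q, err: %v\n",
			strings.HasPrefix(body, "\r\n"), parts, err)
	}
}
```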

f0x52 commented 2 months ago

On page 22 of RFC 2046, the multipart body is specified as:

 dash-boundary := "--" boundary
                  ; boundary taken from the value of
                  ; boundary parameter of the
                  ; Content-Type field.

 multipart-body := [preamble CRLF]
                   dash-boundary transport-padding CRLF
                   body-part *encapsulation
                   close-delimiter transport-padding
                   [CRLF epilogue]

To me, this reads as though the leading CRLF is only required when a preamble is present, and the preamble is optional.
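As a rough illustration of that reading of the grammar, a permissive parser could locate the first delimiter like this. The function `firstDelimiter` is hypothetical and deliberately simplified (it does not validate the transport-padding or the CRLF after the boundary); it is not taken from Go's parser or from ruma.

```go
package main

import (
	"bytes"
	"fmt"
)

// firstDelimiter returns the byte offset of the first dash-boundary in
// body, or -1 if none is found. It follows the RFC 2046 production
//
//	multipart-body := [preamble CRLF] dash-boundary ...
//
// so the delimiter may sit at offset 0 (no preamble) or directly after
// a CRLF that ends the preamble. Simplification: whatever follows the
// boundary on that line is not checked.
func firstDelimiter(body []byte, boundary string) int {
	dash := []byte("--" + boundary)
	if bytes.HasPrefix(body, dash) {
		return 0 // no preamble and no leading CRLF
	}
	// Otherwise the delimiter must start a line after the preamble.
	if i := bytes.Index(body, append([]byte("\r\n"), dash...)); i >= 0 {
		return i + 2 // skip the CRLF itself
	}
	return -1
}

func main() {
	for _, body := range []string{
		"--b\r\npart",           // no preamble
		"\r\n--b\r\npart",       // empty preamble followed by CRLF
		"preamble\r\n--b\r\npart", // non-empty preamble
		"no delimiter here",
	} {
		fmt.Printf("%q -> %d\n", body, firstDelimiter([]byte(body), "b"))
	}
}
```

Treating offset 0 as a valid delimiter position is what makes this accept both the body Go's writer produces and a body that begins with a CRLF.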

Benjamin-L commented 2 months ago

Oh fun. I agree with your reading that the BNF clearly states the first CRLF is not required unless there is a preamble. That seems to conflict with the earlier text in the RFC to me, but either way the next step is to change ruma to be more permissive.