matrix-org / matrix-spec

The Matrix protocol specification

Clarify exact rules for encoding federation URLs #849

Open neilalexander opened 3 years ago

neilalexander commented 3 years ago

The spec seemingly doesn't specify exactly what the rules should be for URL-encoding federation URLs.

Synapse seems to query-encode both path and query elements. Dendrite path-encodes path elements and query-encodes query elements.
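To make the difference concrete, here is a minimal Python sketch of the two styles. It is not taken from either codebase: `path_style` and `query_style` are hypothetical helpers, the event ID is made up, and the "path" safe set is an assumption based on the `pchar` rule in RFC 3986 (sub-delims plus `:` and `@`).

```python
from urllib.parse import quote, quote_plus

# Assumption: characters RFC 3986 allows unescaped inside a path segment
# (sub-delims plus ":" and "@"); "/" is deliberately excluded.
PCHAR_EXTRAS = "!$&'()*+,;=:@"

def path_style(segment: str) -> str:
    """Percent-encode a single path segment, leaving pchar characters alone."""
    return quote(segment, safe=PCHAR_EXTRAS)

def query_style(component: str) -> str:
    """Form-style encoding: escape every reserved character, space becomes '+'."""
    return quote_plus(component, safe="")

event_id = "$abc+def:example.org"  # hypothetical event ID
print(path_style(event_id))   # $abc+def:example.org
print(query_style(event_id))  # %24abc%2Bdef%3Aexample.org
print(path_style("a b/c"))    # a%20b%2Fc
print(query_style("a b/c"))   # a+b%2Fc
```

Both outputs decode to the same identifier, but they are different strings on the wire, which is what matters once the path is signed.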

There is conflicting guidance in the below RFCs:

- https://datatracker.ietf.org/doc/html/rfc3986 Uniform Resource Identifier (URI): Generic Syntax
- https://datatracker.ietf.org/doc/html/rfc1738 Uniform Resource Locators (URL)
- https://datatracker.ietf.org/doc/html/rfc1630 Universal Resource Identifiers in WWW

(In addition, the CS API refers to encodeURIComponent in one of the examples. This is a JavaScript function which, according to Mozilla, only partly follows RFC 3986: it leaves !'()* unescaped even though RFC 3986 treats them as reserved.)
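The gap is easy to see with a room ID, since those start with `!`. The sketch below approximates the two behaviours in Python for ASCII input; `js_encode_uri_component` and `strict_rfc3986` are hypothetical names, and the room ID is invented for illustration.

```python
from urllib.parse import quote

def js_encode_uri_component(value: str) -> str:
    """Approximate encodeURIComponent: unreserved characters plus !'()* stay as-is."""
    return quote(value, safe="!'()*")

def strict_rfc3986(value: str) -> str:
    """Escape everything except the RFC 3986 unreserved set (ALPHA DIGIT - . _ ~)."""
    return quote(value, safe="")

room_id = "!abc:example.org"  # hypothetical room ID
print(js_encode_uri_component(room_id))  # !abc%3Aexample.org
print(strict_rfc3986(room_id))           # %21abc%3Aexample.org
```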

This matters especially for federation because the request path is included in the Authorization: X-Matrix signature and it makes it very hard to deconstruct and reconstruct these URLs (e.g. in the Low Bandwidth work) without breaking the signatures.
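A rough illustration of the failure mode follows. The signed object in the federation API includes the request `uri` as a string, so any intermediary that decodes and re-encodes the path with different rules changes the signed bytes. In this sketch a SHA-256 digest stands in for the real Ed25519 signature, the JSON serialisation is only an approximation of canonical JSON, and the server names, event ID and `reencode_path` helper are all hypothetical.

```python
import hashlib
import json
from urllib.parse import quote, unquote

def signed_bytes(method: str, uri: str, origin: str, destination: str) -> bytes:
    """Serialise the fields covered by the signature (sorted keys, no whitespace)."""
    obj = {"method": method, "uri": uri, "origin": origin, "destination": destination}
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def reencode_path(uri: str) -> str:
    """Decode each segment, then re-encode it leaving RFC 3986 pchar characters alone."""
    return "/".join(quote(unquote(seg), safe="!$&'()*+,;=:@") for seg in uri.split("/"))

# Sender escaped "$" and ":" in the event ID (query-style encoding of a path element).
original = "/_matrix/federation/v1/event/%24abc%3Aexample.org"
rebuilt = reencode_path(original)
print(rebuilt)  # /_matrix/federation/v1/event/$abc:example.org

digest_sent = hashlib.sha256(signed_bytes("GET", original, "a.example", "b.example")).hexdigest()
digest_rebuilt = hashlib.sha256(signed_bytes("GET", rebuilt, "a.example", "b.example")).hexdigest()
print(digest_sent == digest_rebuilt)  # False
```

Both URIs identify the same event, yet a verifier that recomputes the signed object from the rebuilt URI would reject the request.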

clokep commented 3 years ago

I think this is potentially a duplicate of matrix-org/matrix-spec#561?

KitsuneRal commented 3 years ago

FWIW, RFC 1738 is marked as updated by RFC 3986; and RFC 1630 is a memo, not a standard, saying at the very beginning that "An Internet standard for general Resource Identifiers is under development within the IETF". With that said, there's the WHATWG spec for URLs, which does diverge from RFC 3986 in some cases (mainly for the convenience of web browsers, it seems). From my looking around when designing Matrix URIs, RFC 3986 (and also RFC 3987 for IRIs) seemed to be the "best" (authoritative and fixed) source for non-web needs. Still, it's down to us to tie up the remaining loose ends: RFC 3986 leaves it to applications whether or not to encode certain sub-delimiters (such as & or =), e.g.