matrix-org / matrix-spec

The Matrix protocol specification
Apache License 2.0

Grammar for completely opaque IDs (SPEC-388) #174

Open matrixbot opened 8 years ago

matrixbot commented 8 years ago

"Grammar" might be too strong a word, but we should probably make explicit that the following IDs are entirely implementation-specific byte sequences. The originators are allowed to create them however they like, and the recipient has to send them back as they arrived.

(Imported from https://matrix.org/jira/browse/SPEC-388)

(Reported by @richvdh)

matrixbot commented 8 years ago

Jira watchers: @richvdh

matrixbot commented 8 years ago

Links exported from Jira:

relates to SPEC-1

matrixbot commented 8 years ago

Hrm; there are encoding difficulties here.

Some of these IDs end up in JSON strings, which means that they must be interpreted as a sequence of Unicode characters: they are not just byte sequences. Likewise, because our URIs are %-encoded UTF-8, having opaque byte sequences in our URIs would require part of a URI to be parsed as UTF-8 and part as 8-bit data, which most URI parsers would not be happy with.
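To make the difficulty concrete, here is a minimal Python sketch using only the standard library. The ID value and the `device_id` key are made up for illustration; the point is only that raw bytes do not survive either JSON or a UTF-8 percent-decoding step.

```python
import json
from urllib.parse import quote, unquote

# Hypothetical opaque ID that is not valid UTF-8.
raw_id = b"\xff\x00opaque"

# JSON strings are sequences of Unicode characters, so raw bytes are rejected.
try:
    json.dumps({"device_id": raw_id})
    json_ok = True
except TypeError:
    json_ok = False

# Percent-encoding the bytes works, but a parser that decodes the result as
# UTF-8 replaces the invalid byte with U+FFFD, so the original bytes are lost.
encoded = quote(raw_id, safe="")
decoded = unquote(encoded)
round_trips = decoded.encode("utf-8") == raw_id
```

Here `json_ok` and `round_trips` both end up `False`: the recipient has no way to send the ID back exactly as it arrived.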

As I see it there are two options here:

Postel's law should guide us here. My inclination is to restrict these IDs to unreserved URI characters (i.e., [A-Za-z0-9._~-]: see RFC 3986), but also to recommend that, if you receive such an ID, you parse it as a Unicode string and re-encode it correctly when sending it on. This has the advantage that if you're writing a hacky bash script, you don't need to worry about escaping at all, whilst those creating IDs can still use base64 to encode whatever they want.

-- @richvdh
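A rough sketch of what the suggestion above could look like in practice, in Python. The function names are hypothetical and nothing here is specified behaviour. It also assumes the URL-safe base64 alphabet with padding stripped, because standard base64 emits `+`, `/` and `=`, none of which are unreserved characters.

```python
import base64
import re

# Unreserved URI characters per RFC 3986, section 2.3.
UNRESERVED = re.compile(r"[A-Za-z0-9._~-]+")

def make_opaque_id(payload: bytes) -> str:
    """Encode arbitrary bytes using only unreserved characters (hypothetical helper)."""
    # URL-safe base64 uses '-' and '_' instead of '+' and '/'; '=' padding is dropped.
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")

def is_valid_opaque_id(candidate: str) -> bool:
    """Check that a non-empty ID contains only unreserved characters."""
    return UNRESERVED.fullmatch(candidate) is not None
```

For example, `make_opaque_id(b"\xff\xfe\xfd")` yields `__79`, which passes `is_valid_opaque_id`, while `a/b` does not. An ID of this shape can be dropped into a URI path or a JSON string without any escaping.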

matrixbot commented 8 years ago

The * character is used as a wildcard for device IDs, so it must be forbidden as a device ID.

-- @richvdh
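To illustrate why a literal `*` device ID would be ambiguous, here is a small Python sketch. The `{device_id: message}` shape and the `resolve_targets` name are assumptions made for the example, not a description of any particular API.

```python
from typing import Dict, Set

def resolve_targets(messages: Dict[str, str], known_devices: Set[str]) -> Dict[str, str]:
    """Expand a hypothetical {device_id: message} map in which "*" means every known device."""
    resolved: Dict[str, str] = {}
    # Apply the wildcard entry first so that explicit entries can override it.
    if "*" in messages:
        for device in known_devices:
            resolved[device] = messages["*"]
    for device, message in messages.items():
        if device != "*":
            resolved[device] = message
    return resolved
```

With this convention, a device whose ID really was `*` could never be addressed on its own, since any entry under that key is treated as "all devices". Restricting IDs to unreserved URI characters would rule this out as a side effect, because `*` is not in that set.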

richvdh commented 3 years ago

Since the links are hard to find above:

Proposals:

Other tracking issues: