matrix-org / matrix-spec

The Matrix protocol specification
Apache License 2.0

Domain preparation (avoid "A-Label" in the domain names). #439

Open ma1uta opened 5 years ago

ma1uta commented 5 years ago

According to IDNA (https://tools.ietf.org/html/rfc5890), there are several ways to represent the same domain if it uses non-ASCII symbols. For example, the domain öbb.at can be represented as öbb.at or xn--bb-eka.at. The first form is called a U-label and uses Unicode symbols. The second form is called an A-label and uses only ASCII symbols.

Conversion from one form to the other uses the Punycode algorithm (https://tools.ietf.org/html/rfc3492, https://tools.ietf.org/html/rfc3490#section-4, https://www.unicode.org/reports/tr46/#Compatibility_Processing and the online tool https://www.punycoder.com/).
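To make the two forms concrete, here is a minimal sketch of the round trip in Python. It relies on Python's built-in "idna" codec, which implements the older IDNA 2003 rules (RFC 3490) rather than IDNA 2008 or UTS #46, so results may differ for some edge-case names. The helper names `to_a_label` and `to_u_label` are made up for this illustration.

```python
# Sketch only: Python's built-in "idna" codec follows IDNA 2003 (RFC 3490),
# not IDNA 2008 / UTS #46. The helper names are hypothetical.

def to_a_label(domain: str) -> str:
    """Convert a domain containing U-labels to its ASCII (A-label) form."""
    return domain.encode("idna").decode("ascii")


def to_u_label(domain: str) -> str:
    """Convert a domain containing A-labels back to its Unicode (U-label) form."""
    return domain.encode("ascii").decode("idna")


print(to_a_label("öbb.at"))         # xn--bb-eka.at
print(to_u_label("xn--bb-eka.at"))  # öbb.at
```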

The specification (https://matrix.org/docs/spec/appendices.html) only says that DNS names for use with Matrix should follow the conventional restrictions for internet hostnames: they should consist of a series of labels separated by ., where each label consists of the alphanumeric characters or hyphens. So we can have hostnames with non-ASCII symbols.

Should we have a domain preparation step to avoid A-labels like xn--bb-eka.at and always use U-labels like öbb.at?

richvdh commented 5 years ago

The spec you refer to is explicit that non-ASCII symbols are not permitted in matrix identifiers:

dns-name    = *255dns-char
dns-char    = DIGIT / ALPHA / "-" / "."

(ALPHA is defined at https://tools.ietf.org/html/rfc5234#appendix-B.1).
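As an illustration of how restrictive this grammar is, the two rules can be turned into a regular expression. This is a sketch of the quoted ABNF only, not of the full server name grammar in the spec; the function name `is_valid_dns_name` is made up here.

```python
import re

# dns-name = *255dns-char
# dns-char = DIGIT / ALPHA / "-" / "."
# ALPHA and DIGIT are ASCII-only (RFC 5234, appendix B.1).
# Note: "*255" allows zero to 255 characters, so the empty string matches too.
DNS_NAME_RE = re.compile(r"[0-9A-Za-z.\-]{0,255}")


def is_valid_dns_name(name: str) -> bool:
    """Return True if `name` matches the quoted dns-name rule (hypothetical helper)."""
    return DNS_NAME_RE.fullmatch(name) is not None


print(is_valid_dns_name("xn--bb-eka.at"))  # True: the A-label is plain ASCII
print(is_valid_dns_name("öbb.at"))         # False: "ö" is not ALPHA
```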

So it's up to clients to de-punycode identifiers if they wish.
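A client that wants to do this could decode the server name only at display time and keep the ASCII form everywhere else. A rough sketch, assuming Python's built-in "idna" codec (IDNA 2003) is good enough for display purposes; the function name `display_user_id` and the user `@alice` are invented for the example:

```python
def display_user_id(user_id: str) -> str:
    """Return a display form of a Matrix user ID with A-labels decoded.

    Hypothetical helper: the ID sent over the wire stays in ASCII form.
    """
    # Split "@localpart:server_name" at the first colon.
    localpart, sep, server = user_id.partition(":")
    if not sep:
        return user_id  # no server part, nothing to decode

    # The server name may carry an optional ":port" suffix.
    host, port_sep, port = server.partition(":")
    try:
        host = host.encode("ascii").decode("idna")
    except UnicodeError:
        pass  # leave the host untouched if it cannot be decoded

    return f"{localpart}:{host}{port_sep}{port}"


print(display_user_id("@alice:xn--bb-eka.at"))  # @alice:öbb.at
```

Names without an xn-- label pass through unchanged, so the helper is safe to apply to every identifier before rendering.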

This should probably be clarified.

ma1uta commented 5 years ago

I see. I hadn't found the definitions of ALPHA and DIGIT. So Matrix should use only ASCII symbols in the hostname part.

uniconstructor commented 5 years ago

Consider using the multiaddr format for this (part of multiformats.io): https://github.com/multiformats/multiaddr

You can use this format to:

Main multiaddr features (from project readme):

If multiaddr supports addresses for any network protocol, maybe the best solution is to add Matrix protocol support to multiaddr and then use its standard for any future transformations.