ma1uta opened this issue 5 years ago
The spec you refer to is explicit that non-ASCII symbols are not permitted in Matrix identifiers:
dns-name = *255dns-char
dns-char = DIGIT / ALPHA / "-" / "."
(ALPHA is defined at https://tools.ietf.org/html/rfc5234#appendix-B.1).
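A minimal Python sketch of a check against this grammar, assuming the input is only the hostname part (no `:port`). `is_valid_dns_name` is a hypothetical helper, not part of any Matrix SDK. The ASCII ranges are written out explicitly because Python's `\d` and `str.isalnum()` also accept non-ASCII digits and letters such as "ö".

```python
import re

# dns-name = *255dns-char
# dns-char = DIGIT / ALPHA / "-" / "."
# Explicit ASCII ranges: \d or str.isalnum() would also accept e.g. "ö".
DNS_NAME_RE = re.compile(r"[0-9A-Za-z.\-]{0,255}")


def is_valid_dns_name(name: str) -> bool:
    """Return True if `name` matches the dns-name grammar (hypothetical helper)."""
    return DNS_NAME_RE.fullmatch(name) is not None


print(is_valid_dns_name("matrix.org"))     # True
print(is_valid_dns_name("xn--bb-eka.at"))  # True  (A-label form)
print(is_valid_dns_name("öbb.at"))         # False (U-label form)
```

This only checks the allowed characters and the overall length; it does not check label structure (for example, it accepts "..").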
So it's up to clients to de-punycode identifiers if they wish.
This should probably be clarified.
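A client that wants to show a Unicode form could do something like the following minimal sketch. It assumes a bare hostname without `:port`; `display_hostname` is a hypothetical name. It uses Python's built-in `idna` codec, which implements IDNA 2003 (RFC 3490) rather than IDNA 2008.

```python
def display_hostname(hostname: str) -> str:
    """Decode A-labels (xn--...) to Unicode for display only (hypothetical helper)."""
    try:
        # The codec matches the "xn--" prefix case-sensitively, so lowercase
        # first (hostnames are case-insensitive anyway).
        return hostname.lower().encode("ascii").decode("idna")
    except UnicodeError:
        # Already non-ASCII, or not a valid A-label: show it unchanged.
        return hostname


print(display_hostname("xn--bb-eka.at"))  # öbb.at
print(display_hostname("matrix.org"))     # matrix.org
```

The identifier itself stays in ASCII form; only the rendered string changes.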
I see. I didn't find the definitions of ALPHA and DIGIT. So Matrix should use only ASCII symbols in the hostname part.
Consider using the multiaddr format for this (part of multiformats.io): https://github.com/multiformats/multiaddr
You can use this format to:
Main multiaddr features (from project readme):
If multiaddr supports addresses for any network protocol, maybe the best solution is to add Matrix protocol support to multiaddr and then use its standard for any future transformations.
According to IDNA (https://tools.ietf.org/html/rfc5890), there are several ways to represent the same domain if we use non-ASCII symbols. For example, the domain öbb.at can be represented as öbb.at or as xn--bb-eka.at. The first form uses U-labels, which contain Unicode symbols. The second form uses A-labels, which contain only ASCII symbols. Converting from one form to the other uses the Punycode algorithm (https://tools.ietf.org/html/rfc3492, https://tools.ietf.org/html/rfc3490#section-4, https://www.unicode.org/reports/tr46/#Compatibility_Processing and the online tool https://www.punycoder.com/).
In the specification (https://matrix.org/docs/spec/appendices.html), it only says that:
DNS names for use with Matrix should follow the conventional restrictions for internet hostnames: they should consist of a series of labels separated by ., where each label consists of the alphanumeric characters or hyphens.
So we can have hostnames with non-ASCII symbols. Should we have a domain preparation step that avoids A-labels like xn--bb-eka.at and always uses only U-labels like öbb.at?
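One way such a preparation step could look, as a minimal Python sketch: any accepted input (U-label, A-label, mixed case) is mapped to a single canonical form, so öbb.at and xn--bb-eka.at compare equal. `to_a_label` and `to_u_label` are hypothetical names. The standard-library `idna` codec implements IDNA 2003 (RFC 3490, with nameprep), so results can differ from IDNA 2008 (RFC 5890) or UTS #46 for some characters; the third-party `idna` package implements IDNA 2008. The bare Punycode step is available as the `punycode` codec.

```python
def to_a_label(hostname: str) -> str:
    """ASCII (A-label) form, e.g. öbb.at -> xn--bb-eka.at (hypothetical helper)."""
    # encode("idna") applies nameprep + Punycode to non-ASCII labels;
    # pure-ASCII labels pass through unchanged, so lowercase afterwards.
    return hostname.encode("idna").lower().decode("ascii")


def to_u_label(hostname: str) -> str:
    """Canonical Unicode (U-label) form, whichever form the input used."""
    # Going through the A-label first normalises case and accepts both forms.
    return to_a_label(hostname).encode("ascii").decode("idna")


# The Punycode step on its own, without the "xn--" prefix:
print("öbb".encode("punycode"))     # b'bb-eka'
print(to_a_label("öbb.at"))         # xn--bb-eka.at
print(to_u_label("xn--bb-eka.at"))  # öbb.at
print(to_u_label("ÖBB.at"))         # öbb.at
```

Comparing names via either function gives the same equality; the difference is only which form is stored and displayed. Under the dns-name grammar quoted in the reply, only the `to_a_label` form is a valid server name on the wire.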