Closed fdutton closed 1 year ago
I missed this previously, apologies for not following up.
I'm no expert in these RFCs, but my reading, combined with a look at at least one Python implementation, suggests to me that these tests are correct as is. Specifically, the paragraph just before the one you cite in that section says:
The following rule, consisting of six conditions, applies to labels in Bidi domain names.
and just above in Section 2 is the definition:
A "Bidi domain name" is a domain name that contains at least one RTL label.
i.e. it seems to me, at least, that what you're citing applies only to Bidi domain names, not to all IDN hostnames. In particular, the examples you're citing contain no RTL characters, so they do not need to start with a character with one of those Bidi properties.
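To make that reading concrete, here is a minimal Python sketch of it, not a full RFC 5893 implementation. It assumes the hostname is already in Unicode (U-label) form, splits labels on "." only, and checks just the first-character condition. The function names are made up for illustration; the "RTL label" test (at least one character of Bidi class R, AL, or AN) follows the RFC 5893 terminology.

```python
import unicodedata

# Bidi classes that make a label an "RTL label" (RFC 5893 terminology).
RTL_CLASSES = {"R", "AL", "AN"}
# Bidi classes allowed for the first character by the condition quoted in the issue.
FIRST_CHAR_CLASSES = {"L", "R", "AL"}


def is_rtl_label(label: str) -> bool:
    """True if the label contains at least one R, AL, or AN character."""
    return any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label)


def is_bidi_domain_name(name: str) -> bool:
    """True if at least one label of the name is an RTL label."""
    return any(is_rtl_label(label) for label in name.split(".") if label)


def violates_first_char_condition(name: str) -> bool:
    """Check only the first-character condition, and only for Bidi domain names.

    Hypothetical helper: under the reading above, names with no RTL label
    are not subject to the condition at all.
    """
    if not is_bidi_domain_name(name):
        return False
    return any(
        unicodedata.bidirectional(label[0]) not in FIRST_CHAR_CLASSES
        for label in name.split(".")
        if label
    )


if __name__ == "__main__":
    katakana_dot_label = "\u30fb\u3045"  # KATAKANA MIDDLE DOT + HIRAGANA LETTER SMALL U
    hebrew_label = "\u05d0\u05d1"        # HEBREW LETTER ALEF + HEBREW LETTER BET
    print(is_bidi_domain_name(katakana_dot_label))
    print(violates_first_char_condition(katakana_dot_label))
    print(violates_first_char_condition(katakana_dot_label + "." + hebrew_label))
```

Under this reading, a label that starts with KATAKANA MIDDLE DOT is only a problem when the same name also contains an RTL label somewhere, which is not the case in the test data being discussed.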
If you are an expert here please feel free to elaborate :)
Going to close given the above, but if you or anyone disagrees do follow up!
RFC 5890 is the top-level specification (i.e., the entry point) for describing and validating Internationalized Domain Names for Applications (IDNA). RFC 5893 is a subordinate specification that addresses how to validate domain names compliant with Unicode's bi-directional algorithm.
RFC 5893 Section 2.1 states, "The first character must be a character with Bidi property L, R, or AL." Five tests in tests/draft2020-12/optional/format/idn-hostname fail this check (results are the same in the other drafts).

I managed to get the tests to pass by prefacing the test data with random characters from the same script. For example, I prefaced the test data for "KATAKANA MIDDLE DOT with Hiragana" with U+3045, but I do not know if this is reasonable.

I can submit a pull request but would prefer to do so once I learn how to build and test this project. I would appreciate it if someone could direct me to this portion of the documentation or describe the process. I also need to know if I should update draft-next.