Open slingamn opened 3 months ago
Summarizing some discussion from #ircv3: both Unreal and Inspircd can be configured to accept non-UTF8 nicknames and channel names. In this case, it is not possible to support text websockets, because nicknames and channel names cannot be transcoded without breaking basic IRC functionality (joining channels and sending DMs). (In contrast, in a normal mode of operation where nicknames and channel names must be UTF-8, both Unreal and Inspircd will transcode messages, realnames, and other user data to UTF8 using the Unicode replacement character.)
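To illustrate why lossy transcoding is tolerable for message bodies but not for identifiers, here is a minimal Python sketch. The nicknames, the choice of Latin-1 as the legacy encoding, and the helper name `to_utf8_lossy` are all assumptions made for the example; none of this is taken from Unreal or InspIRCd.

```python
def to_utf8_lossy(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return raw.decode("utf-8", errors="replace")


# Two distinct hypothetical nicknames, encoded as Latin-1 (an assumed legacy encoding).
nick_a = "jos\u00e9".encode("latin-1")  # b"jos\xe9"
nick_b = "jos\u00e8".encode("latin-1")  # b"jos\xe8"

# A message body survives lossy transcoding well enough to be displayed.
body = to_utf8_lossy(b"PRIVMSG #chan :caf\xe9")

# The two nicknames differ on the wire but collide after transcoding,
# so a text-frame client could no longer address either user unambiguously.
collide = to_utf8_lossy(nick_a) == to_utf8_lossy(nick_b)
```

After transcoding, both nicknames become the same string ending in the replacement character, which is why a DM or JOIN sent back over a text frame cannot be mapped to the original bytes.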
I think this is actually not a dealbreaker for the proposed change here, i.e. I think it's fine if those configurations just violate the "MUST support text" proposed here. Such configurations are "out of spec" in the sense that they don't correspond to any recognized value of the CASEMAPPING parameter. (Similarly, Ergo has a configuration option to increase the maximum non-tag line length over 512, which makes the server non-spec-compliant.)
This follows up from my comment on https://github.com/ircv3/ircv3-specifications/pull/548#issuecomment-2242141592; I'll restate the arguments below, but if you already read that comment, you're up to date.
First of all I would like to thank everyone who contributed to, reviewed, discussed, or implemented the current draft. I think the current draft was successful in advancing the conversation and as a resource for implementers. In particular, we identified the need for a binary transport and specified a robust mechanism (subprotocols) for allowing text and binary transports to coexist.
I think it's worth revisiting one design issue in particular. I do not intend this as an expression of disrespect towards anyone who contributed to this discussion previously; I think the deliberative process thus far has been extremely valuable.
The current draft states that servers MUST support binary and SHOULD support text. Roughly three years later, we have three server implementations, all of which support both binary and text [1], and no client implementations that support binary. Given the state of play, are there still compelling reasons not to make the spec symmetrical, and require that servers MUST support both binary and text?
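As a rough sketch of what the symmetrical requirement means at the handshake, the following Python function picks a subprotocol from the list a client offers. The subprotocol names are assumed here to be the ones used by the current draft (`binary.ircv3.net` and `text.ircv3.net`), and `select_subprotocol` is a hypothetical helper, not part of any existing server.

```python
from typing import Optional, Sequence

# Assumed to match the subprotocol names in the current draft.
BINARY = "binary.ircv3.net"
TEXT = "text.ircv3.net"


def select_subprotocol(
    offered: Sequence[str],
    supported: Sequence[str] = (BINARY, TEXT),
) -> Optional[str]:
    """Return the first client-offered subprotocol the server supports.

    `offered` is the client's list in its own preference order.
    Returns None when there is no overlap; what the server does in that
    case is outside the scope of this sketch.
    """
    for name in offered:
        if name in supported:
            return name
    return None


# A text-only client against a server that supports both, and against a
# server that (as the current draft permits) supports only binary.
both = select_subprotocol([TEXT])
binary_only = select_subprotocol([TEXT], supported=(BINARY,))
```

Under the current MUST/SHOULD wording the second case is a compliant server, so a text-only client has no guarantee of interoperability; under the proposed wording that case would not arise on a compliant server.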
As I understand it, there were two blockers during the original deliberative process. First, OFTC NAK'ed this proposal; however, OFTC subsequently dropped out of the spec process to pursue their own gateway implementation.
Second, Libera NAK'ed the proposal. I may have misunderstood the reasons for this, so I invite anyone with a better understanding to correct me, but I think part of the impetus was the idea that Solanum could serve as an IRCv3-compliant implementation shared between OFTC and Libera. As I understand it, these plans are on hold given OFTC's differing priorities. In addition, some comments alluded to a policy rationale founded in Libera's ongoing need to support non-UTF8-based communities:
I am not fully clear on the rationale here. The proposed change here is that servers should support both binary and text, allowing the implementation of binary-based web clients with encoding selection support; it's just that they will also support text frames. Moreover, Libera currently deploys a web IRC stack (Kiwi + webircgateway) that lacks support for non-UTF8 encodings. If the existence of this option doesn't marginalize non-UTF8 communities, then arguably official support for text websocket frames shouldn't either.
In passing, there already exists a configurable reverse proxy implementation that's spec-compliant and exposes both text and binary frames. I believe that any deployments concerned about the implementation burden of text frames would be able to use this proxy, or a similar one; it's stateless so it's horizontally scalable, and it's fairly CPU-efficient in most modes of operation.
Finally I should say something about the reasons for wanting to make this change. I think the original rationale advanced for text frames --- that they make it substantially simpler for clients to achieve a correct, performant implementation --- remains valid. But the current draft of the spec effectively nudges client developers towards binary frames, because binary is the guaranteed interoperable baseline, even when the developer has no intention of supporting non-UTF8 encodings. I think this is suboptimal.
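To make the simplicity argument concrete, here is a minimal Python sketch of the extra decision a client takes on when it receives binary frames. The function name `frame_to_line` and the Latin-1 fallback are assumptions for illustration only; a real client with encoding selection would let the user choose the fallback.

```python
from typing import Union


def frame_to_line(payload: Union[str, bytes], fallback_encoding: str = "latin-1") -> str:
    """Turn one WebSocket frame payload into an IRC line as a string.

    A text frame arrives as `str` and is already valid Unicode, so there is
    nothing to decide. A binary frame arrives as `bytes`, and the client has
    to pick an encoding and an error policy itself.
    """
    if isinstance(payload, str):
        return payload
    try:
        return payload.decode("utf-8")
    except UnicodeDecodeError:
        # Hypothetical policy: fall back to an assumed legacy encoding.
        return payload.decode(fallback_encoding)


text_line = frame_to_line("PRIVMSG #chan :café")
binary_line = frame_to_line(b"PRIVMSG #chan :caf\xe9")
```

A client that never intends to support non-UTF8 encodings only needs the first branch, which is the case text frames are meant to serve.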
Thanks for your time.
[1] There are some non-default configurations of Unreal where it supports only binary, but I don't think this is a significant concern.