zephyr-im / zephyr

An institutional/enterprise-scale distributed real-time messaging and notification system
Resolve z_charset confusion and byte-swapping issue. #127

Open davidben opened 10 years ago

davidben commented 10 years ago

This is probably a better venue for discussion than -c davidben. Alright, so a while ago there were some ramblings on -c zephyr-dev about the z_charset field and how messed up it is. From memory, here's a summary of the situation:

In addition, I discovered a new issue today. z_charset is endian-confused over the wire! This line (and a corresponding one for formatting notices) shouldn't be there. https://github.com/zephyr-im/zephyr/blob/master/lib/ZParseNot.c#L292
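To make the endian confusion concrete, here is a rough model of what a stray host-to-network swap does to a 16-bit field that has already been decoded into a host integer. This is a sketch under assumptions, not a transcription of the library code: the value 106 for `ZCHARSET_UTF_8` (the IANA MIBenum for UTF-8) is assumed, and `htons_model` and `charset_after_stray_swap` are hypothetical names invented for illustration.

```python
# Assumed constant: 106 is the IANA MIBenum for UTF-8. Check zephyr.h for
# the value the library actually uses.
ZCHARSET_UTF_8 = 106


def byteswap16(value: int) -> int:
    """Swap the two bytes of a 16-bit unsigned integer."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def htons_model(value: int, host_is_little_endian: bool) -> int:
    """Model of htons(): swaps on little-endian hosts, no-op on big-endian."""
    return byteswap16(value) if host_is_little_endian else value


def charset_after_stray_swap(decoded: int, host_is_little_endian: bool) -> int:
    """Hypothetical model of the bug: the field is already a correctly
    decoded host integer, and an extra htons() is applied on top of it."""
    return htons_model(decoded, host_is_little_endian)


little = charset_after_stray_swap(ZCHARSET_UTF_8, host_is_little_endian=True)
big = charset_after_stray_swap(ZCHARSET_UTF_8, host_is_little_endian=False)
print(hex(little), hex(big))  # 0x6a00 0x6a
```

Under this model the same notice yields two different z_charset values depending on the host's byte order, which is why the extra swap makes the field unreliable across machines.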

So, to add to our situation list:

This is a mess. It should get resolved.

So, I'm uneasy about switching Roost back over to assuming ISO-8859-1-tagged messages are actually telling the truth because I've been burned by that before. I also think protocols should minimize variability for the sake of sanity. (And, for entirely selfish reasons, I'm working on a new from-scratch implementation and don't want more test vectors in my unit tests.) Here are two proposals I think I would be happy with to start things off:

Proposal davidben-there-is-no-multics

Same as above but replace "ZCHARSET_UTF_8" in the receiver section with "ZCHARSET_UTF_8 or byteswap16(ZCHARSET_UTF_8)". Big-endian senders still follow the rule about little-endian being the correct encoding. Transition back to davidben-there-is-no-multics when all big-endian machines are updated.
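The receiver-side rule in this variant could look roughly like the following. This is only a sketch: the value 106 for `ZCHARSET_UTF_8` is assumed (IANA MIBenum for UTF-8), and `is_utf8_charset` and its `accept_byteswapped` flag are hypothetical names, not anything in libzephyr.

```python
# Assumed constant: 106 is the IANA MIBenum for UTF-8. Check zephyr.h for
# the value the library actually uses.
ZCHARSET_UTF_8 = 106


def byteswap16(value: int) -> int:
    """Swap the two bytes of a 16-bit unsigned integer."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)


def is_utf8_charset(z_charset: int, accept_byteswapped: bool = True) -> bool:
    """Return True if a received z_charset should be treated as UTF-8.

    With accept_byteswapped=True this accepts both ZCHARSET_UTF_8 and
    byteswap16(ZCHARSET_UTF_8), as in the transitional rule. Setting the
    flag to False models the stricter rule for after the transition.
    """
    if z_charset == ZCHARSET_UTF_8:
        return True
    return accept_byteswapped and z_charset == byteswap16(ZCHARSET_UTF_8)
```

Keeping the compatibility case behind a single flag means dropping it later is a one-line change, and the flag is also a natural place to hang a counter for how often the byte-swapped value actually shows up.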

When shiny new Roost finally happens, we can get data on when and how often the backwards-compatibility cases occur to guide when we can drop them.