Open lidel opened 1 year ago
@marten-seemann a̹̖̪͔͖̝͘̕͞r̛̛̫̞̬̝͎̘̹͞ę̴͉͉̼̀͟ ̛͏̴͜͏͉̣͉̮͖̳̝͍̭̮̮͍͔͈̲͉͙̰ͅy̞̠̮̥̯̞̦͈̮̣̗̤͚͓̻͝ǫ̶̸̻̫͍̞͎̘͡u̦̮̭̬͔̝͇̠͟͡ ̦͇̘̤͔̝̥̜̘̙̭͙̱́͢s̸̛̠̝͓͚̬̦͇͘͜u͠͏̧̪͇̙͖̳̘̘͓͞͝ͅr̶̷̛̪̩̭̮̱͖̼̙̬̣͔̫͇̳̳̦͢ͅe̸̡̧̯̝͖̮̱͖̲̜̙̹͜͝ͅ ̧̰̟̻̳̗̱͕̰̜̠͇̘͔́͜͟y̡̢͉͙͎͎̹̖̲͇̰͟͢o̶̷͍̼̙̦̫̹̯̭͓̺̞͢ù͏̡̠̭̤̗͖͈͕͙̭͢ ̡̨̛͇͚̦̠̗͢͢w͏̦̗͍̫̀á̢͔͚̰̬͔͕̼̕ņ̴̛̲̜̮͈͙̰̀͘t̶̤̜̯͙̝̺̜̠̘̼͙̞͍̖̟͓̝̤͘͞ ͜͞͏͏͕̣͍̝͍̜͉̳̙̘̥͜t̛̪͔̯̱͓͝ǫ̸̨̢̤̘̰͉̬̤͓̙̼̖̞͖͟ ̷̨̢̳̣͈̪͙̦̻͉̗̹͓̤͖͕͘͢a̢̛͈̲͉͎̲͕͘l͠҉̧̛̞̥̟̜̠̯̰͙͎̬͜ͅl̸̵͍͓͓̼̬͙͍̰̭̟̖̪͍̀o͡͏̡̜̩̗̤͚̪̟̘̥̲̘̥͕̘͎̗̬͉͘͟w̡̡͖̼̟̣͙͕̥͈̮̜̩̟͈̫͘̕͠ ̘̻͕̬͓̗̠̕͞u̸̗͎̜̗͍̝͔̘͎̙̠̭̗͕t̴̤̺̫̮̩̀́͟͞ͅf̸͙̞̞̣̦͟8͏̞̥̮̙̙̲͉̰͖͉͚̤͇͠ ?
Nice UTF-8 art! :)
I don't really see reason not to. Limiting ourselves to ASCII is so 1990s style.
Perhaps this could be phrased as: "Implementations should discard non-ASCII characters and trim the string to 64 characters, but may choose to allow UTF-8 characters if potential for UTF-8 art/mimicry is acceptable"
I definitely wouldn't want UTF-8 support to be outright gone, using UTF-8 in protocol names (maybe containing CIDs with UTF-8 encodings) could have a lot of potential use-cases. For the agent and version strings though, that's perfectly understandable
In fact, maybe better idea, what about:
"Implementations should trim the string to 64 characters. Implementations MAY allow UTF-8 characters in the string, however, these strings should be visible to users as both UTF-8 and ASCII punycode (per IETF RFC 3492) to protect against UTF-8 mimicry."
I'd say let's fully embrace UTF-8. This is 2022, and we finally have a standard encoding that's universally supported.
Building on @Winterhuman's proposal:
"Strings are UTF-8 encode. Implementations MAY trim the string to 64 characters. When made visible to users, implementations MAY output both UTF-8 and ASCII punycode (per IETF RFC 3492) to protect against UTF-8 mimicry."
This PR formalizes normalization and length limit per string field (
agentVersion
,protocolVersion
andprotocols
array). The goal is to reduce surprises and unify behavior across implementations.Kubo PR: https://github.com/ipfs/kubo/pull/9465