multiformats / multistream-select

Friendly protocol negotiation. It enables a multicodec to be negotiated between two entities.
MIT License

String representation character encoding #16

Open ntninja opened 5 years ago

ntninja commented 5 years ago

Just a short question here: What is the character encoding of the protocol strings? I realize that the spec only talks about bytes, but implementations expose them as strings for convenience and therefore need to convert the values. Apparently both the Go implementation and the JS implementation treat them as UTF-8, if I deciphered their code correctly. So is it really UTF-8, or is it ASCII and using a UTF-8 conversion function was just more convenient? Not a biggie, but this should be specified nonetheless.

Stebalien commented 5 years ago

In this spec, it's "just bytes". However, you're right. We need to specify that strings should be encoded as UTF-8.
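
To illustrate what treating protocol strings as UTF-8 would mean on the wire, here is a minimal sketch. It assumes the usual multistream-select framing (an unsigned varint length prefix, then the protocol bytes, then a trailing newline) and the UTF-8 encoding proposed in this thread, which the spec does not yet mandate. The function names `encode_uvarint`, `encode_protocol`, and `decode_protocol` are hypothetical and are not taken from the Go or JS implementations.

```python
def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128-style varint."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, high bit clear
            return bytes(out)


def encode_protocol(name: str) -> bytes:
    """Frame a protocol string: varint(length) + UTF-8 bytes + newline.

    Assumption: strings are UTF-8, as proposed in this thread.
    """
    payload = name.encode("utf-8") + b"\n"
    return encode_uvarint(len(payload)) + payload


def decode_protocol(message: bytes) -> str:
    """Inverse of encode_protocol for a single, complete message."""
    length = 0
    shift = 0
    pos = 0
    while True:
        if pos >= len(message):
            raise ValueError("truncated varint")
        byte = message[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        if not byte & 0x80:
            break
        shift += 7
    payload = message[pos:pos + length]
    if len(payload) != length or not payload.endswith(b"\n"):
        raise ValueError("malformed message")
    # Strict decoding: invalid UTF-8 raises UnicodeDecodeError.
    return payload[:-1].decode("utf-8")


if __name__ == "__main__":
    print(encode_protocol("/multistream/1.0.0"))
    print(decode_protocol(encode_protocol("/café/1.0.0")))
```

For pure-ASCII protocol names the ASCII and UTF-8 encodings are byte-identical, so the choice only matters once a name contains non-ASCII characters. In that case the length prefix counts bytes, not characters.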

ntninja commented 5 years ago

@Stebalien: What about domain names, they can contain Unicode (IDNA)?

In py-multiaddr I added support for domain names by having them be full Unicode in text form (since the strings are always Unicode), and IDNA2008/Punycode-encoded ASCII in binary form. Does that sound about right to you?

ntninja commented 5 years ago

Oh, sorry! I mixed up MSS with multiaddr there!

Please ignore my last comment!