vrchat-community / osc

Files and Info on using OSC to communicate with VRChat
MIT License
234 stars, 4 forks

Support Unicode / UTF8 in Chatbox OSC by using HTML character entity references #137

Closed: cyberkitsune closed this issue 2 years ago

cyberkitsune commented 2 years ago

(This is a cross-post of my Canny post on VRChat feedback with a similar title.)

The issue

Currently, the in-game chatbox supports entering UTF-8 text, including some Unicode characters such as emoji, via the in-game keyboard. However, if you feed the chatbox via OSC using /chatbox/input, you can ONLY send ASCII characters, because OSC strings are defined as ASCII and the protocol does not support UTF-8 or other Unicode encodings. This means you can't send many foreign-language characters or emoji, even though the chatbox supports them when you use the in-game keyboard.

The suggestion

I propose making the Chatbox OSC component automatically translate HTML character references (see https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references) into Unicode after receiving the OSC message, but before running the character limit check and displaying the message.

This would mean if I wanted to send the string:

頼もう!

I would instead send:

&#38972;&#12418;&#12358;!

This is easy to do programmatically in Python 3:

>>> "頼もう!".encode('ascii', 'xmlcharrefreplace').decode()
'&#38972;&#12418;&#12358;!'

The encoding also handles text that mixes ASCII and non-ASCII characters:

>>> "I 💛 foxes! 🦊".encode('ascii', 'xmlcharrefreplace').decode()
'I &#128155; foxes! &#129418;'

The resulting HTML character references are guaranteed to be pure ASCII, and are therefore OSC-compliant!
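The ASCII guarantee and the reversibility of the encoding can be checked directly. This is a small sketch using a hypothetical helper `to_osc_safe` that wraps the same `xmlcharrefreplace` call; it confirms that the encoded text is pure ASCII and that Python's standard `html.unescape` restores the original string:

```python
import html


def to_osc_safe(text: str) -> str:
    """Hypothetical helper: replace every non-ASCII character with a
    decimal HTML character reference such as &#128155;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


for original in ["頼もう!", "I 💛 foxes! 🦊", "plain ascii"]:
    encoded = to_osc_safe(original)
    # The encoded form contains only ASCII characters...
    assert encoded.isascii()
    # ...and decoding the references restores the original text exactly.
    assert html.unescape(encoded) == original
    print(encoded)
```

One caveat: `xmlcharrefreplace` does not escape a literal `&`, so input that already contains something like `&lt;` would be decoded on the receiving side. Escaping `&` first (for example with `html.escape(text, quote=False)`) before encoding would avoid that.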

momo-the-monster commented 2 years ago

The chatbox can accept UTF-8 strings as of VRChat release 2022.4.1.
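With UTF-8 accepted, the text can be sent as raw UTF-8 bytes instead of character references. The sketch below builds a `/chatbox/input` message by hand. It assumes the argument layout described in VRChat's OSC documentation (a string followed by a boolean that sends the text immediately), UDP port 9000 on localhost, and hypothetical helper names `osc_pad`, `build_chatbox_message`, and `send_chatbox`:

```python
import socket


def osc_pad(data: bytes) -> bytes:
    """Null-terminate and pad to a multiple of 4 bytes, as OSC strings require."""
    data += b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def build_chatbox_message(text: str, send_immediately: bool = True) -> bytes:
    """Build a /chatbox/input OSC message whose string argument is UTF-8."""
    address = osc_pad(b"/chatbox/input")
    # Type tags: one string, then a boolean carried entirely by its tag (T or F).
    type_tags = osc_pad(b",s" + (b"T" if send_immediately else b"F"))
    # Padding is computed on the encoded byte length, not the character count:
    # an emoji is 1 character but 4 UTF-8 bytes.
    payload = osc_pad(text.encode("utf-8"))
    return address + type_tags + payload


def send_chatbox(text: str, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send the message over UDP (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(build_chatbox_message(text), (host, port))
```

Calling `send_chatbox("頼もう!")` would then deliver the text without any HTML references. The byte-length padding is the main detail that differs from the ASCII case, where bytes and characters line up one-to-one.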