korlibs-archive / korio

Korio: Kotlin cORoutines I/O : Virtual File System + Async/Sync Streams + Async TCP Client/Server + WebSockets for Multiplatform Kotlin 1.3
https://korlibs.soywiz.com/korio/
MIT License

UTF-8 converting to ByteArray issue (affects WebSocket) #99

Closed: ArtRoman closed this issue 4 years ago

ArtRoman commented 4 years ago

I'm using a WebSocket on Android (Kotlin MPP). My WebSocket server closes the connection whenever it receives a string containing an emoji typed on the stock AOSP keyboard:

wsClient?.send("{Test😀}")

Sending "{Test}" works great.

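I found that korio converts strings to UTF-8 differently from other libraries. It encodes each half of the emoji's UTF-16 surrogate pair separately, producing two 3-byte sequences (6 bytes), while other implementations combine the pair into one code point and write it as a single 4-byte sequence.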
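To make the difference concrete, here is a minimal Kotlin sketch of the two strategies. The names `appendUtf8` and `encodeUtf8Sketch` are hypothetical and written for illustration only; this is not korio's actual code. It assumes Kotlin 1.5+ for `Char.code`.

```kotlin
// Hypothetical helpers for illustration; NOT korio's implementation.

/** Appends the UTF-8 encoding of one code point (U+0000..U+10FFFF) as 1 to 4 bytes. */
fun MutableList<Byte>.appendUtf8(cp: Int) {
    when {
        cp < 0x80 -> add(cp.toByte())
        cp < 0x800 -> {
            add((0xC0 or (cp shr 6)).toByte())
            add((0x80 or (cp and 0x3F)).toByte())
        }
        cp < 0x10000 -> {
            add((0xE0 or (cp shr 12)).toByte())
            add((0x80 or ((cp shr 6) and 0x3F)).toByte())
            add((0x80 or (cp and 0x3F)).toByte())
        }
        else -> {
            add((0xF0 or (cp shr 18)).toByte())
            add((0x80 or ((cp shr 12) and 0x3F)).toByte())
            add((0x80 or ((cp shr 6) and 0x3F)).toByte())
            add((0x80 or (cp and 0x3F)).toByte())
        }
    }
}

/**
 * combineSurrogates = true: a high/low surrogate pair is merged into one code point
 * and written as a single 4-byte sequence (standard UTF-8).
 * combineSurrogates = false: every UTF-16 unit is encoded on its own, so each
 * surrogate becomes a 3-byte sequence (the form reported for korio in this issue).
 */
fun encodeUtf8Sketch(s: String, combineSurrogates: Boolean): ByteArray {
    val out = mutableListOf<Byte>()
    var i = 0
    while (i < s.length) {
        val c = s[i]
        if (combineSurrogates && c.isHighSurrogate() && i + 1 < s.length && s[i + 1].isLowSurrogate()) {
            // Surrogate pair -> supplementary code point
            val cp = ((c.code - 0xD800) shl 10) + (s[i + 1].code - 0xDC00) + 0x10000
            out.appendUtf8(cp)
            i += 2
        } else {
            out.appendUtf8(c.code)
            i += 1
        }
    }
    return out.toByteArray()
}
```

With `combineSurrogates = true`, the test string `"{Test\uD83D\uDE00}"` yields the same ten bytes as kotlin.text and ktor; with `false`, it yields the twelve bytes reported for korio.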

When pasted into the IDE, the string was converted to a more portable form: the emoji appears as the escaped surrogate pair \uD83D\uDE00, so two symbols are visible. The encoding and decoding results are the same either way.
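The two escaped symbols are the high and low UTF-16 surrogates of a single code point. The standard surrogate-pair arithmetic recovers that code point and explains why correct UTF-8 needs four bytes for it:

```latex
\begin{aligned}
\text{cp} &= (\texttt{0xD83D} - \texttt{0xD800}) \cdot 2^{10} + (\texttt{0xDE00} - \texttt{0xDC00}) + \texttt{0x10000} \\
          &= \texttt{0x3D} \cdot \texttt{0x400} + \texttt{0x200} + \texttt{0x10000} \\
          &= \texttt{0xF400} + \texttt{0x200} + \texttt{0x10000} = \texttt{0x1F600} \quad (\text{U+1F600})
\end{aligned}
```

Since U+1F600 is at or above U+10000, standard UTF-8 writes it as the four bytes F0 9F 98 80 (signed: -16, -97, -104, -128). Encoding each surrogate on its own instead gives ED A0 BD and ED B8 80 (signed: -19, -96, -67, -19, -72, -128). UTF-8 as defined in RFC 3629 does not allow surrogate code points to be encoded, so strict decoders treat that six-byte form as malformed.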

JVM repro:

import com.soywiz.korio.lang.UTF8
import com.soywiz.korio.lang.toByteArray as korioToByteArray
import com.soywiz.korio.lang.toString as korioToString
import io.ktor.utils.io.core.String as ktorToString
import io.ktor.utils.io.core.toByteArray as ktorToByteArray
import kotlin.text.toByteArray as kotlinToByteArray

private fun testEncoding() {
    // "{Test😀}" with the emoji written as its UTF-16 surrogate pair
    val text = "{Test\uD83D\uDE00}"
    println("text: $text")

    // Encode the same string with each library
    val kotlinBytes = text.kotlinToByteArray(Charsets.UTF_8)
    val korioBytes = text.korioToByteArray(UTF8)
    val ktorBytes = text.ktorToByteArray(Charsets.UTF_8)

    println("kotlin bytes: ${kotlinBytes.toList()}")
    println("korio bytes: ${korioBytes.toList()}")
    println("ktor bytes: ${ktorBytes.toList()}")

    // Decode back with the JVM decoder, korio's decoder and ktor's decoder
    println("kotlinBytes to String: ${String(kotlinBytes, Charsets.UTF_8)}")
    println("korioBytes to String: ${String(korioBytes, Charsets.UTF_8)}")
    println("korioBytes to korioString: ${korioBytes.korioToString(UTF8)}")
    // Note: despite its label, this line decodes ktorBytes, not korioBytes
    println("korioBytes to ktorString: ${ktorToString(ktorBytes)}")
}

Output:

text: {Test😀}
kotlin bytes: [123, 84, 101, 115, 116, -16, -97, -104, -128, 125]
korio bytes: [123, 84, 101, 115, 116, -19, -96, -67, -19, -72, -128, 125]
ktor bytes: [123, 84, 101, 115, 116, -16, -97, -104, -128, 125]
kotlinBytes to String: {Test😀}
korioBytes to String: {Test������}
korioBytes to korioString: {Test😀}
korioBytes to ktorString: {Test😀}

Method                                              Result (signed bytes)
--------------------------------------------------  ------------------------------------------------------------
com.soywiz.korio.lang.toByteArray(UTF8)             [123, 84, 101, 115, 116, -19, -96, -67, -19, -72, -128, 125]
kotlin.text.toByteArray(Charsets.UTF_8)             [123, 84, 101, 115, 116, -16, -97, -104, -128, 125]
io.ktor.utils.io.core.toByteArray(Charsets.UTF_8)   [123, 84, 101, 115, 116, -16, -97, -104, -128, 125]
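A quick way to spot the korio-style output in a byte array: in UTF-8, a 3-byte sequence encoding a surrogate (U+D800..U+DFFF) always starts with 0xED followed by a byte in 0xA0..0xBF, a combination that valid UTF-8 never contains. The function below is a hypothetical diagnostic helper, not part of korio or ktor.

```kotlin
/**
 * Hypothetical diagnostic: returns true if [bytes] contains an encoded UTF-16
 * surrogate (lead byte 0xED followed by 0xA0..0xBF). Valid UTF-8 only allows
 * 0x80..0x9F after 0xED, so any match means the encoder wrote surrogates separately.
 */
fun containsEncodedSurrogates(bytes: ByteArray): Boolean {
    for (i in 0 until bytes.size - 1) {
        val b0 = bytes[i].toInt() and 0xFF     // treat bytes as unsigned
        val b1 = bytes[i + 1].toInt() and 0xFF
        if (b0 == 0xED && b1 in 0xA0..0xBF) return true
    }
    return false
}
```

Applied to the results in the table, it flags only the korio row. This is consistent with the replacement characters printed when the JVM decodes the korio bytes with String(...).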

My current workaround is to convert the string to a ByteArray with ktor (Kotlin's toByteArray(Charset) converter exists only on the JVM) and send it as a binary message.
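On Kotlin 1.4 or later, the common standard library's String.encodeToByteArray() and ByteArray.decodeToString() produce standard UTF-8 on every platform and could likely replace ktor for this step. The sketch below assumes Kotlin 1.4+. The wrapper names `toUtf8Payload` and `fromUtf8Payload` are hypothetical, and the actual WebSocket send call is left out because it depends on the client in use.

```kotlin
// Sketch, assuming Kotlin 1.4+ (encodeToByteArray/decodeToString stable in common stdlib).

/** Encodes [message] as standard UTF-8; surrogate pairs become single 4-byte sequences. */
fun toUtf8Payload(message: String): ByteArray = message.encodeToByteArray()

/** Decodes a UTF-8 payload received as a binary frame back into a String. */
fun fromUtf8Payload(payload: ByteArray): String = payload.decodeToString()
```

The resulting bytes match the kotlin.text and ktor rows in the table, so the server receives valid UTF-8 whether the payload is sent as a binary frame or decoded and resent as text.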