libsdl-org / SDL

Simple Directmedia Layer
https://libsdl.org
zlib License
9.94k stars 1.84k forks source link

Incorrect usage of GetStringUTFChars/NewStringUTF in android jni code #10278

Open edo9300 opened 3 months ago

edo9300 commented 3 months ago

In both SDL_android.c and hid.cpp, sdl uses those 2 functions to convert from and to jstring, but the way they're used is incorrect. These functions work with the Modified UTF-8 encoding, that while it's almost entirely compatible with normal UTF-8, has some differences that could make the routines reject valid UTF-8 strings. A possible proper implementation would be done by using Charset conversion, and explicitly converting strings from/to UTF-8 bytes arrays, below are possible implementations of those conversion functions that could be used in place of GetStringUTFChars/NewStringUTF.

char* ParseJavaString(JNIEnv* env, jstring jnistring) {
    if(!jnistring)
        return SDL_strdup("");
    //calls: jnistring.getBytes("UTF-8")
    const auto stringClass = env->GetObjectClass(jnistring);
    const auto getBytes = env->GetMethodID(stringClass, "getBytes", "(Ljava/lang/String;)[B");
    jstring UTF8_STRING = env->NewStringUTF("UTF-8");
    const auto stringJbytes = (jbyteArray)env->CallObjectMethod(jnistring, getBytes, UTF8_STRING);

    size_t length = (size_t)env->GetArrayLength(stringJbytes);
    jbyte* pBytes = env->GetByteArrayElements(stringJbytes, nullptr);

    char* ret = SDL_strndup((char*)pBytes, length);

    env->ReleaseByteArrayElements(stringJbytes, pBytes, JNI_ABORT);

    env->DeleteLocalRef(stringJbytes);
    env->DeleteLocalRef(stringClass);
    env->DeleteLocalRef(UTF8_STRING);
    return ret;
}
//calls: Charset.forName("UTF-8").decode(bb).toString()
jstring NewJavaString(JNIEnv* env, char* str, size_t len) {
    jobject bb = env->NewDirectByteBuffer(str, len);

    jclass cls_Charset = env->FindClass("java/nio/charset/Charset");
    jmethodID mid_Charset_forName = env->GetStaticMethodID(cls_Charset, "forName", "(Ljava/lang/String;)Ljava/nio/charset/Charset;");
    jstring UTF8_STRING = env->NewStringUTF("UTF-8");
    jobject charset = env->CallStaticObjectMethod(cls_Charset, mid_Charset_forName, UTF8_STRING);

    jmethodID mid_Charset_decode = env->GetMethodID(cls_Charset, "decode", "(Ljava/nio/ByteBuffer;)Ljava/nio/CharBuffer;");
    jobject cb = env->CallObjectMethod(charset, mid_Charset_decode, bb);

    jclass cls_CharBuffer = env->FindClass("java/nio/CharBuffer");
    jmethodID mid_CharBuffer_toString = env->GetMethodID(cls_CharBuffer, "toString", "()Ljava/lang/String;");
    jstring ret = (jstring)env->CallObjectMethod(cb, mid_CharBuffer_toString);

    env->DeleteLocalRef(cls_Charset);
    env->DeleteLocalRef(cls_CharBuffer);
    env->DeleteLocalRef(charset);
    env->DeleteLocalRef(cb);
    env->DeleteLocalRef(bb);
    env->DeleteLocalRef(UTF8_STRING);

    return ret;
}
slouken commented 3 months ago

It's probably more efficient to use our own routines to convert between modified UTF-8 and standard UTF-8. @icculus, thoughts?

slouken commented 3 weeks ago

While this is technically correct, I did an audit of the call sites of these functions and none of them would have embedded nulls or characters outside the Basic Multilingual Plane, which would encode differently from standard UTF-8.

I'm going to move this out of the 3.2.0 milestone and we can revisit this later if it becomes an issue.

edo9300 commented 3 weeks ago

While this is technically correct, I did an audit of the call sites of these functions and none of them would have embedded nulls or characters outside the Basic Multilingual Plane, which would encode differently from standard UTF-8.

I'm going to move this out of the 3.2.0 milestone and we can revisit this later if it becomes an issue.

Well, it's used in places like getting/setting the clipboard, and it wouldn't be uncommon for emojis to be present there, especially on android, in which case you'd have characters outside the BMP, but I agree it's not a very pressing issue as at worst you'll only get invalid characters. Anyways, since when I opened this issue, I found an issue in the function to create a java string posted above, so i reworked it to use a different approach to create the new string, also way more simpler to use

jstring NewJavaString(JNIEnv* env,  char* str, size_t len) {
    jbyteArray jBuff = env->NewByteArray(len);
    env->SetByteArrayRegion(jBuff, 0, len, (jbyte*)str);
    jclass cls_String = env->FindClass("java/lang/String");
    jmethodID String_new = env->GetMethodID(cls_String, "<init>", ([BLjava/lang/String;)V);
    jstring UTF8_STRING = env->NewStringUTF("UTF-8");
    // calls: String ret = new String(jBuff, "UTF-8")
    jstring ret = static_cast<jstring>(env->NewObject(cls_String, String_new, jBuff, UTF8_STRING));

    env->DeleteLocalRef(jBuff);
    env->DeleteLocalRef(UTF8_STRING);
    env->DeleteLocalRef(cls_String);
    return ret;
}
slouken commented 3 weeks ago

I haven't investigated this, but we might be able to use iconv on Android to do this for us:

#define JavaToUTF8(S) SDL_iconv_string("UTF-8", "JAVA", (const char *)(S), (SDL_strlen(S) + 1))
#define UTF8ToJava(S) SDL_iconv_string("JAVA", "UTF-8", (const char *)(S), SDL_strlen(S) + 1)

Although that probably encodes the null byte at the end as 0xC0,0x80, so we'd need to track the length separately.