square / okio

A modern I/O library for Android, Java, and Kotlin Multiplatform.
https://square.github.io/okio/
Apache License 2.0

Windows (native build) codepage issues #1473

Open Nik-mmzd opened 6 months ago

Nik-mmzd commented 6 months ago

Example code:

# main.kt
import okio.*
import okio.Path.Companion.toPath

fun main(args: Array<String>) {
    FileSystem.SYSTEM.createDirectory("тестовая строка".toPath())
}

# build.gradle.kts
plugins {
    kotlin("multiplatform") version "1.9.23"
}

repositories {
    mavenCentral()
}

kotlin {
    mingwX64 {
        binaries {
            executable()
        }
    }

    sourceSets {
        commonMain {
            dependencies {
                implementation("com.squareup.okio:okio:3.9.0")
            }
        }
    }
}

The code above creates a directory named С‚РµСЃС‚РѕРІР°СЏ СЃС‚СЂРѕРєР° (the UTF-8 bytes of the string interpreted as cp1251) instead of тестовая строка.

This applies to other functions as well: FileSystem.SYSTEM.exists(), FileSystem.SYSTEM.write(), etc.

Tested on Windows 11 and Windows 10. To reproduce, "Unicode (beta)" in "Administrative Locale Settings" must be off. When it is on, the issue does not reproduce.
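The garbled name can be reproduced without Windows. The following is a minimal JVM-only sketch, not Okio code: it assumes the symptom comes from UTF-8 bytes being decoded with the active ANSI code page, and that the JVM ships the windows-1251 charset. The helper name `misdecode` is hypothetical.

```kotlin
import java.nio.charset.Charset

// Hypothetical helper: encode the text as UTF-8, then decode those bytes
// with another charset, imitating an API that expects ANSI code page bytes.
fun misdecode(text: String, charsetName: String): String =
    String(text.toByteArray(Charsets.UTF_8), Charset.forName(charsetName))

fun main() {
    val original = "тестовая строка"

    // Each Cyrillic letter is two bytes in UTF-8, so it turns into two cp1251 characters.
    val garbled = misdecode(original, "windows-1251")
    println(garbled)

    // Reversing the two steps recovers the original string, which shows that
    // only the interpretation of the bytes differs, not the bytes themselves.
    val restored = String(garbled.toByteArray(Charset.forName("windows-1251")), Charsets.UTF_8)
    println(restored == original)
}
```

Passing a different charset name to `misdecode` shows how the result would vary with the user's configured code page.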

swankjesse commented 6 months ago

Thanks for reporting this.

I think there are likely two possible fixes: one where we transcode the native path to UTF-8 first, and one where we don’t do that.

I’m curious what happens for Java, which uses UTF-16 strings for paths. It might just work because it does string encoding before it makes system calls.

Nik-mmzd commented 6 months ago

On Windows, the same code built for the JVM target (JVM 17) seems to work correctly.

Nik-mmzd commented 6 months ago

I should note that cp1251 is a locale-dependent code page and may be changed by the user in "Administrative Locale Settings". The current code page can be retrieved with a GetACP call.

fzhinkin commented 2 months ago

> I’m curious what happens for Java, which uses UTF-16 strings for paths. It might just work because it does string encoding before it makes system calls.

Java uses APIs that accept UTF-16 strings (LPCWSTR) as paths (CreateFileW, DeleteFileW, etc.): https://github.com/openjdk/jdk/blob/master/src/java.base/windows/native/libnio/fs/WindowsNativeDispatcher.c