stephenh / mirror

A tool for real-time, two-way sync for remote (e.g. desktop/laptop) development
Apache License 2.0

Issues with international characters #42

Open jacobalberty opened 4 years ago

jacobalberty commented 4 years ago

Just fired up the Docker image and I'm getting this error over and over. After I removed those files, things seem to be working fine now.

mirror_1  | java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <clipped for privacy>/Fran?ais/timeQplus Guide de D?marrage Rapide (2016_09_12 16_41_37 UTC).pdf
mirror_1  |     at sun.nio.fs.UnixPath.encode(
mirror_1  |     at sun.nio.fs.UnixPath.<init>(
mirror_1  |     at sun.nio.fs.UnixFileSystem.getPath(
mirror_1  |     at java.nio.file.Paths.get(
mirror_1  |     at mirror.UpdateTree.find(
mirror_1  |     at mirror.UpdateTree.addUpdate(
mirror_1  |     at mirror.UpdateTree.addLocal(
mirror_1  |     at mirror.MirrorSession.lambda$calcInitialState$1(
mirror_1  |     at java.util.ArrayList.forEach(
mirror_1  |     at mirror.MirrorSession.calcInitialState(
mirror_1  |     at mirror.MirrorClient.startSession(
mirror_1  |     at mirror.MirrorClient.access$300(
mirror_1  |     at mirror.MirrorClient$SessionStarter.runOneLoop(
mirror_1  |     at
mirror_1  |     at mirror.tasks.ThreadBasedTask.lambda$new$0(
mirror_1  |     at
mirror_1  | 2019-09-05 03:35:21 INFO  Stopping session
mirror_1  | 2019-09-05 03:35:21 INFO  Connected, starting session, version unspecified
mirror_1  | 2019-09-05 03:35:21 INFO  Watchman root is /data/
stephenh commented 4 years ago

Hm, yeah, I'm not entirely surprised; I've had similar issues reported in the past.

The core issue is that mirror uses watchman for file watching, which is native code built on POSIX APIs. Those APIs don't work in UTF-8; they hand back file names as raw bytes in whatever encoding the file system happens to use.
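To make the byte-level point concrete, here is a small sketch (class and method names are illustrative, not from mirror) showing that the directory name from the report, `Français`, is a different byte sequence depending on which encoding wrote it. ISO-8859-1 is only an example of a non-UTF-8 encoding here; the actual encoding of the reporter's files isn't known.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FileNameBytes {
    // Render bytes as space-separated uppercase hex, e.g. "46 72 61".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // The bytes a POSIX file name would contain if it had been written in `cs`.
    static byte[] nameBytes(String name, Charset cs) {
        return name.getBytes(cs);
    }

    public static void main(String[] args) {
        // "\u00e7" is 'ç'; the escape keeps the source file pure ASCII.
        String name = "Fran\u00e7ais";
        System.out.println("UTF-8:      " + hex(nameBytes(name, StandardCharsets.UTF_8)));
        // prints "UTF-8:      46 72 61 6E C3 A7 61 69 73"
        System.out.println("ISO-8859-1: " + hex(nameBytes(name, StandardCharsets.ISO_8859_1)));
        // prints "ISO-8859-1: 46 72 61 6E E7 61 69 73"
    }
}
```

Only the `ç` differs (two bytes `C3 A7` versus one byte `E7`), but that is enough for a name written one way to be unreadable when decoded the other way.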

For everything other than file watching, mirror uses the regular Java APIs, which assume UTF-8.
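A diagnostic sketch that may help narrow this down. As far as I know, on OpenJDK/HotSpot the `Paths.get` -> `UnixPath.encode` step in the stack trace converts strings to bytes with the JVM's file-name encoding, exposed as the internal, undocumented `sun.jnu.encoding` property and derived from the process locale at startup, rather than with a hard-coded UTF-8. In a container with no locale configured it is often `ANSI_X3.4-1968` (plain ASCII), in which case any non-ASCII path would fail to encode. Treat this as an assumption to check, not a confirmed cause; `JvmEncodings` is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class JvmEncodings {
    // Collect the encoding-related settings the JVM picked up at startup.
    // NOTE (assumption): "sun.jnu.encoding" is an internal HotSpot/OpenJDK
    // property (the charset used for file names) and may be null on other JVMs.
    static Map<String, String> encodings() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("sun.jnu.encoding", String.valueOf(System.getProperty("sun.jnu.encoding")));
        m.put("file.encoding", String.valueOf(System.getProperty("file.encoding")));
        m.put("Charset.defaultCharset()", Charset.defaultCharset().name());
        // The locale environment variables the JVM derives its encodings from.
        m.put("LANG", String.valueOf(System.getenv("LANG")));
        m.put("LC_ALL", String.valueOf(System.getenv("LC_ALL")));
        return m;
    }

    public static void main(String[] args) {
        encodings().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Running it inside the mirror container, once as-is and once with a UTF-8 locale set in the environment, would show whether the file-name encoding changes between the two.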

So for plain strings whose bytes are identical in the native POSIX encoding (via watchman via JNI) and in UTF-8, such as pure-ASCII names, everything works by happenstance.
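A sketch of that happenstance, using ISO-8859-1 as a stand-in for the file system's encoding (an assumption; any ASCII-compatible encoding behaves the same way on ASCII input). Pure-ASCII names produce identical bytes in both encodings, while a name with an accented character does not. `SameBytes` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class SameBytes {
    // True when `name` has the same byte representation in UTF-8 and in
    // ISO-8859-1 (standing in for "whatever the file system used").
    static boolean sameInBoth(String name) {
        return Arrays.equals(name.getBytes(StandardCharsets.UTF_8),
                             name.getBytes(StandardCharsets.ISO_8859_1));
    }

    public static void main(String[] args) {
        System.out.println(sameInBoth("timeQplus Guide.pdf")); // prints "true": pure ASCII
        System.out.println(sameInBoth("Fran\u00e7ais"));        // prints "false": 'ç' encodes differently
    }
}
```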

Just for reference, I've tried to hack around some of this by ignoring "can't be UTF-8" string failures from watchman's Java library:
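For illustration only, here is roughly what "ignore names that can't be decoded as UTF-8" can look like on the Java side. This is not the watchman Java client's actual code; `StrictUtf8`, `decodeUtf8`, and `decodeAllSkippingBad` are hypothetical names. The key detail is a `CharsetDecoder` set to `CodingErrorAction.REPORT`, because `new String(bytes, StandardCharsets.UTF_8)` silently replaces bad bytes with U+FFFD instead of failing.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class StrictUtf8 {
    // Decode raw file-name bytes strictly as UTF-8; report malformed input
    // instead of substituting U+FFFD.
    static Optional<String> decodeUtf8(byte[] raw) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return Optional.of(decoder.decode(ByteBuffer.wrap(raw)).toString());
        } catch (CharacterCodingException e) {
            return Optional.empty();
        }
    }

    // Hypothetical "ignore and move on" policy: keep names that decode, skip the rest.
    static List<String> decodeAllSkippingBad(List<byte[]> rawNames) {
        List<String> out = new ArrayList<>();
        for (byte[] raw : rawNames) {
            decodeUtf8(raw).ifPresentOrElse(
                    out::add,
                    () -> System.err.println("skipping non-UTF-8 file name (" + raw.length + " bytes)"));
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8Name = "Fran\u00e7ais".getBytes(StandardCharsets.UTF_8);
        // Latin-1 'ç' (0xE7) followed by 'a' is not a valid UTF-8 sequence.
        byte[] latin1Name = "Fran\u00e7ais".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decodeAllSkippingBad(List.of(utf8Name, latin1Name)).size());
    }
}
```

The trade-off is that skipped files simply don't sync; this sidesteps the encoding mismatch rather than fixing it.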

But for some reason your paths "made it past" the original POSIX -> watchman -> JNI -> mirror hop, and only failed when mirror then tried to send that thought-it-was-UTF-8 string back through Java's own "UTF back to native" layer (the Paths.get -> UnixPath.encode frames in the stack trace above).
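Since the failing hop is the `Paths.get` call near the top of the stack trace, one option is to guard that call. The sketch below is hypothetical, not mirror's code (`SafePaths` and `toPath` are illustrative names): it catches `InvalidPathException` and skips the path instead of letting the exception abort the session.

```java
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class SafePaths {
    // Hypothetical guard around Paths.get: converting a String back to native
    // bytes can throw InvalidPathException (e.g. "Malformed input or input
    // contains unmappable characters"). Log and skip instead of propagating.
    static Optional<Path> toPath(String first, String... more) {
        try {
            return Optional.of(Paths.get(first, more));
        } catch (InvalidPathException e) {
            System.err.println("skipping unrepresentable path: " + e.getMessage());
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(toPath("data", "readme.txt").isPresent()); // prints "true"
        // A NUL character is rejected on every platform, so it reliably
        // exercises the same exception type.
        System.out.println(toPath("bad\0name").isPresent());          // prints "false"
    }
}
```

The NUL case is used only because it throws `InvalidPathException` everywhere; the accented names from the report would throw only when the JVM's file-name encoding can't represent them. As with the watchman-side hack, skipped paths would silently not sync, so logging them matters.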

I really don't have any good ideas, and realistically I won't dig deeper into this at the moment.

If you can write a test case that somehow creates a file path (committed in git or added in the PR) that triggers the failure above, that'd be great.
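As a possible starting point for that test case, here is a standalone sketch in plain Java (deliberately not tied to the project's test framework) that tries to create a directory and file named like the ones in the report. Whether it reproduces the failure depends on the JVM's locale and encoding setup, so it reports the outcome rather than asserting one. `NonAsciiPathRepro` and `tryCreate` are hypothetical names, and this covers only the `Paths.get` half of the problem, not watchman.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;

public class NonAsciiPathRepro {
    // Names modeled on the ones in the report ("\u00e7" = 'ç', "\u00e9" = 'é').
    static final String DIR = "Fran\u00e7ais";
    static final String FILE = "timeQplus Guide de D\u00e9marrage Rapide.pdf";

    // Tries to create DIR/FILE under a fresh temp directory. Returns "created"
    // if the JVM could encode the names, or "InvalidPathException" if it hits
    // the same failure as the stack trace. Cleans up afterwards.
    static String tryCreate() {
        Path root = null, dir = null, file = null;
        try {
            root = Files.createTempDirectory("mirror-i18n");
            dir = Files.createDirectories(root.resolve(DIR));
            file = Files.createFile(dir.resolve(FILE));
            return "created";
        } catch (InvalidPathException e) {
            return "InvalidPathException";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Innermost first, so each directory is empty when deleted.
            deleteQuietly(file);
            deleteQuietly(dir);
            deleteQuietly(root);
        }
    }

    static void deleteQuietly(Path p) {
        if (p == null) return;
        try {
            Files.deleteIfExists(p);
        } catch (IOException ignored) {
            // best-effort cleanup
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate());
    }
}
```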