stephenh / mirror

A tool for real-time, two-way sync for remote (e.g. desktop/laptop) development
Apache License 2.0

Issues with international characters #42

Open jacobalberty opened 4 years ago

jacobalberty commented 4 years ago

Just fired up the Docker image and I'm getting this error over and over. I removed those files and things seem to be working fine now.

mirror_1  | java.nio.file.InvalidPathException: Malformed input or input contains unmappable characters: <clipped for privacy>/Fran?ais/timeQplus Guide de D?marrage Rapide (2016_09_12 16_41_37 UTC).pdf
mirror_1  |     at sun.nio.fs.UnixPath.encode(UnixPath.java:147)
mirror_1  |     at sun.nio.fs.UnixPath.<init>(UnixPath.java:71)
mirror_1  |     at sun.nio.fs.UnixFileSystem.getPath(UnixFileSystem.java:281)
mirror_1  |     at java.nio.file.Paths.get(Paths.java:84)
mirror_1  |     at mirror.UpdateTree.find(UpdateTree.java:167)
mirror_1  |     at mirror.UpdateTree.addUpdate(UpdateTree.java:108)
mirror_1  |     at mirror.UpdateTree.addLocal(UpdateTree.java:97)
mirror_1  |     at mirror.MirrorSession.lambda$calcInitialState$1(MirrorSession.java:84)
mirror_1  |     at java.util.ArrayList.forEach(ArrayList.java:1257)
mirror_1  |     at mirror.MirrorSession.calcInitialState(MirrorSession.java:84)
mirror_1  |     at mirror.MirrorClient.startSession(MirrorClient.java:88)
mirror_1  |     at mirror.MirrorClient.access$300(MirrorClient.java:27)
mirror_1  |     at mirror.MirrorClient$SessionStarter.runOneLoop(MirrorClient.java:198)
mirror_1  |     at mirror.tasks.ThreadBasedTask.run(ThreadBasedTask.java:62)
mirror_1  |     at mirror.tasks.ThreadBasedTask.lambda$new$0(ThreadBasedTask.java:39)
mirror_1  |     at java.lang.Thread.run(Thread.java:748)
mirror_1  | 2019-09-05 03:35:21 INFO  Stopping session
mirror_1  | 2019-09-05 03:35:21 INFO  Connected, starting session, version unspecified
mirror_1  | 2019-09-05 03:35:21 INFO  Watchman root is /data/

stephenh commented 4 years ago

Hm, yeah, I'm not entirely surprised...I've had some issues reported in the past with this.

The core issue is that mirror uses watchman for file watching, which is a C library that uses POSIX APIs that are not UTF-8; they use whatever encoding the file system happens to use.

But for everything else that is not file watching, mirror uses the regular Java APIs, which assume UTF-8.

So for vanilla strings that look the same in POSIX (via watchman via JNI) & UTF-8, everything is fine by happenstance.
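
To illustrate why plain names work "by happenstance", here is a minimal sketch that is not taken from mirror or watchman. It assumes, purely for illustration, that the file system stores names as ISO-8859-1; the class and method names are hypothetical. ASCII-only names produce identical bytes in both encodings, while a name such as "Français" does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class EncodingAgreement {
  /** Returns true when the name has identical bytes in UTF-8 and ISO-8859-1. */
  static boolean sameBytesInBothEncodings(String name) {
    byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
    // ISO-8859-1 stands in for "whatever encoding the file system uses" (an assumption).
    byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
    return Arrays.equals(utf8, latin1);
  }

  public static void main(String[] args) {
    // ASCII-only name: both encodings agree byte for byte.
    System.out.println(sameBytesInBothEncodings("guide.pdf"));
    // "Français": the c-cedilla is one byte in ISO-8859-1 but two bytes in UTF-8.
    System.out.println(sameBytesInBothEncodings("Fran\u00e7ais"));
  }
}
```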

Just for reference, I've tried to hack around some of this by just ignoring "can't be UTF-8" string failures from watchman's Java library:

https://github.com/stephenh/watchman/commit/55847b54e7e28142fae5460cb9323278b9ccea67

https://github.com/stephenh/mirror/blob/master/src/main/java/mirror/watchman/WatchmanFileWatcher.java#L151
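
The general idea of skipping names that cannot be UTF-8 can be sketched roughly as below. This is not the code from the commit or file linked above; `StrictUtf8` and `decode` are hypothetical names, and the sketch only shows one way to detect invalid UTF-8 with the standard `CharsetDecoder` so the caller can ignore such entries.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, for illustration only.
final class StrictUtf8 {
  /** Decodes raw bytes as UTF-8, or returns empty when the bytes are not valid UTF-8. */
  static Optional<String> decode(byte[] raw) {
    try {
      String decoded = StandardCharsets.UTF_8.newDecoder()
          // REPORT makes the decoder throw instead of silently substituting U+FFFD.
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(raw))
          .toString();
      return Optional.of(decoded);
    } catch (CharacterCodingException e) {
      // Not valid UTF-8: the caller can skip this path rather than fail the session.
      return Optional.empty();
    }
  }
}
```

A caller could then drop any path for which `decode` returns an empty `Optional`, at the cost of silently not syncing those files.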

But for some reason your paths "made it past" the original POSIX -> watchman -> JNI -> mirror hop, and only failed when mirror then tried to send that thought-it-was-UTF-8 string back into Java's own "utf back to native" layer.
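
The stack trace shows the failure in `UnixPath.encode`, i.e. on the way from a Java string back to native bytes. A rough way to check for that ahead of time is sketched below. It is only an approximation of what the JDK does internally: `NativeEncodeCheck` and `canEncode` are hypothetical names, and using US-ASCII as the native charset is an assumption about the container's locale, not something confirmed in this thread.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class NativeEncodeCheck {
  /** Returns true when every character of the path can be encoded in the given charset. */
  static boolean canEncode(String path, Charset nativeCharset) {
    CharsetEncoder encoder = nativeCharset.newEncoder()
        // REPORT makes the encoder throw on characters the charset cannot represent.
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
      encoder.encode(CharBuffer.wrap(path));
      return true;
    } catch (CharacterCodingException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    // Assumed native charset of US-ASCII: the c-cedilla cannot be represented.
    System.out.println(canEncode("Fran\u00e7ais", StandardCharsets.US_ASCII));
    // With UTF-8 as the native charset the same name encodes fine.
    System.out.println(canEncode("Fran\u00e7ais", StandardCharsets.UTF_8));
  }
}
```

If a check like this returned false, mirror could log and skip the path before calling `Paths.get`, instead of letting the exception stop the session.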

I really don't have any good ideas and realistically won't dive deeper on this at the moment.

If you can get a test case written that somehow creates a file path in git / the PR that exhibits the failure above, that'd be great.