stephenh / mirror

A tool for real-time, two-way sync for remote (e.g. desktop/laptop) development
Apache License 2.0

Don't drop non-utf8 file paths #20

Open fezzzza opened 5 years ago

fezzzza commented 5 years ago

I can get mirror working fine as a client without watchman, but with watchman installed I get the error below.

I am running Linux Mint 19 (~Ubuntu 18 bionic). I get the same result whether running as user or root, and with whichever version of openjdk-8/9/10/11-jre.

A quick Google suggests it is related to character encodings. It may help to mention that I am in the UK and most of my system defaults to UTF-8, but it may be related to some form of internationalisation.

With reference to your notes about WatchService, I notice that JDK-8145981 is now fixed. Is WatchService still considered buggy in the latest release, and is watchman still recommended/required for stability?

$ mirror client -h localhost -l /var/www/html -r /var/www/html
2018-10-28 16:15:39 INFO Connected, starting session, version unspecified
2018-10-28 16:15:41 INFO Watchman root is /var/www/html
2018-10-28 16:15:41 ERROR Exception starting the client
java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:816)
    at com.facebook.buck.bser.BserDeserializer.deserializeString(BserDeserializer.java:236)
    at com.facebook.buck.bser.BserDeserializer.deserializeRecursiveWithType(BserDeserializer.java:332)
    at com.facebook.buck.bser.BserDeserializer.deserializeTemplate(BserDeserializer.java:302)
    at com.facebook.buck.bser.BserDeserializer.deserializeRecursiveWithType(BserDeserializer.java:338)
    at com.facebook.buck.bser.BserDeserializer.deserializeRecursive(BserDeserializer.java:313)
    at com.facebook.buck.bser.BserDeserializer.deserializeObject(BserDeserializer.java:276)
    at com.facebook.buck.bser.BserDeserializer.deserializeRecursiveWithType(BserDeserializer.java:336)
    at com.facebook.buck.bser.BserDeserializer.deserializeRecursive(BserDeserializer.java:313)
    at com.facebook.buck.bser.BserDeserializer.deserializeBserValue(BserDeserializer.java:113)
    at mirror.watchman.WatchmanChannelImpl.read(WatchmanChannelImpl.java:93)
    at mirror.watchman.WatchmanChannelImpl.query(WatchmanChannelImpl.java:87)
    at mirror.watchman.WatchmanFileWatcher.startWatchAndInitialFind(WatchmanFileWatcher.java:197)
    at mirror.watchman.WatchmanFileWatcher.performInitialScan(WatchmanFileWatcher.java:140)
    at mirror.MirrorSession.calcInitialState(MirrorSession.java:78)
    at mirror.MirrorClient.startSession(MirrorClient.java:88)
    at mirror.MirrorClient.access$300(MirrorClient.java:27)
    at mirror.MirrorClient$SessionStarter.runOneLoop(MirrorClient.java:198)
    at mirror.tasks.ThreadBasedTask.run(ThreadBasedTask.java:62)
    at mirror.tasks.ThreadBasedTask.lambda$new$0(ThreadBasedTask.java:39)
    at java.lang.Thread.run(Thread.java:748)
2018-10-28 16:15:41 INFO Stopping session

stephenh commented 5 years ago

Oh, yes, this is from getting non-UTF8 paths. I ran into this myself but hadn't released the "fix". If you bump to 1.2.1, which I just pushed, it should not blow up.

The disclaimer is that I wasn't sure how to fix it, so for now when watchman says "um, this file path can't be decoded as utf-8", mirror just skips it and does not sync that path.
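For illustration only, the "skip what can't be decoded" behaviour could look roughly like the sketch below. This is not mirror's actual code: the class name `StrictUtf8Paths` and the method `decodeOrSkip` are hypothetical, and the sketch assumes the raw path bytes are already in hand. It uses the same strict `CharsetDecoder` that appears in the stack trace above, but turns the failure into an empty result instead of letting the exception propagate.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of mirror or watchman.
public class StrictUtf8Paths {

  /** Returns the decoded path, or empty if the bytes are not valid UTF-8. */
  static Optional<String> decodeOrSkip(byte[] rawPath) {
    try {
      String decoded = StandardCharsets.UTF_8
          .newDecoder()
          // REPORT makes the decoder throw instead of substituting U+FFFD.
          .onMalformedInput(CodingErrorAction.REPORT)
          .onUnmappableCharacter(CodingErrorAction.REPORT)
          .decode(ByteBuffer.wrap(rawPath))
          .toString();
      return Optional.of(decoded);
    } catch (CharacterCodingException e) {
      // MalformedInputException is a subclass; the caller would skip this path.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // "\u00f4" is o-circumflex, written as an escape so the source stays ASCII.
    byte[] utf8 = "flags/c\u00f4te.png".getBytes(StandardCharsets.UTF_8);
    byte[] latin1 = "flags/c\u00f4te.png".getBytes(StandardCharsets.ISO_8859_1);
    System.out.println(decodeOrSkip(utf8).isPresent());   // valid UTF-8
    System.out.println(decodeOrSkip(latin1).isPresent()); // Latin-1 bytes, rejected
  }
}
```

The trade-off is the one described above: a path that fails to decode is silently left out of the sync rather than crashing the session.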

I guess in theory it could transfer the file path as binary (just a byte[]) across the wire ... however all of the Java file system APIs take strings, so once the remote side got it, there is not a (standard) Java API that would accept it. I'd have to do something janky like save it to a temp file (via the Java APIs) and then use a JNI call/something to rename it.

In my case, these were corrupted file paths, so I used env LC_ALL=C find . -name '*[! -~]*' to find them and delete them. But I suppose for you they are real files...

I'll leave this issue open as "somehow support non-utf8 file names in a way that is not dropping them".

fezzzza commented 5 years ago

Ah yes, just to confirm: there are a bunch of image files of international flags that have accented characters in the filenames. That's the way they came from the source. I certainly wouldn't have chosen to use complex characters in the filenames, and I've seen it documented that it's not a good idea, but I wouldn't know how to check whether they are UTF-8, an ISO encoding like ISO-8859-1, or something else.
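One way to approach the "is it UTF-8 or something else" question is to look at the raw bytes of each filename. The sketch below is a hedged illustration, not a mirror feature: `FileNameEncodingCheck`, `Kind`, and `classify` are made-up names, and getting the raw name bytes in the first place is outside its scope. It can only say whether the bytes are pure ASCII, valid UTF-8, or not valid UTF-8. It cannot positively identify ISO-8859-1, because every byte sequence is technically valid in that encoding.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for classifying raw filename bytes.
public class FileNameEncodingCheck {

  enum Kind { ASCII, UTF8, NOT_UTF8 }

  static Kind classify(byte[] name) {
    boolean allAscii = true;
    for (byte b : name) {
      // Java bytes are signed, so values 0x80..0xFF show up as negative.
      if (b < 0) {
        allAscii = false;
        break;
      }
    }
    if (allAscii) {
      return Kind.ASCII;
    }
    try {
      // A fresh decoder reports malformed input by throwing.
      StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(name));
      return Kind.UTF8;
    } catch (CharacterCodingException e) {
      // Possibly ISO-8859-1 or another legacy encoding; the bytes alone cannot say which.
      return Kind.NOT_UTF8;
    }
  }

  public static void main(String[] args) {
    // "\u00e9" is e-acute, escaped so the source file stays ASCII.
    String name = "cura\u00e7ao-\u00e9.png";
    System.out.println(classify("uk.png".getBytes(StandardCharsets.US_ASCII)));
    System.out.println(classify(name.getBytes(StandardCharsets.UTF_8)));
    System.out.println(classify(name.getBytes(StandardCharsets.ISO_8859_1)));
  }
}
```

Names reported as `NOT_UTF8` are the ones watchman's response would fail to decode. Accented names that come back as `UTF8` should not be the cause of the error.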