stephenh / mirror

A tool for real-time, two-way sync for remote (e.g. desktop/laptop) development
Apache License 2.0
391 stars 35 forks source link

grpc DEADLINE_EXCEDDED error when syncing #2

Closed qiaojianjack closed 7 years ago

qiaojianjack commented 7 years ago

Hi,

I followed the guideline in the doc this morning to set up mirror, but seems that the client just wait for several minutes before quitting with the following error:

$ ./mirror client -h <my server address> -l ./sync_workspace/ -r ./sync_workspace/
2017-05-01 11:02:44 INFO   Increasing file limit to 9223372036854775807
2017-05-01 11:02:45 INFO   Connected, starting session, version unspecified
2017-05-01 11:02:45 INFO   Client has 1 paths
2017-05-01 11:05:45 SEVERE Error from incoming server stream
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED
    at io.grpc.Status.asRuntimeException(Status.java:540)
    at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:392)
    at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:426)
    at io.grpc.internal.ClientCallImpl.access$100(ClientCallImpl.java:76)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:512)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access$700(ClientCallImpl.java:429)
    at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl$1StreamClosed.runInContext(ClientCallImpl.java:544)
    at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:52)
    at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:117)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

2017-05-01 11:05:45 INFO   Stopping session
2017-05-01 11:05:45 SEVERE Error returned from runOneLoop
java.lang.RuntimeException: java.io.IOException: Bad file descriptor
    at mirror.watchman.WatchmanFileWatcher.runOneLoop(WatchmanFileWatcher.java:102)
    at mirror.tasks.ThreadBasedTask.run(ThreadBasedTask.java:59)
    at mirror.tasks.ThreadBasedTask.lambda$new$0(ThreadBasedTask.java:37)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Bad file descriptor
    at jnr.enxio.channels.NativeSocketChannel.read(NativeSocketChannel.java:80)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:59)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:109)
    at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
    at com.google.common.io.ByteStreams.read(ByteStreams.java:822)
    at com.facebook.buck.bser.BserDeserializer.readBserBuffer(BserDeserializer.java:123)
    at com.facebook.buck.bser.BserDeserializer.deserializeBserValue(BserDeserializer.java:113)
    at mirror.watchman.WatchmanChannelImpl.read(WatchmanChannelImpl.java:92)
    at mirror.watchman.WatchmanFileWatcher.runOneLoop(WatchmanFileWatcher.java:98)
    ... 3 more

and on my remote dev machine, it had the following log before quitting:

$ ./mirror server
2017-05-01 11:02:26 INFO   Listening on 49172, version 1.0.8
2017-05-01 11:02:46 INFO   Starting new session 1 for + /home/<myusername>/./sync_workspace
^C2017-05-01 11:07:56 INFO   Watchman not found, using WatchService instead
2017-05-01 11:07:56 INFO     Note that WatchService is buggy on Linux, and uses polling on Mac.

Any idea on what might be wrong? I think it might be related to watchman, but I have the latest version of watchman (v4.7.0) installed both on my laptop and remote dev machine.

P.S. I masked my server info and username, but we are from the same company. Really like this tool, very helpful!

stephenh commented 7 years ago

Hey, thanks for reaching out!

I was initially thinking maybe your server's sync_workspace had a ton of files in it, so was taking longer than ~3 minutes to respond, e.g. there are 3 minutes between these two lines:

2017-05-01 11:02:45 INFO   Client has 1 paths
2017-05-01 11:05:45 SEVERE Error from incoming server stream
io.grpc.StatusRuntimeException: DEADLINE_EXCEEDED

But there being 5 minutes between these two lines is really odd:

2017-05-01 11:02:46 INFO   Starting new session 1 for + /home/<myusername>/./sync_workspace
^C2017-05-01 11:07:56 INFO   Watchman not found, using WatchService instead

Is that control-c from you hitting control-c to kill the server? Which is fine, but it's very curious it just hung between the "Starting new session" and "watchman not found" ... let me poke around.

stephenh commented 7 years ago

Can you run watchman get-sockname on your remote dev machine and see what it outputs?

qiaojianjack commented 7 years ago

Ah yeah the last 2 lines appeared after I hit control-C to try to kill the server.

When I first encountered this error there's only 1 directory in sync_workspace and mirror says "Client has 545 paths", but after I removed that directory from my laptop (directory and file are still there on the remote machine, essentially trying to re-sync), I still see the error above.

And when I run watchman get-sockname on my remote dev machine, it just hangs there with no output...weird.

stephenh commented 7 years ago

And when I run watchman get-sockname on my remote dev machine, it just hangs there with no output...weird.

Okay, this is what's causing the error then...mine returns right away with:

$ watchman get-sockname
{
    "version": "4.7.0",
    "sockname": "/usr/local/var/run/watchman/shaberma-state/sock"
}

I'll look around at why this might be happening; in theory watchman should at least error out right away, e.g. with a -1, so that mirror can continue and fallback on the Java 8 WatchService.

stephenh commented 7 years ago

Hm, sounds like there was a known issue when installing/upgrading watchman via brew, but given this happens on your remote machine, I assume that's Linux:

https://github.com/facebook/watchman/issues/371

Maybe do a ps aux | grep watchman, see if there is a watchman daemon running, if so do a killall watchman to kill it, and then see if watchman get-sockname works.

qiaojianjack commented 7 years ago

There are indeed several processes of watchman get-sockname hanging there. I killed all of them and then watchman get-sockname worked, as well as mirror.

Thanks a lot for helping! Btw, I'm definitely interested in making this tool better, but I feel I'm not very familiar with network applications, etc. Where could I start to take a look?

stephenh commented 7 years ago

Great, I'm glad that was it, because otherwise I would have been out of ideas.

Awesome to hear you'd like to help out, that would be really great.

In terms of where to start, the guts of the "am I out of sync?" logic is UpdateTree and UpdateTreeDiff and tests. The network stuff is mostly in MirrorClient/MirrorServer.

Here is a presentation I did for a local user's group a few months back, it would also be a good intro:

https://docs.google.com/presentation/d/1PQ1LH13rxTQhiyy1DVV4AfJ7-i-WIZB0je13MK2NAW8/edit#slide=id.p

stephenh commented 7 years ago

FWIW, if you wanted to try and write some code that detected what you ran in to, e.g. somewhere in WatchmanChannelImpl.createIfAvailable try and run like ps aux | grep watchman (not sure the best way to do that) and if there are N > 1 watchmans around, do either a log.warn/log.error/fail to let the user know, that would be pretty handy.

(I have no idea how many people have run into the watchman bug that you did; you might be the only one, or you might have been the only one to actually file an issue about it. :-))

stephenh commented 7 years ago

Huh, looks like this might be easy to drop in:

https://github.com/profesorfalken/jProcesses