Open sirmonin opened 3 years ago
Thank you for reporting this @sirmonin!
Could be related to https://github.com/yfinkelstein/node-zookeeper/issues/238
Hi @sirmonin!
To find out what could be wrong:
@DavidVujic, thanks for a quick reply!
node-zookeeper: Tested both on 4.6.0 and 4.7.1
Zookeeper: I am using Docker image bitnami/zookeeper:latest
which is 3.6.2
Great, thanks!
@DavidVujic It is also observed, that timeout with crashing happens while application is running under some load.
Thanks!
I'm planning to look at this later this week.
Notes:
I think there is need for some extra uv_closing
(or something similar) checks here: https://github.com/yfinkelstein/node-zookeeper/blob/master/src/node-zk.cpp#L265
As hinted in this StackOverflow post: https://stackoverflow.com/questions/47522854/main-src-unix-core-c117-uv-close-assertion-uv-is-closinghandle-failed
If you, or if you know someone with C and/or C++ skills, please feel free to jump in and help 😄 It is much appreciated.
Just a short update: I'm working on this issue and am trying different approaches on how to solve it. I also believe I can reproduce the assertion error using the code in the examples
folder and restart a docker container with a zookeeper server running.
Hello again @sirmonin!
I've been working on solving this issue, but cannot really find where and how to prevent the assertion error to happen.
What I've found out is that it most likely is when watchers are attached. Earlier today I pushed new versions of the code in the examples
folder, making sure only one client is used, and function calls is made only when the client is connected.
And that has made a difference: on Mac OS X I get no assertion error. But on Linux I get the error - only occasionally. Haven't found a pattern.
The way I've been doing is:
starting a zookeeper server hosted in a docker container on my local machine.
starting the examples with node examples/index.js
running docker restart <container-id>
Before the changes in the example code, I got assertion error every time. Now, only randomly it seems and only on Linux.
I have never got the error you are describing (with short timeouts).
Could you help me by sharing another code example, or if you can try the example code described above?
Here's a diff of the changes I have made: https://github.com/yfinkelstein/node-zookeeper/pull/264/files
Hi @DavidVujic! Thanks a lot for time and effort you've spent on this issue.
My zookeeper connect method looks like this. Timeout is set to 5ms to simulate timeout error.
public connect = () => {
const host = this.config.get('ZOOKEEPER_HOST');
const port = this.config.get('ZOOKEEPER_PORT');
this.zkConfig = {
connect: `${host}:${port}`,
timeout: 5,
debug_level: ZooKeeper.ZOO_LOG_LEVEL_DEBUG,
host_order_deterministic: false,
};
this.client = new ZooKeeper(this.zkConfig);
this.client.init(this.zkConfig);
};
This produces error logs:
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1098: Client environment:zookeeper.version=zookeeper C client 3.5.5
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1102: Client environment:host.name=test
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1109: Client environment:os.name=Linux
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1110: Client environment:os.arch=4.19.121-linuxkit
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1111: Client environment:os.version=#1 SMP Tue Dec 1 17:50:32 UTC 2020
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1119: Client environment:user.name=(null)
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1127: Client environment:user.home=/home/node
2021-02-01 17:34:32,821:139:ZOO_INFO@log_env@1139: Client environment:user.dir=/srv
2021-02-01 17:34:32,821:139:ZOO_INFO@zookeeper_init_internal@1173: Initiating client connection, host=zookeeper:2181 sessionTimeout=5 watcher=0x7f365d600480 sessionId=0 sessionPasswd=<null> context=0x55d714f278a0 flags=0
Interest in (fd=19, read=true, write=true, timeout=1)2021-02-01 17:34:32,823:139:ZOO_DEBUG@get_next_server_in_reconfig@1326: [OLD] count=0 capacity=0 next=0 hasnext=0
2021-02-01 17:34:32,823:139:ZOO_DEBUG@get_next_server_in_reconfig@1329: [NEW] count=1 capacity=16 next=0 hasnext=1
2021-02-01 17:34:32,823:139:ZOO_DEBUG@get_next_server_in_reconfig@1340: Using next from NEW=172.18.0.2:2181
2021-02-01 17:34:32,823:139:ZOO_DEBUG@zookeeper_connect@2228: [zk] connect()
zk_timer_cb firedping timer went off2021-02-01 17:34:32,832:139:ZOO_WARN@zookeeper_interest@2280: Exceeded deadline by 9ms
yield:zookeeper_interest returned error: -7 - operation timeout
2021-02-01 17:34:32,833:139:ZOO_ERROR@handle_socket_error_msg@2382: Socket [172.18.0.2:2181] zk retcode=-7, errno=110(Operation timed out): connection to 172.18.0.2:2181 timed out (exceeded timeout by 7ms)
2021-02-01 17:34:32,833:139:ZOO_DEBUG@cleanup@1781: Previous connection=[172.18.0.2:2181] delay=0
zk_timer_cb firedping timer went off2021-02-01 17:34:32,834:139:ZOO_WARN@zookeeper_interest@2280: Exceeded deadline by 11ms
2021-02-01 17:34:32,835:139:ZOO_DEBUG@get_next_server_in_reconfig@1326: [OLD] count=0 capacity=0 next=0 hasnext=0
2021-02-01 17:34:32,835:139:ZOO_DEBUG@get_next_server_in_reconfig@1329: [NEW] count=1 capacity=16 next=1 hasnext=0
2021-02-01 17:34:32,836:139:ZOO_DEBUG@get_next_server_in_reconfig@1351: Failed to find either new or old
2021-02-01 17:34:32,836:139:ZOO_DEBUG@zookeeper_connect@2228: [zk] connect()
Interest in (fd=20, read=true, write=true, timeout=1)Assertion failed: loop->watchers[w->fd] == w (../deps/uv/src/unix/core.c: uv__io_stop: 932)
However, adding a setTimeout seemed to help a bit. App is no longer crashing and tries to reconnect.
public connect = () => {
const host = this.config.get('ZOOKEEPER_HOST');
const port = this.config.get('ZOOKEEPER_PORT');
this.zkConfig = {
connect: `${host}:${port}`,
timeout: 5,
debug_level: ZooKeeper.ZOO_LOG_LEVEL_DEBUG,
host_order_deterministic: false,
};
this.client = new ZooKeeper(this.zkConfig);
setTimeout(() => {
this.client.init(this.zkConfig);
}, 1000);
};
Thanks @sirmonin!
I'm closing this one. Please reopen if it still is an issue!
@DavidVujic Issue still persists when server is under high load.
2022-04-29 17:48:16,829:448:ZOO_ERROR@handle_socket_error_msg@2489: Socket 172.19.0.4:2181 zk retcode=-7, errno=110(Operation timed out): connection to 172.19.0.4:2181 timed out (exceeded timeout by 8849ms)
yield:zookeeper_interest returned error: -7 - operation timeout
2022-04-29 17:48:18,481:448:ZOO_WARN@zookeeper_interest@2375: Exceeded deadline by 12182ms
Assertion failed: loop->watchers[w->fd] == w (../deps/uv/src/unix/core.c: uv__io_stop: 930)
Node.js v14.16.1 node-zookeeper v5.4.0
Thank you for notifying. I'll reopen this one!
@DavidVujic I tested under Node v16.14.2. It looks like yield:zookeeper_interest returned error: -7 - operation timeout no longer causes libuv to fail.
Will keep you posted in case issue repeats itself again. For now the issue is solved by updating to Node v16.14.2 or later.
However, timeout issue still persists (eventhough it doesn't crash node anymore)
2022-04-29 20:18:04,369:218:ZOO_ERROR@handle_socket_error_msg@2489: Socket 172.19.0.4:2181 zk retcode=-7, errno=110(Operation timed out): connection to 172.19.0.4:2181 timed out (exceeded timeout by 10530ms)
yield:zookeeper_interest returned error: -7 - operation timeout
Update: Node.js v14.19.1 doesn't crash as well.
Describe the bug If timeout is reached (rc: -7) during the first init() call, I receive the error below and app crashes with code: 139.
I am sure that app doesn't use Zookeeper instance until there is a 'connect' event (which is never received).
To Reproduce 1) Initialize Zookeeper via
.init()
or.connect()
with a timeout set to a tiny value. E.g. 5ms. (This will produce the same effect as on our server, where timeout is set to 10000ms.) 2) See error.Error Log
Expected behavior Timeout doesn't crash node app and closes client connection, so I could recreate client instance and retry to connect.
Desktop (please complete the following information):