Open pfeatherstone opened 4 years ago
So i now manually keep track of websocket connections, and before i call stop(), i iterate through all the connections, and call close(). Now it turns out the handler that is dispatched at line 6898 in crow_all.h never gets called. If you do a bit more tracking it goes into boost.asio, at which point I loose track of the stack trace. Can anybody help? I know this repo is dead, but just checking in case some users have encountered similar problems.
I'll try to look into it if i have time. if you can provide some minimal example where the issue appears it would be great.
Don't worry. I use a combination of https://github.com/yhirose/cpp-httplib and https://github.com/zaphoyd/websocketpp.git. I also sometimes use https://github.com/civetweb/civetweb though i get hanging issues with the last one.
So I looked through the code (I know you said not to worry), It seems that stop()
and close()
do completely different things.
close()
sends a close signal and once a confirmation is received, it closes the socket adaptor.
stop()
on the other hand shuts down the IO services that all sockets use (HTTP request sockets or WebSockets). And while that works does work, the client just never receives any notice.
This is indeed a problem if you stop()
your server and restart it afterwards without much waiting. It will then fail to start with an exception:
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >'
what(): bind: Address already in use
Calling lsof
gives you the reason:
$ lsof -i :18080
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
my-server 23405 jf99 44u IPv4 149122 0t0 TCP localhost:18080->localhost:40806 (CLOSE_WAIT)
This is with reuse_addr = true
in the acceptor's constructor (line 9269 in crow_all.h).
From what I've seen in the code (and I might be totally wrong), the primary issue is that Crow doesn't keep track of connections, (with the exception of HTTP connections in debug mode).
And the only solution I can think of is what @pfeatherstone was doing but in crow itself, which is keep track of all active connections and send a close signal right before stopping the server as part of stop()
.
This is indeed a problem if you
stop()
your server and restart it afterwards without much waiting. It will then fail to start with an exception:terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::system::system_error> >' what(): bind: Address already in use
Calling
lsof
gives you the reason:$ lsof -i :18080 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME my-server 23405 jf99 44u IPv4 149122 0t0 TCP localhost:18080->localhost:40806 (CLOSE_WAIT)
This is with
reuse_addr = true
in the acceptor's constructor (line 9269 in crow_all.h).
As far as I remember there're OS settings which affect the CLOSE_WAIT timeouts e.g. net.ipv4.tcp_fin_timeout and tcp_keepalive_time etc. so it might not always be the case if you tweak your OS for high concurrency and load - to keep these sockfds closed in a timeley manner. TYVM
Like everything else in boost, it seems boost.asio is more complicated than what it needs to be.
Calling stop() does not properly close all active connections, namely websocket connections.