zaphoyd / websocketpp

C++ websocket client/server library
http://www.zaphoyd.com/websocketpp

`stop_listening()` causes errors and leaves the port used #920

Open aclex opened 4 years ago

aclex commented 4 years ago

First, let me take the chance to say a big thanks to the authors for such a great library. I really enjoyed exploring its design, reading the code, and using it in my application! Second, according to my experiments, the problem described here is apparently not caused by websocketpp's own code but lies somewhere deep in Boost.Asio. Third, to keep it as simple as possible, I've managed to reproduce it with any of the example programs that call stop_listening(). I'm using the external_storage example here.

The problem is that after the call to stop_listening() on the server endpoint, the following messages are printed:

[2020-08-12 10:34:49] [connect] WebSocket Connection [::ffff:127.0.0.1]:49240 v13 "Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0" / 101
on_message called with hdl: 0x5580981477c0 and message: stop-listening
[2020-08-12 10:34:49] [info] Error getting remote endpoint: system:9 (Bad file descriptor)
[2020-08-12 10:34:49] [fail] WebSocket Connection Unknown - "" - 0 websocketpp:26 Operation canceled
[2020-08-12 10:34:49] [info] asio async_shutdown error: system:9 (Bad file descriptor)
[2020-08-12 10:34:49] [info] handle_accept error: Operation canceled
[2020-08-12 10:34:49] [info] Stopping acceptance of new connections because the underlying transport is no longer listening.

and the listening port stays in use both after this call and for 1-2 minutes after the application exits (presumably until the kernel finishes cleaning up).

Here's the output of the very next start of the same utility within that period of time:

[2020-08-12 10:35:17] [info] asio listen error: system:98 (Address already in use)
terminate called after throwing an instance of 'websocketpp::exception'
  what():  Underlying Transport Error
Aborted

Here stop_listening() is triggered by a special message from the client, but the behaviour is the same even when no client connections were opened during the session (e.g. when triggered by a UNIX signal). I'm also aware of the clean server shutdown procedure described in the FAQ, which the example utilities basically follow, but the problem appears right in the call to stop_listening(), in the m_acceptor.close() call, as far as I can tell.

At first I hoped the problem was with the temporary endpoint object created inside the listen() overloads and passed to the acceptor to bind and listen (there are recommendations that it should outlive the acceptor's listening cycle), but passing my own endpoint object with an unlimited lifetime didn't help. Judging by traces of this error found here and there, it seems to close some already closed socket, but given that the port is left in use, it also doesn't close the socket it used for listening. My assumption is that the problem is somewhere in the acceptor code in Boost.Asio, but unfortunately I'm no expert in this, so I don't really have a clue where to start debugging it.

Again, I'm not complaining about any error in websocketpp itself, which is obviously following the documented procedure. I just want to share the problem in case someone else comes across it.

Reproducible for me on the master, 0.8.2 and develop versions of websocketpp with Boost 1.72.0.

zaphoyd commented 4 years ago

Hi @aclex, glad you are enjoying the library and sorry to hear that you are having trouble with parts of it. A few notes:

The info channel warnings you are seeing when listening stops are expected. WebSocket++ creates a new connection and then asks the acceptor to accept connections into it. If the acceptor is cancelled (as happens in stop_listening), this "next connection" object gets cleaned up and technically "fails to connect because of cancellation". The bad file descriptor error appears because the connection failure logger tries to get the address of the connection that failed, and fails because it never actually connected. The library knows this isn't really a problem and flags these errors as "info" instead of warning or error, but a less confusing solution would probably be to drop some of those messages down to debug or suppress them entirely. It isn't that useful to know that a file descriptor that you know is bad is bad.

Regarding the listening socket not actually being cleaned up properly, that does not sound like the intended behavior and is something that I'll take a closer look at. It would help me to know which operating system you were able to reproduce this on.

If you need to work around the issue in the meantime, and are not already familiar with it, the SO_REUSEADDR socket option, which explicitly instructs the OS to allow quickly re-using a listening address, might be worth a look. As far as I understand, it should not be necessary to use SO_REUSEADDR to restart quickly after a stop_listening operation executes cleanly.
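
For reference, here is a minimal sketch of what SO_REUSEADDR means at the socket level. It assumes a POSIX system and plain BSD sockets rather than websocketpp or Asio, and the helper names open_listener and bound_port are hypothetical, introduced only for this illustration. The key detail is that the option has to be set before bind().

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>

// Hypothetical helper (not part of websocketpp or Asio): open a TCP listener
// on 127.0.0.1:port. Returns the socket descriptor on success, or -errno on
// failure. Passing port 0 lets the kernel pick a free port.
int open_listener(std::uint16_t port, bool reuse_addr) {
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -errno;

    if (reuse_addr) {
        int on = 1;
        // SO_REUSEADDR only has an effect if it is set before bind().
        if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0) {
            int err = errno;
            ::close(fd);
            return -err;
        }
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    if (::bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        ::listen(fd, SOMAXCONN) < 0) {
        int err = errno;  // save errno before close() can overwrite it
        ::close(fd);
        return -err;
    }
    return fd;
}

// Hypothetical helper: report which port a bound socket ended up on
// (returns 0 if the lookup fails).
std::uint16_t bound_port(int fd) {
    sockaddr_in addr{};
    socklen_t len = sizeof(addr);
    if (::getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) < 0) {
        return 0;
    }
    return ntohs(addr.sin_port);
}
```

A caller would use something like open_listener(9002, true), where 9002 is just an example port, and treat a return value of -EADDRINUSE as the "Address already in use" case from the log above. In websocketpp the corresponding knob is set_reuse_addr(true), which other users report using further down in this thread.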

aclex commented 4 years ago

@zaphoyd many thanks for clarifying the log messages! It's indeed a complex structure, and it's not always possible to make teardown neat and clean in a lower-level library. No problem with this.

Sorry for not mentioning my OS initially. I wanted to put so many details in my message that some got lost :) I'm experimenting on GNU/Linux (Gentoo, x86_64) with kernel 5.4.48 and Glibc 2.30.

I don't know if it helps, but my digging ended somewhere inside Boost, at boost/asio/detail/impl/reactive_socket_service_base.ipp:120, where the acceptor's underlying resources seem to be closed, and the comment there reads as follows:

  // The descriptor is closed by the OS even if close() returns an error.
  //
  // (Actually, POSIX says the state of the descriptor is unspecified. On
  // Linux the descriptor is apparently closed anyway; e.g. see
  //   http://lkml.org/lkml/2005/9/10/129
  // We'll just have to assume that other OSes follow the same behaviour. The
  // known exception is when Windows's closesocket() function fails with
  // WSAEWOULDBLOCK, but this case is handled inside socket_ops::close().

Maybe I'm wrong, but it sounds to me like there's an assumption about the underlying kernel behaviour that might not hold in my case.

As for the workaround, yes, I can generally live with a delay until the port is free again, since for now this is a server application that won't be restarted often, but port reuse is also an option.

spdev31 commented 3 years ago

Hello, new user here. I see this issue as well, on an embedded Linux distribution using Linux kernel 4.4.1 and Glibc 2.21. I'll share what info I have. The issue also occurs without Boost.Asio, when using standalone Asio via the ASIO_STANDALONE macro. I confirmed the SO_REUSEADDR workaround above by adding a call to set_reuse_addr(true).

Pure speculation here, but adding a shutdown(SHUT_RDWR) before close() on the underlying socket might help.
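
The shutdown-before-close idea could be sketched as follows, assuming a POSIX system and a raw socket descriptor. shutdown_then_close is a hypothetical helper, not something websocketpp or Asio provides, and nothing in this thread confirms that it fixes the issue.

```cpp
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>

// Hypothetical helper: shut a socket down in both directions, then close it.
// Returns 0 if close() succeeded, or -errno from close() otherwise.
int shutdown_then_close(int fd) {
    // The result of shutdown() is deliberately ignored: on a socket that was
    // never connected (such as a listener) it may fail with ENOTCONN on some
    // platforms, and the descriptor should be closed either way.
    (void)::shutdown(fd, SHUT_RDWR);

    return ::close(fd) < 0 ? -errno : 0;
}
```

Trying this inside the library would mean reaching the acceptor's native descriptor, which is outside the scope of this sketch.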

hamitzor commented 3 years ago

I have the same issue. Even though I call stop_listening and stop_perpetual and close every connection individually with close, I get a port-in-use error when I try to restart the server.

humatic commented 2 years ago

I experience this also.

I've read the FAQ and been careful to close any existing client connections and call stop_listening. Here's a snippet:

// connections_ is a set of connection_hdl refs
if (server_.is_listening())
  server_.stop_listening();
for (auto hdl : connections_)
  server_.close(hdl, websocketpp::close::status::going_away, "shutting down");
connections_.clear();

Yes, using server_.set_reuse_addr(true); is an effective workaround, but I feel like it should be possible to release the socket properly at shutdown. Any new info on this topic? Thanks!

SolomidHero commented 1 year ago

I have almost the same code snippet, with server_.set_reuse_addr(true) and a single connection handle:

server_.stop_perpetual();
if (server_.is_listening()) {
    server_.stop_listening();
}
if (!hdl_.expired()) {
    server_.close(hdl_, websocketpp::close::status::going_away, "shutting down");
}

and get asio errors:

[2023-09-21 13:24:27] [info] asio handle_accept error: asio.system:89 (Operation aborted.)
[2023-09-21 13:24:27] [info] Error getting remote endpoint: asio.system:9 (Bad file descriptor)
INFO :: WebSocket Connection Unknown - "" - 0 asio.system:89 Operation aborted.
[2023-09-21 13:24:27] [info] asio async_shutdown error: asio.system:9 (Bad file descriptor)
ERR :: handle_accept error: Operation aborted. # got from .run() thread by try/catch block
INFO :: Stopping acceptance of new connections because the underlying transport is no longer listening.

Some of the logs come from websocketpp, for which I provided a custom logger. In my setup:

ssszzh commented 1 year ago

I ran into the same problem and got the same errors. When I called the listen method, no error was reported, but when I called the run method, it returned immediately. How can this be solved?

AnInteger commented 10 months ago

I think that to stop a websocketpp::server you do need to call server.stop_listening() and stop() first. But let's first clarify why a clean stop is needed. If you want to change the port and restart the service, you should make the websocketpp::server a local variable. I don't know what the websocketpp::server destructor does with the server object, but this does work. Since server.run() blocks (in fact it probably creates a new thread), you need another thread to control stopping and restarting the server.