zaphoyd / websocketpp

C++ websocket client/server library
http://www.zaphoyd.com/websocketpp

File descriptor leak in websocket client. #940

Open AlastairGrowcott opened 3 years ago

AlastairGrowcott commented 3 years ago

I am using WebSocket++ version 0.8.1 and ASIO 1.12.2. I will try to move forward to the latest versions, but that is likely to be a lot of work if any APIs have changed.

I have a test harness that uses a lot of web socket clients (websocketpp::client<websocketpp::config::asio_client>). Each test opens its own set of web sockets, and tests of the initial connection functionality may open the same web socket multiple times.

I have wrapped WebSocket++ in a simple C++ class to simplify the interface. The Start() function looks like:

    m_impl->m_connected = false;
    m_impl->m_errored = false;

    m_impl->m_client.set_open_handler(std::bind(&on_open,
                                                 m_impl,
                                                 std::placeholders::_1));
    m_impl->m_client.set_fail_handler(std::bind(&on_fail,
                                                 m_impl,
                                                 std::placeholders::_1));
    m_impl->m_client.set_message_handler(std::bind(&on_message,
                                                   m_impl,
                                                   std::placeholders::_1,
                                                   std::placeholders::_2));
    m_impl->m_client.clear_access_channels(websocketpp::log::alevel::frame_header);
    m_impl->m_client.clear_access_channels(websocketpp::log::alevel::frame_payload);
    m_impl->m_connection = m_impl->m_client.get_connection(url, ec);
    if (ec || (nullptr == m_impl->m_connection)) {
        USER_ERROR("Failed to get web socket connection.");
        result++;
    } else {
        m_impl->m_client.connect(m_impl->m_connection);
        m_impl->m_thread = new std::thread(std::bind(&Client::run, &(m_impl->m_client)));

        while (0u == result) {
            std::unique_lock<std::mutex> lock(m_impl->m_lock);

            m_impl->m_cond.wait(lock);
            if (m_impl->m_errored) {
                USER_ERROR("Error connecting to web socket.");
                result++;
            } else {
                if (m_impl->m_connected) {
                    break;
                }
            }
        }
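
One thing worth ruling out in the loop above: it calls wait() before checking the flags, so if on_open or on_fail (not shown) happen to run before the wait starts, the notification could be missed and Start() would block. A minimal sketch of a predicate-based wait with a timeout follows. ConnectState, signal_result and wait_for_result are hypothetical names standing in for the m_impl members and handlers, not part of WebSocket++.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the m_lock / m_cond / m_connected / m_errored
// members of m_impl; the names are assumptions for illustration.
struct ConnectState {
    std::mutex lock;
    std::condition_variable cond;
    bool connected = false;
    bool errored = false;
};

// What an open or fail handler would do: set the flag while holding the
// lock, then wake any waiter.
inline void signal_result(ConnectState& s, bool ok) {
    {
        std::lock_guard<std::mutex> guard(s.lock);
        if (ok) {
            s.connected = true;
        } else {
            s.errored = true;
        }
    }
    s.cond.notify_all();
}

// Returns true only if the connection opened. Returns false on error or
// timeout. The predicate is evaluated before blocking, so a result that was
// signalled before this call is still observed.
inline bool wait_for_result(ConnectState& s, std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> guard(s.lock);
    const bool done = s.cond.wait_for(
        guard, timeout, [&s] { return s.connected || s.errored; });
    return done && s.connected && !s.errored;
}
```

With this shape, a timeout turns a hang into a test failure, which makes it easier to see which connection attempt went wrong.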

The Stop() function, which is called whenever my class is destroyed, looks like:

    websocketpp::lib::error_code  ec;

    if (nullptr != m_impl->m_thread) {
        m_impl->m_client.close(m_impl->m_connection->get_handle(), websocketpp::close::status::normal, "", ec);
        m_impl->m_thread->join();
        delete m_impl->m_thread;
        m_impl->m_thread = nullptr;
        m_impl->m_client.reset();
    }

My class is always created on the stack, and so will always be freed when the relevant function returns, triggering a call to this Stop() function. I often pass an instance of the class by reference to other sub-functions, but never as a copy and never as a pointer (I searched my code base to prove this).

Occasionally (very rarely, but almost always in the same place) stopping a websocket client logs a warning: "got non-close frame while closing".

After adding a few more tests I now get an exception as follows:

terminate called after throwing an instance of 'std::system_error'
  what():  eventfd_select_interrupter: Too many open files

This looks like a file descriptor leak somewhere in WebSocket++ or ASIO. I find the source code incredibly hard to read. I am not a C++ expert, just highly proficient, and a lot of "modern practices" just confuse me. I also find the documentation quite hard to follow. So it is possible that there is an error in my code. However it may also be a file descriptor leak in WebSocket++/ASIO and I have no idea how to tell which it is. I cannot even figure out how to get a count of file descriptors in use by WebSocket++.
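
On Linux (which the eventfd error suggests), one way to get a descriptor count for the whole process is to list /proc/self/fd. The sketch below assumes C++17 and a mounted /proc. count_open_fds is a hypothetical helper name, and it counts every descriptor in the process, not only those owned by WebSocket++ or ASIO.

```cpp
#include <cstddef>
#include <filesystem>

// Counts the entries in /proc/self/fd (Linux only). Each entry is one open
// descriptor of the calling process. The directory iterator itself holds a
// descriptor while the loop runs, so compare readings taken the same way
// rather than treating a single value as exact. Throws
// std::filesystem::filesystem_error if /proc/self/fd cannot be read.
inline std::size_t count_open_fds() {
    std::size_t count = 0;
    for (const auto& entry : std::filesystem::directory_iterator("/proc/self/fd")) {
        (void)entry;  // only the number of entries matters
        ++count;
    }
    return count;
}
```

Logging this value before creating a client wrapper and again after it has been destroyed should show whether the number returns to its baseline. If it grows by the same amount on every cycle, that points at something in the start/stop path not being released.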

My code does almost no file IO, and opens a maximum of 10 sockets for its own use. This number never increases once it reaches ten (as far as I can tell). The sockets I create are used for the lifetime of the process with only a very few open/close cycles for testing purposes. So if there was a leak of file descriptors in other code, it would be numbered in tens at the very most.

AlastairGrowcott commented 3 years ago

Also, receiving messages from the web socket client gets slower over time, which might be the cause of the "got non-close frame while closing" warning. The slowdown may be due to processing an ever-growing list of file descriptors.

winsnlife commented 3 years ago

I also have this problem. I close the connection in the normal way, with websocketClient.close(metadata_it->second->get_hdl(), code, reason, ec);. However, lsof -p PID | grep sock shows that another sock record is added every time the program runs this.