pistacheio / pistache

A high-performance REST toolkit written in C++
https://pistacheio.github.io/pistache/
Apache License 2.0
3.16k stars 695 forks source link

epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev): Bad file descriptor error caught in a multithreaded application #43

Open nabilasalmi opened 7 years ago

nabilasalmi commented 7 years ago

Hi,

I had an error on my application following the "Getting started example", implementing a Rest Server in a multithreaded application. Sometimes, an exception is caught . It's description is:

epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev): Bad file descriptor.

What lets this error occur ??

Thanks a lot for your help !

Nabila

mic90 commented 7 years ago

Hi,

Just went here to look for answer to similar problem. Here is sample code to reproduce:

#include <iostream>
#include <atomic>
#include <endpoint.h>
#include <signal.h>

using namespace std;
using namespace Net;

bool stop(false);

void signalHandler(int signal)
{
    cout << "Signal: " << signal << endl;
    stop = true;
}

class HelloHandler : public Http::Handler
{
public:

    HTTP_PROTOTYPE(HelloHandler)

    void onRequest(const Http::Request& request, Http::ResponseWriter response)
    {
        cout << "Received request " << request.resource() << endl;
        response.send(Http::Code::Ok, "Hello, World");
    }
};

int main()
{
    signal(SIGINT, signalHandler);
    signal(SIGILL, signalHandler);
    signal(SIGKILL, signalHandler);

    cout << "Starting rest server" << endl;
    Net::Address addr(Net::Ipv4::any(), Net::Port(9080));

    auto opts = Http::Endpoint::options().threads(1);
    Http::Endpoint server(addr);
    server.init(opts);
    server.setHandler(std::make_shared<HelloHandler>());
    server.serveThreaded();
    cout << "Server started" << endl;
    while(!stop)
    {
    }
    server.shutdown();
    cout << "Exiting" << endl;
    return 0;
}

If application is started and no request were made - I can close the application and start it again without problems. If application is started and one or more requests were made - when I close the application and immediately start it again I got the same error as Nabila:

terminate called after throwing an instance of 'std::runtime_error'
  what():  epoll_ctl(epoll_fd, EPOLL_CTL_ADD, fd, &ev): Bad file descriptor
Aborted (core dumped)

Given error pops out for about a minute, and then I can start application without problems again.

Best regards, Michal

DrLex0 commented 7 years ago

I also have this problem, but it is intermittent. When it happens, it takes exactly 60 seconds before the bad FD goes away. Seems like a race condition that sometimes causes the program to die before all FDs are closed.

oktal commented 7 years ago

Hello,

My guess is that the FD is not correctly closed when you press C-c. When that happens, the fd stays in TIME_WAIT state for a couple of seconds meaning that it can be be bound again unless the kernel recycles it (that would explain the 60 seconds delay).

Could you please try setting the special Tcp::Options::InstallSignalHandler flag to your endpoint like so:

opts.flags(Tcp::Options::InstallSignalHandler);

This should install a signal handler that will properly close the fd before exiting your program.

mic90 commented 7 years ago

Hello,

Sorry for my late response, I have tried to include your propositions in my sample code, but the error is still present.

The code changes I have made: [listener.cc]

    void closeListener() {
        if (g_listen_fd != -1) {
            ::close(g_listen_fd);
            g_listen_fd = -1;
        }
        std::raise(SIGTERM); //ADDED
    }

[main.cpp]

int main()
{
    signal(SIGTERM, signalHandler);

    cout << "Starting rest server" << endl;
    Net::Address addr(Net::Ipv4::any(), Net::Port(9080));

    auto opts = Http::Endpoint::options().threads(1).flags(Tcp::Options::InstallSignalHandler);
    Http::Endpoint server(addr);
    server.init(opts);
    server.setHandler(std::make_shared<HelloHandler>());
    server.serveThreaded();
    cout << "Server started" << endl;
    while(!stop)
    {
    }
    cout << "Exiting" << endl;
    server.shutdown();
    return 0;
}

So when user press ctrl+c, the sigterm signal is emitted from listener.cc closeListener() function. The signal is then caught by my signal handler which closes the application. Not sure if this is what You mean ?

Best regards, Michal

GeorgeKT commented 7 years ago

Lets see, the bug reporter doesn't set the reuse address option. So if you restart the program fast enough, all binds will fail, because of the previous run of the program (the kernel does not immediatly free the port)

If all bind calls fail, pistache keeps on going, and then adds a closed file descriptor to epoll, which obviously fails, the problem is here:

int fd = -1;
for (struct addrinfo *addr = addrs; addr; addr = addr->ai_next) {
    fd = ::socket(addr->ai_family, addr->ai_socktype, addr->ai_protocol);
    if (fd < 0) continue;

    setSocketOptions(fd, options_);

    if (::bind(fd, addr->ai_addr, addr->ai_addrlen) < 0) {
        close(fd);
        continue;
    }

    TRY(::listen(fd, backlog_));
    break;
}

make_non_blocking(fd);
poller.addFd(fd, Polling::NotifyOn::Read, Polling::Tag(fd));
listen_fd = fd;
g_listen_fd = fd;

The possibility that all bind calls fail, is not taken into account.

Further more if listen fails, the code will leak a socket. There is also a memory leak, because it is calling getaddrinfo without calling freeaddrinfo on the returned addresses.

Also if setSocketOptions fails, it will throw an exception, so that is another potential file descriptor leak.

If you are going to use exceptions as your error handling strategy, you need to properly use RAII, so you don't leak stuff. So the first thing you ought to do is wrap socket's and file descriptors in objects which close them in their destructors.

JMare commented 6 years ago

For anyone else struggling with this, setting SO_REUSE_ADDR has fixed this problem for me.

xoac commented 6 years ago

Better solution is Tcp::Options::ReuseAddr:

    auto opts = Http::Endpoint::options()
        .threads(thr)
        .flags(Tcp::Options::InstallSignalHandler | Tcp::Options::ReuseAddr);

But remember to use | see #137

Santhosh-KS commented 6 years ago

Solution to add .flags(Tcp::Options::InstallSignalHandler | Tcp::Options::ReuseAddr);

has not solved the problem. By adding this piece of code, helloworld application will never receive the signal when user does Ctrl+c. and you have to do a ps kill or close the terminal.

xoac commented 6 years ago

Hi, If you used Tcp::Options::InstallSignalHandler than ur CRTL + C is catched and don't kill the app. Tcp::Options::ReuseAddr -- works for me. I can kill app and start again. System allow me to REUSE port.

nabilasalmi commented 6 years ago

Hi,

Thank you, i'll try that !! Best Regards, Nabila

2018-01-09 13:53 GMT+01:00 Sylwek notifications@github.com:

Hi, If you used Tcp::Options::InstallSignalHandler than ur CRTL + C is catched and don't kill the app. Tcp::Options::ReuseAddr -- works for me. I can kill app and start again. System allow me to REUSE port.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/oktal/pistache/issues/43#issuecomment-356275688, or mute the thread https://github.com/notifications/unsubscribe-auth/AX3ncVCArWgm8e_8U0m39hXdIUBbIlQWks5tI2EugaJpZM4LshSm .

jeffvandyke commented 6 years ago

@Santhosh-KS, as a note on this issue, my pull request #233 is supposed to fix InstallSignalHandler not closing the program.

hoang408 commented 6 years ago

@xoac 's solution works for me. Thank you so much!

jacksu commented 5 years ago

I have the same problem.