pistacheio / pistache

A high-performance REST toolkit written in C++
https://pistacheio.github.io/pistache/
Apache License 2.0
3.12k stars 688 forks

Crashed with std::runtime_error - bad file descriptor #1157

Open tronghung279 opened 11 months ago

tronghung279 commented 11 months ago

My application uses Pistache::Http::Endpoint::serveThreaded and crashed with the message: terminate called after throwing an instance of 'std::runtime_error' what(): Bad file descriptor

I use 2 threads for Pistache, as follows:

m_httpEndpoint = std::make_shared<Pistache::Http::Endpoint>(addr);
m_router = std::make_shared<Pistache::Rest::Router>();
m_router->addCustomHandler(Pistache::Rest::Routes::bind(&EventListenerServer::defaultRequestHandling, this));
auto opts = Pistache::Http::Endpoint::options().threads(2);
opts.flags(Pistache::Tcp::Options::ReuseAddr);
opts.maxPayload(32768);
m_httpEndpoint->init(opts);
Pistache::Rest::Routes::Post(*m_router, std::string("/" + m_path), Pistache::Rest::Routes::bind(&EventListenerServer::eventRequest, this));
m_httpEndpoint->setHandler(m_router->handler());
m_httpEndpoint->serveThreaded();

Coredump log

#0 0xa4adc3a0 in epoll_wait () from /home/docker/development/projects/coredump/libs/libc.so.6
#1 0xa3d65de4 in Pistache::Polling::Epoll::poll(std::vector<Pistache::Polling::Event, std::allocator<Pistache::Polling::Event> >&, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >) const () from /home/docker/development/projects/coredump/libs/libpistache.so.0
#2 0xa3d82610 in Pistache::Tcp::Listener::run() () from /home/docker/development/projects/coredump/libs/libpistache.so.0
#3 0xa4c1a44c in execute_native_thread_routine () from /home/docker/development/projects/coredump/libs/libstdc++.so.6
#4 0xa52a6d94 in ?? () from /home/docker/development/projects/coredump/libs/libpthread.so.0
#5 0xa4adbf68 in ?? () from /home/docker/development/projects/coredump/libs/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Looking into Pistache::Tcp::Listener::run, I see it throws a ServerError when handling EBADF, which then prints the same crash message. Is it possible not to throw the exception? Or could you guide me on how to catch and handle it effectively? I see that the exception breaks out of the whole run loop, which I don't want.

Thank you
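
The "terminate called after throwing an instance of 'std::runtime_error'" message is what a C++ program prints when an exception escapes a std::thread function, since that calls std::terminate. The backtrace above shows Listener::run as the thread routine, which fits this pattern. As a minimal sketch that is independent of Pistache, the following shows how a thread body can be wrapped so the error is recorded instead of terminating the process. The helper name runGuarded is hypothetical, and this does not show a way to hook into the thread that Pistache itself starts.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs `body` on its own thread and returns the what()
// text of any std::exception it throws, or an empty string on success.
// Without the try/catch, an exception escaping the thread function would
// call std::terminate.
std::string runGuarded(const std::function<void()>& body) {
    std::string error;
    std::thread worker([&] {
        try {
            body();
        } catch (const std::exception& e) {
            error = e.what();  // record the failure instead of terminating
        }
    });
    worker.join();  // join() synchronizes, so reading `error` here is safe
    return error;
}
```

Calling runGuarded with a lambda that throws std::runtime_error("Bad file descriptor") returns that text to the caller, which can then decide whether to restart the worker or shut down cleanly.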

kiplingw commented 11 months ago

Hey @tronghung279. What is your RLIMIT_NOFILE set to and how many logical cores does your machine have? Pistache scales file descriptors at around 6N + 5, where N is the number of threads.
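
To compare the 6N + 5 estimate above against the limit of the running process, something like the following sketch could be used on a POSIX system. The function names are hypothetical, and the estimate is only the rough figure quoted in this comment; it does not count descriptors for open client connections or for anything else the application opens.

```cpp
#include <sys/resource.h>
#include <cstdint>

// Rough estimate quoted in the thread: about 6N + 5 descriptors for N threads.
std::uint64_t estimatedDescriptors(std::uint64_t threads) {
    return 6 * threads + 5;
}

// Returns true if the soft RLIMIT_NOFILE of this process covers the estimate.
// Returns false if the limit cannot be read.
bool limitCoversEstimate(std::uint64_t threads) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
        return false;
    }
    if (lim.rlim_cur == RLIM_INFINITY) {
        return true;
    }
    return static_cast<std::uint64_t>(lim.rlim_cur) >= estimatedDescriptors(threads);
}
```

For two threads the estimate is 17 descriptors, so a soft limit of 1024 leaves plenty of headroom.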

tronghung279 commented 11 months ago

Thank you @kiplingw for the quick response. My machine has 1 core and RLIMIT_NOFILE is 1024:

nproc
1
ulimit -n
1024

kiplingw commented 11 months ago

Hmm, probably not file descriptor exhaustion then, if there are only two service threads.

tronghung279 commented 11 months ago

Yes, I think so. I have no idea why it raises EBADF when accepting the connection. :( But shouldn't we handle the error instead of throwing?

kiplingw commented 11 months ago

Come to think of it, I don't think EBADF is generated on file descriptor exhaustion. It sounds like a descriptor is being used in an invalid state (e.g. after it has already been closed).

Are you using the binary package from the PPA? Can you show us your endpoint handler?
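
The distinction drawn above can be reproduced in isolation: hitting the descriptor limit is normally reported as EMFILE or ENFILE, while using a descriptor after close() gives EBADF. The sketch below, with a hypothetical function name, closes the read end of a pipe and then reads from it. It illustrates the errno only and makes no claim about where Pistache or the application closes a descriptor.

```cpp
#include <cerrno>
#include <unistd.h>

// Creates a pipe, closes its read end, then tries to read from the closed
// descriptor. Returns the errno of the failed read, 0 if the read
// unexpectedly succeeded, or -1 if the pipe could not be created.
int errnoFromReadOnClosedFd() {
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    close(fds[0]);  // the read end is now invalid
    char byte = 0;
    errno = 0;
    ssize_t n = read(fds[0], &byte, 1);  // use after close
    int err = (n < 0) ? errno : 0;
    close(fds[1]);
    return err;
}
```

In a single-threaded run this returns EBADF. In a multithreaded program the outcome is less predictable, because another thread may open a new descriptor that reuses the same number between the close and the read.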

tronghung279 commented 11 months ago

Hi @kiplingw, I'm using it in an embedded system, so I built it from source at commit f5b780f. Unfortunately, I'm not able to share the source code. But I see another potential case inside _reactor.run(): it might call void Transport::handleIncoming(const std::shared_ptr<Peer>& peer), and there, if recv fails with EBADF, it throws std::runtime_error(strerror(errno)). That error is never handled. I think it should be handled as a disconnection in this case.
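
One way to picture the suggestion above is a small policy function that maps the errno of a failed recv to an action, so that only unexpected errors are escalated. This is a hypothetical sketch, not Pistache's code: the RecvAction enum, the classifyRecvError name, and the choice of which errors count as a disconnection are all assumptions made for illustration.

```cpp
#include <cerrno>

// Hypothetical policy for a failed recv() call.
enum class RecvAction { Retry, Disconnect, Fatal };

RecvAction classifyRecvError(int err) {
    // Transient conditions: try again later or wait for readiness.
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK) {
        return RecvAction::Retry;
    }
    // Conditions treated here as "the peer is gone": drop this connection
    // and keep serving the others.
    if (err == ECONNRESET || err == ENOTCONN || err == EBADF) {
        return RecvAction::Disconnect;
    }
    // Anything else is left to the caller to escalate.
    return RecvAction::Fatal;
}
```

Treating EBADF as a plain disconnection keeps the server running, but it can also hide a descriptor lifetime bug, so logging the event would still be worthwhile under this policy.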

kiplingw commented 11 months ago

What distro are you running on the embedded system?

tronghung279 commented 11 months ago

Hi @kiplingw, we use Yocto (with some customizations).

kiplingw commented 11 months ago

Hmm, in that case you will have to build from source, as you did. My guess is you might be doing something else that's corrupting memory.

tronghung279 commented 11 months ago

I haven't figured out the reason yet, but when I use a single thread, Pistache::Http::Endpoint::options().threads(1), the issue doesn't happen. :D Do you have any idea?

kiplingw commented 11 months ago

I'm not an strace(1) expert, but I suspect @dennisjenkins75 might have some suggestions on using it to figure out which system call is producing the EBADF.