Closed aseering closed 7 years ago
This isn't blocking me from doing anything since affected services continue to work just fine. But I imagine that, come summer, I will get tired of my laptop acting as a space heater :-)
It spins even without the curl step, after fd 11 is closed and removed from the epoll set (your curl is fd 12). Wonder what the trigger is. If epoll_wait()
didn't work, libuv (ie, all of node) should have spun a long time ago. Issue title can be changed to "epoll_wait() doesn't" for comedic value.
"title": sure, why not :-)
Here's specific point where the strace diverges from native:
19:37:55.683391 timerfd_settime(5, 0, {it_interval={0, 0}, it_value={299, 999841000}},
{it_interval={139797485122496, 139797360282104}, it_value={139797485122496, 5652500}}) = 0
19:37:55.683656 sendmsg(11, {msg_name(0)=NULL,
msg_iov(1)=[{"HTTP/1.1 200 OK\r\nContent-Type: a"..., 81}],
msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 81
19:37:55.684158 epoll_wait(4, [], 128, 0) = 0
19:37:55.684521 getpeername(11, {sa_family=AF_INET, sin_port=htons(56999),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
19:37:55.684843 getpeername(11, {sa_family=AF_INET, sin_port=htons(56999),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
19:37:55.685033 gettimeofday({1484710675, 685094}, NULL) = 0
19:37:55.685187 gettimeofday({1484710675, 685248}, NULL) = 0
19:37:55.685341 timerfd_settime(5, 0, {it_interval={0, 0}, it_value={4, 999846000}},
{it_interval={139797485122752, 139797360282104}, it_value={139797485122752, 5652500}}) = 0
19:37:55.685533 recvmsg(11, 0x7f252370efd0, 0) = -1 EAGAIN (Resource temporarily unavailable)
19:37:55.685683 epoll_wait(4, [], 128, 0) = 0
19:37:55.685821 epoll_wait(4, [{EPOLLIN, {u32=469764852, u64=139797360282356}}], 128, -1) = 1
19:38:00.686052 gettimeofday({1484710680, 686279}, NULL) = 0
19:38:00.686682 timerfd_settime(5, 0, {it_interval={0, 0}, it_value={300, 0}},
{it_interval={139797485123056, 5617832}, it_value={-4294967296, 139797360287960}}) = 0
19:38:00.687326 shutdown(11, SHUT_RDWR) = 0
19:38:00.687901 epoll_ctl(4, EPOLL_CTL_DEL, 11, 0x7f252370f8e0) = 0
19:38:00.688330 close(11) = 0
19:38:00.688841 epoll_wait(4, [{EPOLLIN, {u32=469764852, u64=139797360282356}}], 128, 0) = 1
That very last epoll_wait()
with last parameter (ie timeout) 0 should return 0 (ie empty) but it returns 1, and continues to return 1 forever. File descriptor 11 is the client socket and is the mostly likely culprit, but strangely a little test case worked. It isn't timerfd 5, which won't go off for 300 seconds. Good catch.
@aseering - Thanks for these gems
:). The problem is with the timerfd
(@therealkenc almost had a bulls eye
:)). When timerfd_settime
is called, it should actually reset the EPOLL, which was not happening. This bug was always present, but never got noticed (because of it's specific use-case). The CR is out for review.
The fix is in the dev branch and should soon hit the release build. Thanks @aseering and @therealkenc.
The reproducing steps for this issue are identical to those in #616 . However, that issue appears to have been fixed. Now that it has been fixed, this secondary effect has become visible.
This feels to me like
epoll_wait()
is not waiting. But I'll let y'all evaluate the use case.curl
should successfully download the file. After the file transfer completes,curl
should exit and thehttp_examples
process should consume negligible CPUcurl
does successfully download the file. (Yay! #616 has been fixed!)curl
does exit as expected upon a successful download **http_examples
does NOT decrease in CPU utilization after the download finishes. It continues to peg a CPU core. It continues to work just fine other than the excessive CPU consumption, but system performance (and battery life) overall is reduced somewhat 'cause random process pegging a CPU core.Applications that open multiple independent Boost::Asio sockets may peg more than one CPU core.
15007
Initial Setup -- run the following commands in WSL:
Steps to reproduce:
Open up two WSL console windows In one, do
cd Simple-Web-Server ; ./http_examples
. Give the server process a couple seconds to start up. In the other, docurl http://localhost:8080/big.dat >/dev/null
When
http_examples
is spinning after the download completes, it emits a continuous stream of the following:The following lines from the strace look relevant to the configuration of fd's 4 and 5, used above:
I can post the full strace if it would be useful; let me know. In particular, I haven't checked whether this behavior is sensitive to the configuration of the file descriptors that are being watched.
See our contributing instructions for assistance.