Closed. GoogleCodeExporter closed this issue 9 years ago.
I need to look at the code more, but it appears that the main control loop needs
work. It should handle running clients, dispatch a new client when an old client
finishes, and accept new connections when there is queue space. From the logs and
observed behavior, it has problems when old clients leave before their test has
begun.
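
The responsibilities listed above can be sketched roughly as follows. This is
only an illustration in Python, not the NDT server code; the `Scheduler` class,
its method names, and the client identifiers are all hypothetical. The point it
tries to show is that a client who leaves while still queued has to be dropped
from the queue, otherwise a running slot would later be handed to a dead
connection.

```python
from collections import deque


class Scheduler:
    """Hypothetical bookkeeping for running and queued test clients."""

    def __init__(self, max_running, max_queued):
        self.max_running = max_running
        self.max_queued = max_queued
        self.running = set()    # clients whose test is in progress
        self.waiting = deque()  # clients waiting for a free slot, in FIFO order

    def accept(self, client):
        """Admit a new connection; returns 'run', 'queue' or 'busy'."""
        if len(self.running) < self.max_running and not self.waiting:
            self.running.add(client)
            return "run"
        if len(self.waiting) < self.max_queued:
            self.waiting.append(client)
            return "queue"
        return "busy"

    def leave(self, client):
        """A client finished its test or disconnected, possibly while queued."""
        if client in self.running:
            self.running.discard(client)
        else:
            try:
                self.waiting.remove(client)  # left before its test began
            except ValueError:
                pass  # unknown client, nothing to clean up
        self._dispatch()

    def _dispatch(self):
        """Move waiting clients into any free running slots."""
        while self.waiting and len(self.running) < self.max_running:
            self.running.add(self.waiting.popleft())


s = Scheduler(max_running=2, max_queued=2)
results = [s.accept(c) for c in "abcde"]  # a, b run; c, d queue; e is refused
s.leave("c")                              # c gives up before its test begins
s.leave("a")                              # a finishes, so d is dispatched
```
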
One correction. The M-Lab nodes are configured to handle a maximum of 80 NDT
clients, not 100! The first 20 get served, the next 60 get queued. The last 20
requests should get rejected with a 9988 'server busy' message.
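
As a quick check of the numbers above, here is a small Python sketch of how 100
simultaneous requests would split under that configuration. The function name
`classify` and its parameters are made up for illustration; only the 20/60
split and the 9988 code come from the comment itself.

```python
from collections import Counter

SERVER_BUSY = 9988  # 'server busy' code quoted in the comment above


def classify(arrival_index, max_running=20, max_queued=60):
    """Outcome for the Nth simultaneous request (0-based), hypothetical helper."""
    if arrival_index < max_running:
        return "served"
    if arrival_index < max_running + max_queued:
        return "queued"
    return SERVER_BUSY


# Tally the outcomes for 100 requests arriving at once.
outcomes = Counter(classify(i) for i in range(100))
```
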
Also, the server removes the defunct processes after a test completes. Some time,
up to 30 seconds, may pass between the time a child process terminates and when
the kernel resources are released. This is not a bug, but a design issue. The
problem is that the server's main processing loop is stalling, not that defunct
processes are hanging around forever. (At least that's what I think right now.
I'll examine the debug log files to see what's going on, probably the weekend of
the 13th.)
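
For readers unfamiliar with defunct (zombie) processes: a terminated child stays
in the process table until its parent collects its exit status. A minimal Python
sketch of non-blocking reaping, which a main loop could call on each pass, is
shown below. It is not the NDT implementation (the server's own code is not
shown in this thread), and `reap_children` is a hypothetical name; it relies
only on the standard POSIX `waitpid` call with `WNOHANG`.

```python
import os


def reap_children():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

Because the call never blocks, a delay between a child exiting and its entry
disappearing only reflects how often the parent gets around to calling it.
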
While I could change the code to queue 80 clients, I'm not sure it is reasonable.
It looks like users aren't willing to wait 4-5 minutes before getting test
results, so it may be better to reduce the queue depth and make them issue
another test request. I wouldn't change the code now, but this is something we
need to discuss.
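
To make the queue-depth trade-off concrete, here is a rough back-of-the-envelope
estimate in Python. It assumes tests run in full batches and that every test
takes the same time; `test_seconds` is a hypothetical parameter, since the
thread does not state how long a test takes.

```python
import math


def estimated_wait_seconds(queue_position, concurrent_tests, test_seconds):
    """Rough wait for the client at 1-based queue_position (batch model)."""
    batches_ahead = math.ceil(queue_position / concurrent_tests)
    return batches_ahead * test_seconds


# Assumed 60-second tests, 20 running at a time:
last_of_80 = estimated_wait_seconds(80, 20, 60)  # end of an 80-deep queue
last_of_60 = estimated_wait_seconds(60, 20, 60)  # end of a 60-deep queue
```

Under these assumptions the wait grows linearly with queue depth, so trimming
the queue directly trims the worst-case wait.
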
Rich
Original comment by rcarlson...@gmail.com
on 5 Mar 2010 at 1:21
On the 80 vs. 100 correction:
Just to clarify, for M-Lab, my memory is that we all discussed queuing 100 users
(the next 80 after allowing the first 20), and that's what I expected Rich would
configure and QA would subsequently test. If it was set to 80, leave it. I think
80 is fine. I also agree most users won't wait 4-5 minutes.
Original comment by funcho...@gmail.com
on 5 Mar 2010 at 2:51
Original comment by jwzuraw...@gmail.com
on 10 Mar 2010 at 8:42
Re-verified with V3.6.3; this bug is not yet fixed. Defunct processes are still
created and, rarely, some processes do not exit. However, the server now responds
to clients normally (previously the server stopped responding to clients).
Original comment by sekharn...@gmail.com
on 10 Jun 2010 at 5:19
Original issue reported on code.google.com by
Garimell...@gmail.com
on 3 Mar 2010 at 10:12