Open GoogleCodeExporter opened 9 years ago
A bit of snooping on the FreeBSD developer channel suggested that the TCP
timewait zone is being overflowed.
Indeed:
squid-1# vmstat -z | head -1 ; vmstat -z | grep -i tcptw
ITEM SIZE LIMIT USED FREE REQUESTS FAILURES
tcptw: 88, 8232, 873, 7359, 118015778, 5479
The suggestion is to bump "net.inet.tcp.maxtcptw" from what it is (8191 at the
moment) to something higher and see if the issue goes away.
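
A minimal sketch of reading that vmstat -z row programmatically, so the zone can be watched over time. The function name parse_vmstat_zone is hypothetical, and the column order is assumed to match the header shown above (ITEM SIZE LIMIT USED FREE REQUESTS FAILURES).

```python
def parse_vmstat_zone(line):
    """Parse one `vmstat -z` row into a dict of integer fields.

    Assumes the column order SIZE, LIMIT, USED, FREE, REQUESTS, FAILURES,
    as in the header printed by the command above.
    """
    name, _, rest = line.partition(":")
    fields = [int(part.strip()) for part in rest.split(",")]
    size, limit, used, free, requests, failures = fields
    return {
        "zone": name.strip(),
        "size": size,
        "limit": limit,
        "used": used,
        "free": free,
        "requests": requests,
        "failures": failures,
    }


# The row quoted in this comment.
row = parse_vmstat_zone("tcptw: 88, 8232, 873, 7359, 118015778, 5479")

# Fraction of allocation requests against this zone that failed.
failure_ratio = row["failures"] / row["requests"]
```

A non-zero failures value that keeps growing between samples would be the thing to watch after raising net.inet.tcp.maxtcptw.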
Original comment by adrian.c...@gmail.com
on 5 Jul 2009 at 2:34
It's still occurring.
It's not listen queue overflows:
squid-1# netstat -sp tcp | grep -i listen
0 listen queue overflows
Original comment by adrian.c...@gmail.com
on 5 Jul 2009 at 2:37
From a diff of two netstat -sp tcp runs taken 10 seconds apart; this is under
"packets received":
- 26350808 discarded due to memory problems
+ 26351094 discarded due to memory problems
Let's track down that particular counter in the TCP/IP statistics code and see
exactly what increments it.
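
As a rough sketch, the same before/after comparison can be turned into a per-second rate. The helper names parse_counters and counter_rates are hypothetical, and the parser only assumes the "<number> <description>" line shape that netstat -sp tcp prints (with an optional leading diff marker).

```python
import re

# Matches lines like "26350808 discarded due to memory problems",
# optionally prefixed by a "-" or "+" diff marker.
COUNTER_RE = re.compile(r"^\s*[-+]?\s*(\d+)\s+(.+?)\s*$")


def parse_counters(text):
    """Map each counter description to its integer value."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_RE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def counter_rates(before_text, after_text, seconds):
    """Return per-second rates for counters that changed between snapshots."""
    before = parse_counters(before_text)
    after = parse_counters(after_text)
    return {
        name: (after[name] - before[name]) / seconds
        for name in after
        if name in before and after[name] != before[name]
    }


rates = counter_rates(
    "- 26350808 discarded due to memory problems",
    "+ 26351094 discarded due to memory problems",
    10,
)
```

For the two values quoted above, the difference is 286 packets over 10 seconds, i.e. roughly 28.6 discards per second.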
Original comment by adrian.c...@gmail.com
on 5 Jul 2009 at 2:46
That counter is part of the TCP reassembly code.
Check:
squid-1# sysctl net.inet.tcp.reass
net.inet.tcp.reass.overflows: 26398434
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.cursegments: 267
net.inet.tcp.reass.maxsegments: 16384
net.inet.tcp.reass.overflows has been steadily rising. maxqlen has been bumped
to 256 with no (current) adverse effect, but I wonder what else needs to be
bumped. What about nmbclusters?
In any case, this still hasn't helped with ECONNABORTED.
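
A small sketch for comparing the reassembly numbers above. The helper name parse_sysctl is hypothetical, and it only handles the "name: integer" lines shown in this comment.

```python
def parse_sysctl(text):
    """Parse `sysctl` output lines of the form 'name: integer' into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.lstrip("-").isdigit():
            values[key.strip()] = int(value)
    return values


# The snapshot quoted in this comment.
sample = """\
net.inet.tcp.reass.overflows: 26398434
net.inet.tcp.reass.maxqlen: 48
net.inet.tcp.reass.cursegments: 267
net.inet.tcp.reass.maxsegments: 16384
"""

reass = parse_sysctl(sample)

# Share of the global reassembly segment pool in use at sampling time.
segment_usage = (
    reass["net.inet.tcp.reass.cursegments"]
    / reass["net.inet.tcp.reass.maxsegments"]
)
```

In this snapshot only about 1.6% of the global segment pool is in use, so one possible reading is that the per-connection queue limit (maxqlen) rather than maxsegments is what is being hit. That is an inference from a single sample, not something confirmed here.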
Original comment by adrian.c...@gmail.com
on 5 Jul 2009 at 3:08
Something to look at tomorrow morning.
comm_call_handlers() will call the read handler if read_event is 1 (under the
right circumstances), but what if it's -1?
do_check_incoming() is invoked in various places, where it calls
do_call_incoming(), which calls comm_call_handlers(fd, -1, -1). This means that
accept() is going to be attempted a -whole lot- of times even if the FD isn't
currently flagged to be checked. How valid is this exactly? In theory, accept()
should just return a shiny non-fatal error if no FDs are ready, but is this
-truly- going to be the case here with Squid?
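
To make the "non-fatal error" expectation concrete, here is a sketch of an accept loop on a non-blocking listener. It is written in Python for brevity and is not Squid's code: the function name drain_accept_queue, the max_accepts cap and the choice of which errno values count as non-fatal are all assumptions for illustration.

```python
import errno


def drain_accept_queue(listener, max_accepts=16):
    """Accept pending connections from a non-blocking listening socket.

    Returns (accepted, aborted): a list of (conn, addr) pairs and the number
    of ECONNABORTED results seen. Any other error is re-raised as fatal.
    """
    accepted = []
    aborted = 0
    for _ in range(max_accepts):
        try:
            conn, addr = listener.accept()
        except OSError as exc:
            if exc.errno in (errno.EAGAIN, errno.EWOULDBLOCK):
                break  # nothing pending: the "shiny non-fatal error" case
            if exc.errno == errno.ECONNABORTED:
                aborted += 1  # connection went away before accept(); count it
                continue
            if exc.errno == errno.EINTR:
                continue  # interrupted by a signal; just retry
            raise  # anything else (e.g. EMFILE) is treated as a real failure
        accepted.append((conn, addr))
    return accepted, aborted
```

Counting ECONNABORTED separately, rather than folding it into the "nothing ready" case, makes it possible to see whether the aborts line up with the speculative accept() calls or with genuine readiness events.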
Original comment by adrian.c...@gmail.com
on 5 Jul 2009 at 7:55
Also, grovelling around the kernel code has provided some potential gems.
There are only a few places where ECONNABORTED is returned.
http://fxr.watson.org/fxr/ident?v=FREEBSD7;im=excerpts;i=ECONNABORTED
It may be worthwhile just hacking the kernel up to have printf()s in the places
where this value is set, then running the proxy in production for a few minutes
to see which code paths lead to connections being aborted like this.
The bursty nature of it makes me wonder what the root cause is most likely to
be. It could be a local resource starvation issue on the box. It could be
something upstream (e.g. a NAT gateway?) getting wholly upset with the session
counts.
Another thing I've been pondering: given that the server is spoofing client IPs
as well as server-side IPs, are there any PCB hash collisions? That certainly
needs to be investigated.
Original comment by adrian.c...@gmail.com
on 5 Jul 2009 at 8:31
Original issue reported on code.google.com by
adrian.c...@gmail.com
on 5 Jul 2009 at 2:32