Closed ydp closed 3 years ago
Hey, thank you! Regarding your drops above, I think they are perhaps not random (though it's hard to be certain without a bit more information). The drops you note at tcp_v4_do_rcv and tcp_v4_rcv pretty clearly correlate to drops resulting from checksum errors. If you disassemble those functions, you'll see that offset 0x70 in tcp_v4_do_rcv is the csum_err label, where the code increments the TCP_MIB_CSUMERRORS counter, and there is a similar location/operation in tcp_v4_rcv. The sk_stream_kill_queues function is a bit trickier, though. sk_stream_kill_queues is the function called when a TCP connection is closed. My guess (emphasis on guess here) is that after these packets with checksum errors are received, something aborts the connection, and the server closes the socket. As a result, any data in the socket's receive queue is flushed (via inet_csk_destroy_sock->sk_stream_kill_queues), and that's what's triggering those extra drop events.
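To make it easier to see which locations the drops cluster at, here is a minimal sketch that tallies drop reports per symbol and offset. It assumes each report line has the shape "<count> drops at <symbol>+<hex offset> (<address>)"; the helper name tally_drops, the addresses, and every offset in the sample other than tcp_v4_do_rcv+70 are hypothetical and only there for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "<count> drops at <symbol>+<hex offset> (<address>)".
# Adjust the pattern if your dropwatch build prints something different.
DROP_LINE = re.compile(r"^\s*(\d+) drops at ([\w.]+)\+([0-9a-fA-F]+)")


def tally_drops(output):
    """Sum drop counts per (symbol, offset) across all report lines."""
    totals = Counter()
    for line in output.splitlines():
        match = DROP_LINE.match(line)
        if match is None:
            continue  # skip prompts, banners, and anything else
        count, symbol, offset = match.groups()
        totals[(symbol, int(offset, 16))] += int(count)
    return totals


# Hypothetical sample; only tcp_v4_do_rcv+70 comes from the discussion above.
SAMPLE = """\
1 drops at tcp_v4_do_rcv+70 (0xffffffff00000001)
2 drops at sk_stream_kill_queues+57 (0xffffffff00000002)
1 drops at tcp_v4_do_rcv+70 (0xffffffff00000001)
"""
```

Calling tally_drops(SAMPLE) groups the two tcp_v4_do_rcv+70 reports together, so the per-location totals can be compared directly against the number of bad packets seen in tcpdump.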
Hi @nhorman, thanks for the explanation. I checked tcp_v4_do_rcv and tcp_v4_rcv, and it is indeed the checksum error code. But why doesn't the dropped packet count match what I captured? tcpdump shows 11 packets with checksum errors and InCsumErrors increased by 11, but in dropWatch I did not observe those 11 packets, only 2 or 3 drops at tcp_v4_do_rcv, tcp_v4_rcv, and sk_stream_kill_queues. The reason I thought it was random is that even when I did not issue the curl request and InCsumErrors did not increase, those drops still showed up. That's why I thought they were not related to the packet checksum errors from the curl request.
I expect the reason is how TCP does segmentation and reassembly. If you look at tcp_add_backlog, there is an increment of the InCsumErrors counter as well, but the frame isn't dropped. That's because those frames are in the reassembly queue, and TCP drops those when closing the socket using sk_stream_kill_queues. That's why you see the requisite number of MIB errors, but the drop locations don't match up. It's just an artifact of how the TCP state machine is implemented.
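For comparing the MIB counter against the drop locations, a minimal sketch of reading InCsumErrors directly may help. It assumes a Linux host where /proc/net/snmp lists each protocol as a header row of field names followed by a row of values; the function names parse_snmp and tcp_csum_errors are hypothetical helpers, not part of any tool discussed here.

```python
def parse_snmp(text):
    """Parse /proc/net/snmp-style text into {protocol: {field: int}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    stats = {}
    # Rows come in pairs: field names first, then the matching values.
    for header, values in zip(rows[0::2], rows[1::2]):
        proto = header[0].rstrip(":")
        stats[proto] = {
            name: int(value) for name, value in zip(header[1:], values[1:])
        }
    return stats


def tcp_csum_errors(path="/proc/net/snmp"):
    """Return the current Tcp InCsumErrors value (assumes a Linux /proc)."""
    with open(path) as handle:
        return parse_snmp(handle.read())["Tcp"]["InCsumErrors"]
```

Calling tcp_csum_errors() once before and once after the curl request and subtracting the two values gives the per-request increase, which can then be set side by side with the drop counts reported at each location.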
@nhorman thank you so much for the detailed explanation, that makes more sense.
Hi expert, first, thanks for this great tool, it's really awesome. I was investigating a server-side checksum error issue. Each time I curl the server endpoint, InCsumErrors increases by exactly the number (11) of retransmitted packets (11 packets) that I observe on the client side. But when I use dropWatch to monitor the process (with both hw and sw set to true), no related drops are reported; only the seemingly random drops below show up, which do not look like the 11 checksum-error drops I expected. BTW, this is an HTTPS request and the server side is a VM on an ESXi host. Could you please help? Is there any flag I should turn on, or can dropWatch not detect this kind of error? Thanks a lot!