microsoft / ebpf-for-windows

eBPF implementation that runs on top of Windows
MIT License

Expose TCP statistics helpers (bpf_tcp_sock() helper) #335

Open dthaler opened 3 years ago

dthaler commented 3 years ago

RFC 4898 defines TCP Extended Statistics, which Windows has supported since Vista and exposes via GetPerTcpConnectionEStats() and GetPerTcp6ConnectionEStats().

Some iphlpapi functionality is exposed in kernel mode via netioapi.h, as discussed at https://docs.microsoft.com/en-us/windows-hardware/drivers/network/ip-helper, but the EStats functions are currently not exposed there.

Windows also supports a socket ioctl SIO_TCP_INFO with basic TCP stats but they are only available to a process with the socket handle.

Does Linux support TCP EStats or TCP basic stats?

dthaler commented 3 years ago

@qmonnet Can you verify whether Linux has any such helpers today? None that we know of...

qmonnet commented 3 years ago

After double-checking, there is in fact bpf_tcp_sock(), which gives access to a lot of information for programs attached to sockets (BPF_PROG_TYPE_CGROUP_SKB) and TC (classifiers and actions, according to the commit log). See the available fields at https://elixir.bootlin.com/linux/v5.13/source/include/uapi/linux/bpf.h#L5199. The description on the man page is succinct; here is the commit adding it. There are a few examples in the selftests, see tools/testing/selftests/bpf/progs/test_sock_fields.c.

dthaler commented 3 years ago

@qmonnet The man page contains no information about what bpf_tcp_sock returns and the other links are GPL'ed.

qmonnet commented 3 years ago

The helper returns a pointer to a struct bpf_tcp_sock object, described in the first link. The file is GPL, but note that this is a user API header (no code, just definitions for user applications, even if they're not GPL), so I don't know what restrictions apply. Anyway, here is a list of the comments for the different integers in this struct.

Description of the fields in struct bpf_tcp_sock

```
- Sending congestion window
- Smoothed round trip time
- Minimum RTT allowed
- Slow start size threshold
- What we want to receive next
- Next sequence we send
- First byte we want an ack for
- Cached effective mss, not including SACKS
- ECN status bits
- saved rate sample: packets delivered
- saved rate sample: time elapsed
- Packets which are "in flight"
- Retransmitted packets out
- Total retransmits for entire connection
- RFC4898 tcpEStatsPerfSegsIn total number of segments in
- RFC4898 tcpEStatsPerfDataSegsIn total number of data segments in
- RFC4898 tcpEStatsPerfSegsOut The total number of segments sent
- RFC4898 tcpEStatsPerfDataSegsOut total number of data segments sent
- Lost packets
- SACK'd packets
- RFC4898 tcpEStatsAppHCThruOctetsReceived sum(delta(rcv_nxt)), or how many bytes were acked
- RFC4898 tcpEStatsAppHCThruOctetsAcked sum(delta(snd_una)), or how many bytes were acked
- RFC4898 tcpEStatsStackDSACKDups total number of DSACK blocks received
- Total data packets delivered incl. rexmits
- Like the above but only ECE marked packets
- Number of unrecovered [RTO] timeouts
```

The commit log further explains that the helper is made available for cgskb and sched(cls|act) program types. It also says that if the context passed to the helper is not a pointer to a tcp_sock, or if it cannot be traced back to a tcp_sock, the helper returns NULL. Hence, the caller needs to check for NULL before accessing the struct (which is read-only).
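To make the NULL-check contract concrete, here is a minimal host-side sketch in plain C. It is not the kernel API: `struct tcp_stats_view`, `struct sock_ctx`, `get_tcp_stats()` and `retrans_per_mille()` are hypothetical stand-ins invented for illustration, and only the three field names are borrowed from struct bpf_tcp_sock. It only models the behaviour described above: the lookup yields a read-only pointer or NULL, and the caller checks for NULL before touching any field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a few fields of struct bpf_tcp_sock. */
struct tcp_stats_view {
    uint32_t snd_cwnd;      /* Sending congestion window */
    uint32_t total_retrans; /* Total retransmits for entire connection */
    uint32_t data_segs_out; /* Total number of data segments sent */
};

/* Hypothetical context: is_tcp records whether it traces back to a TCP socket. */
struct sock_ctx {
    int is_tcp;
    struct tcp_stats_view stats;
};

/* Stand-in for the helper: a read-only view, or NULL when the context
 * is missing or is not backed by a TCP socket. */
static const struct tcp_stats_view *get_tcp_stats(const struct sock_ctx *ctx)
{
    if (ctx == NULL || !ctx->is_tcp)
        return NULL;
    return &ctx->stats;
}

/* Retransmits per 1000 data segments sent; -1 when no TCP stats exist. */
static int64_t retrans_per_mille(const struct sock_ctx *ctx)
{
    const struct tcp_stats_view *tp = get_tcp_stats(ctx);

    if (tp == NULL)             /* mandatory check before any field access */
        return -1;
    if (tp->data_segs_out == 0) /* nothing sent yet, avoid dividing by zero */
        return 0;
    return ((int64_t)tp->total_retrans * 1000) / tp->data_segs_out;
}
```

In a real eBPF program the verifier enforces the same check, so a program that dereferences the returned pointer without testing it for NULL should be rejected at load time.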

dthaler commented 3 years ago

@qmonnet Thanks, very helpful. I see both basic stats and ESTATS in the list. Does Linux track ESTATS by default or do you have to turn them on? (For comparison, Windows tracks the basic stats by default in https://docs.microsoft.com/en-us/windows/win32/api/mstcpip/ns-mstcpip-tcp_info_v0 but ESTATS tracking is off by default, enabled via a socket option.)

qmonnet commented 3 years ago

Not sure, but I don't think you have to turn anything on: looking at some kernel code and at the examples, I don't see anything suggesting that you'd have to activate something in particular, nor do I find any relevant socket option.

dthaler commented 3 years ago

This needs to be in a separate extension so that it can call private APIs in Windows, so it is blocked on #345.

dthaler commented 2 years ago

Blocked on issue #733