google / gvisor

Application Kernel for Containers
https://gvisor.dev
Apache License 2.0

tcpdump broken for libpcap 1.10+ #6664

Open crappycrypto opened 2 years ago

crappycrypto commented 2 years ago

Description

The gVisor site mentions that tcpdump works in non-promiscuous mode. However, since libpcap 1.10.0, tcpdump seems to fail inside gVisor. My guess is that this is because of the following entry in the changelog:

Linux: Require PF_PACKET support, and kernel 2.6.27 or later

A related issue is https://github.com/google/gvisor/issues/1409

Is this feature related to a specific bug?

No response

Do you have a specific solution in mind?

No response

hbhasker commented 2 years ago

Thanks for the report. Could you provide an strace log of runsc? Also, could you provide your daemon.conf for the runsc runtime?

crappycrypto commented 2 years ago

Using a Debian bullseye container, it stops after just a few syscalls.

Tested with

socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC) = -1 EPROTONOSUPPORT (Protocol not supported)
socket(AF_UNIX, SOCK_RAW, 0)            = 3
ioctl(3, SIOCETHTOOL, 0x7f7d790a8700)   = -1 EOPNOTSUPP (Operation not supported)
close(3)                                = 0
eventfd2(0, EFD_NONBLOCK)               = 3
socket(AF_PACKET, SOCK_RAW, htons(0 /* ETH_P_??? */)) = -1 EPERM (Operation not permitted)
close(3)                                = 0

The first problem seems to be that creating an AF_PACKET socket without specifying a protocol is not supported in gVisor. See https://github.com/google/gvisor/blob/108410638aa8480e82933870ba8279133f543d2b/pkg/sentry/socket/netstack/provider.go#L140

However, the Linux kernel supports 0 as the protocol, as documented in https://github.com/torvalds/linux/blob/f40ddce8/Documentation/networking/packet_mmap.rst#L83

int fd = socket(PF_PACKET, mode, 0);

The protocol can optionally be 0 in case we only want to transmit via this socket, which avoids an expensive call to packet_rcv(). In this case, you also need to bind(2) the TX_RING with sll_protocol = 0 set. Otherwise, htons(ETH_P_ALL) or any other protocol, for example.
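For illustration, the socket-then-bind sequence that appears in both strace logs in this thread can be sketched in C as below. The helper name `open_and_bind_packet_socket` is hypothetical and is not taken from libpcap or gVisor. The sketch assumes Linux headers and normally needs CAP_NET_RAW; it only reports whether the two calls succeeded and says nothing about how a given kernel or sandbox treats protocol 0.

```c
#include <arpa/inet.h>        /* htons */
#include <assert.h>
#include <errno.h>
#include <linux/if_ether.h>   /* ETH_P_ALL */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <net/if.h>           /* if_nametoindex */
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create an AF_PACKET/SOCK_RAW socket for `protocol`
 * (host byte order) and bind it to the interface `ifname`.
 * Returns the file descriptor on success, or -errno on failure.
 */
static int open_and_bind_packet_socket(const char *ifname, unsigned short protocol)
{
    assert(ifname != NULL);

    int fd = socket(AF_PACKET, SOCK_RAW, htons(protocol));
    if (fd < 0)
        return -errno;

    unsigned int ifindex = if_nametoindex(ifname);
    if (ifindex == 0) {
        int err = errno ? errno : ENODEV;
        close(fd);
        return -err;
    }

    struct sockaddr_ll sll;
    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(protocol);  /* must match the socket's protocol */
    sll.sll_ifindex = (int)ifindex;

    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        int err = errno;
        close(fd);
        return -err;
    }
    return fd;
}
```

Calling `open_and_bind_packet_socket("eth0", 0)` mirrors what the newer libpcap does, while `open_and_bind_packet_socket("eth0", ETH_P_ALL)` mirrors the older behaviour. Comparing the two return values on the same system shows whether protocol 0 is the part being rejected.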

The libpcap code can be found at https://github.com/the-tcpdump-group/libpcap/blob/fa91341ab7647521c90b3e34c93026725bfb71dd/pcap-linux.c#L2312

I don't see any file named daemon.conf on my machine. Does that mean I'm using the defaults, or am I just looking in the wrong location? gVisor was installed using apt on Debian, if that helps.

Using libpcap 1.8.1 (Debian buster) works fine, as it creates the socket with ETH_P_ALL:

socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)) = 3
ioctl(3, SIOCBONDINFOQUERY, 0x7f4079fda900) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(3, SIOCGIWMODE, 0x7f4079fda940)   = -1 ENOTTY (Inappropriate ioctl for device)
close(3)                                = 0
socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL)) = 3
ioctl(3, SIOCGIFINDEX, {ifr_name="lo", }) = 0
ioctl(3, SIOCGIFHWADDR, {ifr_name="eth0", ifr_hwaddr={sa_family=ARPHRD_ETHER, sa_data=02:42:ac:11:00:02}}) = 0
stat("/sys/class/net/eth0/wireless", 0x7f4079fda650) = -1 ENOENT (No such file or directory)
ioctl(3, SIOCBONDINFOQUERY, 0x7f4079fda5b0) = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(3, SIOCGIWNAME, 0x7f4079fda5f0)   = -1 ENOTTY (Inappropriate ioctl for device)
ioctl(3, SIOCGIFINDEX, {ifr_name="eth0", }) = 0
bind(3, {sa_family=AF_PACKET, sll_protocol=htons(ETH_P_ALL), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0

hbhasker commented 2 years ago

Thanks for the detailed report, let me take a look at it.

kevinGC commented 2 years ago

If you're seeing EPERM, make sure that both:

Passing 0 to create a write-only socket is not supported, but I believe netstack will let you create a packet socket with protocol 0.

crappycrypto commented 2 years ago

You're right, I made a mistake when setting up a minimal repro environment for the bug. Tcpdump is indeed broken with the versions described above, but it crashes with a segfault (Debian bullseye container with tcpdump installed via apt).

Here's the strace showing libpcap failing to set up a ring buffer:

socket(AF_PACKET, SOCK_RAW, htons(0 /* ETH_P_??? */)) = 4
ioctl(4, SIOCGIFINDEX, {ifr_name="lo", }) = 0
ioctl(4, SIOCGIFHWADDR, {ifr_name="eth0", ifr_hwaddr={sa_family=ARPHRD_ETHER, sa_data=02:42:ac:11:00:02}}) = 0
stat("/sys/class/net/eth0/wireless", 0x7fa32e8455b0) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/sys/class/net/eth0/dsa/tagging", O_RDONLY) = -1 ENOENT (No such file or directory)
ioctl(4, SIOCGIFINDEX, {ifr_name="eth0", }) = 0
bind(4, {sa_family=AF_PACKET, sll_protocol=htons(0 /* ETH_P_??? */), sll_ifindex=if_nametoindex("eth0"), sll_hatype=ARPHRD_NETROM, sll_pkttype=PACKET_HOST, sll_halen=0}, 20) = 0
getsockopt(4, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockopt(4, SOL_SOCKET, SO_BPF_EXTENSIONS, 0x7fa32e8456c0, [4]) = -1 ENOPROTOOPT (Protocol not available)
mmap(NULL, 266240, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f9b17f83000
getsockopt(4, SOL_PACKET, PACKET_HDRLEN, 0x7fa32e845620, [4]) = -1 ENOPROTOOPT (Protocol not available)
munmap(0x7f9b17f83000, 266240)          = 0
setsockopt(4, SOL_PACKET, PACKET_RX_RING, {tp_block_size=0, tp_block_nr=0, tp_frame_size=0, tp_frame_nr=0}, 16) = -1 ENOPROTOOPT (Protocol not available)

The code then crashes while trying to free oneshot_buffer in pcap_cleanup_linux. This is a bug in libpcap: the buffer has already been freed in the error path of setup_mmapped, so the cleanup frees it a second time. The crash occurs at https://github.com/the-tcpdump-group/libpcap/blob/fa91341ab7647521c90b3e34c93026725bfb71dd/pcap-linux.c#L835
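The double-free pattern described here can be shown with a small self-contained C sketch. All names below (`capture_state`, `release_oneshot_buffer`, `setup_ring_or_fail`, `cleanup`) are hypothetical stand-ins, not libpcap's actual code, and clearing the pointer after freeing is just one common way to make a second free harmless, not necessarily the fix libpcap adopted.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the capture handle's private state. */
struct capture_state {
    unsigned char *oneshot_buffer;
};

/* Free the buffer and clear the pointer so that a later call is a no-op. */
static void release_oneshot_buffer(struct capture_state *st)
{
    assert(st != NULL);
    free(st->oneshot_buffer);   /* free(NULL) is defined to do nothing */
    st->oneshot_buffer = NULL;
}

/* Stand-in for a setup routine whose ring-buffer step fails. */
static int setup_ring_or_fail(struct capture_state *st, size_t snaplen)
{
    st->oneshot_buffer = malloc(snaplen);
    if (st->oneshot_buffer == NULL)
        return -1;

    /* Pretend the PACKET_RX_RING setsockopt failed at this point. */
    release_oneshot_buffer(st);  /* error path frees the buffer */
    return -1;
}

/* Stand-in for the generic cleanup that runs after any failure. */
static void cleanup(struct capture_state *st)
{
    release_oneshot_buffer(st);  /* safe: pointer was cleared above */
}
```

Without the `st->oneshot_buffer = NULL` line, the call to `cleanup` after a failed setup would pass an already freed pointer to `free`, which is undefined behaviour and matches the kind of crash reported above.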

The real issue is that libpcap now requires a memory-mapped ring buffer for receiving packets. The code that checks for support is in init_tpacket, and it assumes that ENOPROTOOPT means the kernel was compiled without packet ring buffer support. See https://github.com/the-tcpdump-group/libpcap/blob/fa91341ab7647521c90b3e34c93026725bfb71dd/pcap-linux.c#L2752

Thus, supporting the newer libpcap versions requires gVisor to implement TPACKET_V2 or TPACKET_V3.

hbhasker commented 2 years ago

Sigh. Looks like libpcap removed all support for non-mmapped capture in commit https://github.com/the-tcpdump-group/libpcap/commit/7c78bcb843283dec6357e70f60ce92a75a55d681 :-\

Let me open a bug to support TPACKET_V2. I think v2 is simpler to implement than v3.

hbhasker commented 2 years ago

I will update our documentation to indicate that we do not support libpcap 1.10+ and that users should stick with libpcap 1.9 or lower for now.
