the-tcpdump-group / tcpdump

the TCPdump network dissector
https://www.tcpdump.org/
Other
2.72k stars 849 forks source link

Packet dumps get corrupt by the USR2 signal when there are outstanding packets in the input OS buffer. #1231

Open garrymar opened 1 week ago

garrymar commented 1 week ago

I noticed that the file dump gets corrupt when the USR2 signal is sent to tcpdump instances having packets in the input OS buffer.

To replicate the problem in Linux environment with the latest kernel, tcpdump, and libpcap versions, one can go over the following procedure. For simplicity sake, root access is assumed in the examples.

# Create a slow storage, limited to 2 write IOPS (input/output operations per second)

# Create 1 x 1GB RAM-backed block device 
modprobe brd rd_nr=1 rd_size=1024000

# Format the newly created block device
mkfs.ext4 /dev/ram0

# Mount the device in sync mode
mkdir /mnt/test
mount -o sync /dev/ram0 /mnt/test/

# Check the major and minor IDs of the device
lsblk /dev/ram0
NAME MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
ram0   1:0    0 1000M  0 disk /mnt/test

# Create a new subtree in cgroup-v2 SYSFS tree
mkdir /sys/fs/cgroup/test

# Enable the IO controller
echo "+io" > /sys/fs/cgroup/cgroup.subtree_control

# Apply write IOPS restrictions (2 IOPS) to the previously created device
# using the major and minor IDs
echo "1:0 wiops=2" > /sys/fs/cgroup/test/io.max

cat /sys/fs/cgroup/test/io.max
1:0 rbps=max wbps=max riops=max wiops=2

# Bind the current shell session with the cgroup
echo $$ > /sys/fs/cgroup/test/cgroup.procs
grep $$ /sys/fs/cgroup/test/cgroup.procs
3154

# Start tcpdump listening on the loopback device
# and writing to the slow storage
tcpdump -i lo -w /mnt/test/icmp.pcap -c 100 icmp

# Open a second terminal and start sending SIGUSR1 signal to the
# tcpdump process every second to monitor its usage of the input buffer
watch -n1 killall -USR1 tcpdump

# Open a third terminal and send 50 full-MTU-sized ICMP requests at 1kps
# to the loopback interface
ping -i 0.001 -c 50 -q -s 1472 127.0.0.1
PING 127.0.0.1 (127.0.0.1) 1472(1500) bytes of data.

--- 127.0.0.1 ping statistics ---
50 packets transmitted, 50 received, 0% packet loss, time 49ms

# While monitoring the first terminal, identify the moment when the input buffer
# is in use: packets captured fewer than 100 while packets received by filter is 200
...
tcpdump: 0 packets captured, 0 packets received by filter, 0 packets dropped by kernel
tcpdump: 3 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 9 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 14 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 19 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 25 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 27 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 33 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 38 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 43 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 49 packets captured, 200 packets received by filter, 0 packets dropped by kernel
<<< good moment >>>

# Send SIGUSR2 signal to the running tcpdump instance from the third terminal
killall -USR2 tcpdump

# Wait until tcpdump captures all 100 packets and quits
...
tcpdump: 86 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 92 packets captured, 200 packets received by filter, 0 packets dropped by kernel
tcpdump: 97 packets captured, 200 packets received by filter, 0 packets dropped by kernel
100 packets captured
200 packets received by filter
0 packets dropped by kernel

# Inspect the dump. It should show signs of corruption
# tcpdump -r /mnt/test/icmp.pcap
reading from file /mnt/test/icmp.pcap, link-type EN10MB (Ethernet), snapshot length 262144
18:41:04.218358 IP localhost > localhost: ICMP echo request, id 23, seq 1, length 1480
18:41:04.218383 IP localhost > localhost: ICMP echo reply, id 23, seq 1, length 1480
...
tcpdump: pcap_loop: invalid packet capture length 3216948668, bigger than snaplen of 262144

More interesting things are happening to UDP packets: after sending the USR2 signal, the remaining packets get dumped without first 4 bytes:

# tcpdump -#r /mnt/test/udp.pcap
reading from file /mnt/test/udp.pcap, link-type EN10MB (Ethernet), snapshot length 262144
    1  16:51:05.268417 IP 192.168.0.2.42910 > 192.168.0.1.italk: UDP, length 1472
    2  16:51:05.268453 IP 192.168.0.2.42910 > 192.168.0.1.italk: UDP, length 1472
    3  16:51:05.268462 IP 192.168.0.2.42910 > 192.168.0.1.italk: UDP, length 1472
    ...
   45  16:51:05.268918 IP 192.168.0.2.42910 > 192.168.0.1.italk: UDP, length 1472
   46  16:51:05.268929 IP 192.168.0.2.42910 > 192.168.0.1.italk: UDP, length 1472
   47  [Invalid header: caplen==0, len==0]
   48  [Invalid header: caplen==0, len==0]
   49  [Invalid header: caplen==0, len==0]
   ...
  111  [Invalid header: caplen==0, len(1729435865) > 262144]
  112  [Invalid header: len(1777351418) > 262144]
  113  [Invalid header: len(1777351418) > 262144]
  114  [Invalid header: len(1777351418) > 262144]
  ...

Both ICMP and UDP dumps are attached. Below are the version details:

# uname -r
6.11.3-arch1-1

# tcpdump --version
tcpdump version 4.99.5
libpcap version 1.10.5 (with TPACKET_V3)
OpenSSL 3.3.2 3 Sep 2024
64-bit build, 64-bit time_t

Thank you.

Regards, Garri

icmp-udp.tar.gz

guyharris commented 1 week ago

It's calling pcap_dump_flush() from a signal handler, which means it could happen in the middle of a call to pcap_dump() and completely mess things up out from under pcap_dump().

It should not do that.