google / gopacket

Provides packet processing capabilities for Go
BSD 3-Clause "New" or "Revised" License
6.32k stars 1.13k forks source link

ZeroCopyReadPacketData of TPacket panic after calling Close in other goroutine #743

Open kery opened 4 years ago

kery commented 4 years ago

(TPacket).ZeroCopyReadPacketData is called in a goroutine and it blocking that goroutine. I want to cancel that goroutine by calling (TPacket).Close in other gorouting, but panic occurred. To align with golang convention, it should be safe to call (*TPacket).Close method in other goroutine, and the calling of ZeroCopyReadPacketData should return after the calling of Close.

goroutine 11 [running]: runtime.throw(0x930fed, 0x5) /snap/go/4901/src/runtime/panic.go:774 +0x72 fp=0xc000173c48 sp=0xc000173c18 pc=0x42ebd2 runtime.sigpanic() /snap/go/4901/src/runtime/signal_unix.go:401 +0x3de fp=0xc000173c78 sp=0xc000173c48 pc=0x4436ae github.com/google/gopacket/afpacket.(v2header).getStatus(0x7f9b10000000, 0x1) /home/kery/go/src/github.com/google/gopacket/afpacket/header.go:114 +0x5 fp=0xc000173c80 sp=0xc000173c78 pc=0x754f85 github.com/google/gopacket/afpacket.(TPacket).pollForFirstPacket(0xc00018c900, 0x9ed360, 0x7f9b10000000, 0x475c4c, 0xc0000a8040) /home/kery/go/src/github.com/google/gopacket/afpacket/afpacket.go:453 +0x5e fp=0xc000173cd8 sp=0xc000173c80 pc=0x754cce github.com/google/gopacket/afpacket.(TPacket).ZeroCopyReadPacketData(0xc00018c900, 0xc000000006, 0x93c52e, 0xc, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...) /home/kery/go/src/github.com/google/gopacket/afpacket/afpacket.go:302 +0x136 fp=0xc000173d70 sp=0xc000173cd8 pc=0x754526 sim-data/tunnel.(GtpuConn).Read(0xc000176b10, 0xc0001ec000, 0x5ec, 0x5ec, 0x0, 0x0, 0x0)

caseybarker commented 4 years ago

I have this issue, too. The problem is that, when Close() is called asynchronously, the pollForFirstPacket() loop returns from unix.Poll(), passes all the error conditions, and loops around for another try. But by that point, the ring buffer has already been unmapped when getStatus() tries to examine the header.

A quick fix would be to also check for unix.POLLHUP|unix.POLLNVAL in the result of the poll and return an error. This would exit the loop early and probably work most of the time. However, this is still not completely safe because the ring buffer is referenced in several places, and an async call to Close() could invalidate it at any time.

There's a mutex in TPacket, but that mutex is locked for the duration of ZeroCopyReadPacketData(). Hence, Close() can't just block on it, or it might never proceed.

I believe a combination of fixes are needed. Unfortunately, I'm not seeing much progress on this package. I'll tag this issue if I send a PR.