the-tcpdump-group / tcpdump

the TCPdump network dissector
https://www.tcpdump.org/

Support eBPF live filtering on Linux #1234

Open · ferrieux opened 1 month ago

ferrieux commented 1 month ago

On modern Linux, eBPF is available for advanced filtering. In some circumstances, the extra muscle brought by filtering early, in-kernel and in zero-copy mode is critical. One such use case is filtering a live interface that carries a haystack of traffic, looking for a needle that matches a complex, eBPF-expressed criterion. A simple example that currently has no efficient solution with tcpdump is matching an IP address against a large hash table (as netfilter does with ipset or nftables sets).
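To make the use case concrete, here is a minimal sketch of the per-packet decision such a filter would make, written as ordinary userspace C so it can be compiled and tested anywhere. It is not eBPF: in an actual socket filter the address set would be a BPF_MAP_TYPE_HASH map queried with bpf_map_lookup_elem(), whereas here a sorted array searched with bsearch() stands in for it. The function name ip_set_filter and the assumption of untagged Ethernet frames carrying IPv4 are illustrative only.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison of two host-order IPv4 addresses, for bsearch(). */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Hypothetical predicate: keep the packet if its IPv4 source or destination
 * address is in `set` (sorted ascending, host byte order), drop it otherwise.
 * Like a socket filter, it returns the number of bytes to keep (0 = drop).
 * Assumes an untagged Ethernet frame; VLAN tags are not handled. */
unsigned int ip_set_filter(const uint8_t *pkt, size_t len,
                           const uint32_t *set, size_t n)
{
    if (len < 14 + 20)                      /* Ethernet + minimal IPv4 header */
        return 0;
    if (pkt[12] != 0x08 || pkt[13] != 0x00) /* EtherType is not IPv4 */
        return 0;

    const uint8_t *ip = pkt + 14;
    if ((ip[0] >> 4) != 4)                  /* IP version field */
        return 0;

    /* Addresses sit at fixed offsets 12 and 16, even when options follow. */
    uint32_t saddr = (uint32_t)ip[12] << 24 | (uint32_t)ip[13] << 16 |
                     (uint32_t)ip[14] << 8 | ip[15];
    uint32_t daddr = (uint32_t)ip[16] << 24 | (uint32_t)ip[17] << 16 |
                     (uint32_t)ip[18] << 8 | ip[19];

    if (bsearch(&saddr, set, n, sizeof *set, cmp_u32) ||
        bsearch(&daddr, set, n, sizeof *set, cmp_u32))
        return (unsigned int)len;           /* keep the whole packet */
    return 0;
}
```

Each lookup costs O(log n) here (O(1) on average with a kernel hash map), whereas the equivalent cBPF expression, a long `host A or host B or ...` chain, grows linearly with the size of the set.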

For this case, it would be cool to be able to tell tcpdump to load a separately compiled eBPF object file and attach it to its raw socket with SO_ATTACH_BPF.

Note that I'm not advocating deeper integration, such as eBPF-filtering a pcap file: that has none of the performance requirements above, and arbitrarily complex logic in full-fledged userspace C can be used instead.

I'm also not proposing (alas) to replicate the beautifully self-contained inline cBPF: in eBPF the heavier ELF machinery makes it unlikely that anybody would want to enter a program as a list of comma-separated integers (though I, for one, would love to do it for "return XDP_DROP").

I guess the implementation is straightforward (on sufficiently recent Linux). The only part that needs thought, I guess, is the command-line API. Among possibilities:

infrastation commented 4 weeks ago

For posterity, this originates from the-tcpdump-group/libpcap#1379.

From the "it would be cool" point of view, it would make sense to design such a new feature in a way that also allows using cbpf-savefile(5) as a pre-compiled filter. In that case the pre-compiled bytecode would have to go through libpcap for the usual processing, including passing it to the kernel on certain OSes (what could possibly go wrong...).

From the "how to keep this feature working long-term?" point of view, such a feature would require some work for the initial implementation and documentation, and then a recurring amount of work to support and maintain it. If the feature becomes popular, maintenance will require notable skill and time, because the intended audience would be network developers who work on advanced use cases and therefore create or run into advanced problems. Or because the VLAN headers will migrate around. Or because the year 2038 will come faster than expected, etc.

Ideally there should be a developer who has a sound use case for this (at least remotely) and the time to get it done properly. Until then, it would help at least to document a definition of done. For example, what would be the usual diagnostic steps to tell whether a problem belongs to the loaded eBPF bytecode or lies elsewhere? What statistics would an eBPF filter return?

ferrieux commented 4 weeks ago

I think you're overdoing it in trying to "bring in" a chunk of responsibility that currently belongs to the Linux kernel.

The eBPF system's (or program's) "problems" are not tcpdump's responsibility, which ends at setsockopt(SO_ATTACH_BPF). If, in writing the eBPF program, one makes a mistake that keeps or drops packets it shouldn't, then so be it. That's no different from writing "ether[0xc:2]==800" in cBPF instead of "ether[0xc:2]==0x800" and being sad about the result.
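The cBPF typo is easy to reproduce. This small userspace emulation of `ether[12:2] == k` (0xc is byte offset 12, the EtherType of an untagged frame) shows why the unprefixed constant quietly matches the wrong traffic: 800 decimal is 0x0320, not IPv4's 0x0800. The function name is hypothetical, and the code only mimics the semantics of the expression; it is not libpcap's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Emulate the filter expression ether[12:2] == k: read the 16-bit
 * big-endian field at byte offset 12 and compare it with k. As with a
 * BPF load past the end of the packet, a frame that is too short for
 * the load simply does not match. */
int ether_12_2_equals(const uint8_t *frame, size_t len, uint16_t k)
{
    if (len < 14)
        return 0;
    uint16_t v = (uint16_t)((frame[12] << 8) | frame[13]);
    return v == k;
}
```

An IPv4 frame (bytes 0x08 0x00 at offset 12) matches `k = 0x800` but not `k = 800`, which instead matches the unrelated EtherType 0x0320; the kernel accepts either program without complaint.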

IMO, agreeing to this Yalta is reasonable: it adds very little work on the tcpdump side and none on the kernel side, since setsockopt() is just being used as documented.

infrastation commented 4 weeks ago

Power and responsibility normally should be in balance. Let's look for ways to keep this potential feature sustainable. If something does not work as expected, it would be nice to have an easy way for the user to tell — without opening a support ticket — that the problem certainly is at the kernel end of the socket and ideally a sense of the next place to check. Perhaps if you look through the existing open BPF-related issues in libpcap, it will make more sense.

ferrieux commented 4 weeks ago

Well, there is an easy way: just launch two instances of tcpdump in parallel, one with the filter, the other without.