iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0
20.27k stars 3.84k forks source link

helper for memmem() #471

Open vavrusa opened 8 years ago

vavrusa commented 8 years ago

Protocol such as DNS or IPv6 are hard to filter with cBPF because important parts are relative to floating offset, since BPF disallows loops and has relatively narrow registers, we have to unroll them and scan message with a lot of branches, burning the instruction limit.

Proposal

Provide kernel helper such as memmem(off_from, off_to, pattern) where offsets (R0, R1) are relative to packet payload, and pattern (R2) is a b/h/w/dw, and returned value would be offset in packet where the match occured or packet length (no occurence).

4ast commented 8 years ago

so 1,2,4,8 is a string search or aligned 8-byte search? if string search, then why stop as these sizes? what's the actual use case? what is being searched in the packet payload? ipv4 can also be variable length header...

vavrusa commented 8 years ago

Actual use case for me is DNS. The message has 12 octets of header, and then query name with variable size (127 possible labels, where label is { u8 length, u8 label[length] }, terminated by zero byte), query class and type. A common filtering thing is to look at type/class combination, but requires traversing potential 127 labels of name, otherwise it has false positives, and name suffix match - that also requires traversing all labels. The complexity doubles if you want to scan another record in the packet.

The proposal was to search for 1-8 octet wide string that would be placed directly in register, but variable string search would be even better. It would probably need to use stack for assembling the pattern and then two registers for offset/length on stack. I'm not sure if this is too complicated helper to satisfy runtime length constraints (it still preserves DAG filter property though).

4ast commented 8 years ago

Passing the string as u64 vs pointer on a stack is a noise comparing to the cost of string search. If we do a helper it should probably be glibc-like: bpf_search_skb(start_off, len, &buf, sizeof(buf)) but then will single string be enough? should we do dfa/aho-corasick instead? Then people can be build real IPS all in the kernel.

vavrusa commented 8 years ago

Where buf corresponds to searched pin string? That looks good. I think having a simple string scan primitive already solves several problems:

You can already search a set of keys by sharing the array of keys over eBPF maps from userspace, and install several eBPF filters (or tail call them), at a cost of course. Doing Aho-Corasick, BWA or any substring/prefix/suffix match would be great, but doesn't directly compete with simple string search (cost of building dags/suffix trees for a single lookup is different from Boyer-Moore or KMP), and would require another map type.

4ast commented 8 years ago

if it was a normal C then yes. All 4 cases would apply, but for bpf we'd need separate helper for search_in_skb() vs search_in_stack_buf() vs search_in_unsafe_mem(). First will be bounded by packet size, 2nd by stack and 3rd would need to use non-faulting loads. For DNS use case only 1st version is needed, right?

vavrusa commented 8 years ago

Yes, I meant - 4 cases but with the skb being haystack only, not generic memory. 2nd and 3rd are not that interesting, at least not until we start to compute over elements in map as well.