google / buzzer

Apache License 2.0
429 stars, 32 forks

Integrate with syzkaller #19

Open dvyukov opened 1 year ago

dvyukov commented 1 year ago

syzkaller is a coverage-guided OS kernel fuzzer. It can generate BPF programs, but only low-quality ones, so it could benefit from a high-quality BPF generator. It won't be as efficient as Buzzer at stressing the BPF subsystem itself, but it will uncover bugs in more complex interactions between BPF programs and other kernel subsystems.

I have a prototype of the integration. You can mostly ignore it, except for the proposed interface between syzkaller and Buzzer:

```go
// This is supposed to be buzzer.Generate.
// progType is the program type (BPF_PROG_TYPE_SOCKET_FILTER/KPROBE/...).
// oldInsns is the program to mutate; if empty, generate a new one.
// Returns the new/mutated program and the number of map FDs it uses via bpf_attr::fd_array.
func BuzzerGenerate(progType int, rnd *rand.Rand, oldInsns []uint64) (insns []uint64, maps int)
```

Buzzer and syzkaller have a model mismatch: syzkaller generates and mutates programs offline. So it's not possible to embed actual map FDs into the program (they don't exist yet, and we are not even running on the target machine). Therefore I restricted it to using only bpf_attr::fd_array, which syzkaller will later fill with the requested number of FDs. It is possible to use FDs embedded in the program, but that would require some form of ELF-style relocations (Buzzer would need to say which offsets in the program should contain FDs and which of them refer to the same or different maps). I decided to leave this out for now.
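For comparison, the relocation-based alternative that was left out could look roughly like the Go sketch below. `MapReloc` and `applyRelocs` are hypothetical names used only for illustration. The sketch assumes each map reference is a two-slot `BPF_LD_IMM64` instruction in the kernel's `struct bpf_insn` layout packed into a `uint64` on a little-endian host, and that the real FD is patched into the `imm` field of the first slot with `src_reg = BPF_PSEUDO_MAP_FD`.

```go
package main

import (
	"errors"
	"fmt"
)

// Encoding constants from include/uapi/linux/bpf.h.
const (
	opLdImm64      = 0x18 // BPF_LD | BPF_IMM | BPF_DW
	bpfPseudoMapFD = 1    // BPF_PSEUDO_MAP_FD: imm holds a map FD
)

// MapReloc is a hypothetical relocation record: the BPF_LD_IMM64 at InsnIdx
// must receive the FD of logical map MapID. Relocations that share a MapID
// refer to the same map.
type MapReloc struct {
	InsnIdx int
	MapID   int
}

// applyRelocs patches fds[MapID] into the imm field of each relocated
// BPF_LD_IMM64 and marks it as BPF_PSEUDO_MAP_FD.
func applyRelocs(insns []uint64, relocs []MapReloc, fds []int32) error {
	for _, r := range relocs {
		if r.InsnIdx < 0 || r.InsnIdx+1 >= len(insns) {
			return errors.New("relocation points outside the program")
		}
		if r.MapID < 0 || r.MapID >= len(fds) {
			return errors.New("relocation refers to an unknown map")
		}
		insn := insns[r.InsnIdx]
		if insn&0xff != opLdImm64 {
			return errors.New("relocation does not point at BPF_LD_IMM64")
		}
		// Keep code, dst_reg and off; set src_reg and imm.
		dst := (insn >> 8) & 0x0f
		off := (insn >> 16) & 0xffff
		insns[r.InsnIdx] = opLdImm64 | (dst|bpfPseudoMapFD<<4)<<8 | off<<16 |
			uint64(uint32(fds[r.MapID]))<<32
	}
	return nil
}

func main() {
	// r1 = map; r2 = map; exit. Both references use logical map 0.
	insns := []uint64{opLdImm64 | 1<<8, 0, opLdImm64 | 2<<8, 0, 0x95}
	if err := applyRelocs(insns, []MapReloc{{0, 0}, {2, 0}}, []int32{7}); err != nil {
		panic(err)
	}
	fmt.Printf("%#x %#x\n", insns[0], insns[2])
}
```

The extra bookkeeping (offsets plus map identity) is exactly what the fd_array approach avoids, since there the program bytes never contain FDs.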

Any other ideas on how to design the interface? Anything I've missed? Do we need the attach type here? Map types? Anything to make it possible to call C functions from the BPF program?

CC @a-nogikh @tarasmadan

thatjiaozi commented 1 year ago

Hey @dvyukov

Thanks for opening this, let's pursue this idea!

I was chatting offline with @meadori about how we could achieve this and here are some initial thoughts:

  1. From a Buzzer and eBPF generation perspective, eBPF program generation is already fairly self-contained, so we think it would make sense to export it as a Go module/library that syzkaller could then import and use. While we considered fully integrating Buzzer into syzkaller and deprecating some of Buzzer's other features/logic, we think there is a path forward where both fuzzers share a common "core" and coexist in harmony :)

  2. Thanks for your prototype! I reviewed it quickly and overall it looks good to me. I am still learning how to use fd_array in eBPF, but I am confident it will not pose a problem. I can't think of anything else that would be needed as part of that interface.

  3. On the Buzzer side, the eBPF generation library needs some work: we would like to increase unit test coverage, refactor it into an easier-to-use interface, etc. I can certainly squeeze in the work needed for a syzkaller-friendly interface as part of that.

  4. Regarding map types: perhaps, instead of returning the number of map FDs the program uses from fd_array, we could return an array of map types, where each position in the array corresponds to an fd_array entry and each value is the map type expected for that FD?

@meadori is there anything else you think I am missing?

Thanks again for opening this and pursuing this idea!

meadori commented 1 year ago

@thatjiaozi you covered everything that we talked about. At first glance, the integration steps seem reasonable. I will look more at the specifics this week.

dvyukov commented 1 year ago

Re 1: totally fine with me.

Re 2: AFAIU, when using fd_array, the program refers to maps by index (0, 1, 2, ...), and the actual map FDs are supplied in a separate array. This keeps the program identical regardless of the actual FD values.
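To make the index-based scheme concrete, here is a Go sketch that encodes the two-slot `BPF_LD_IMM64` instruction referring to `fd_array[idx]`. It assumes the kernel's `struct bpf_insn` layout packed into a `uint64` on a little-endian host (opcode in bits 0-7, dst_reg in bits 8-11, src_reg in bits 12-15, off in bits 16-31, imm in bits 32-63) and uses `BPF_PSEUDO_MAP_IDX` (5) from `include/uapi/linux/bpf.h` as `src_reg`. The helper names are illustrative, not Buzzer or syzkaller APIs.

```go
package main

import "fmt"

const (
	opLdImm64       = 0x18 // BPF_LD | BPF_IMM | BPF_DW
	bpfPseudoMapIdx = 5    // BPF_PSEUDO_MAP_IDX: imm is an index into fd_array
)

// encodeInsn packs one struct bpf_insn into a uint64 (little-endian layout).
func encodeInsn(code, dst, src uint8, off int16, imm int32) uint64 {
	return uint64(code) |
		uint64(dst&0x0f)<<8 |
		uint64(src&0x0f)<<12 |
		uint64(uint16(off))<<16 |
		uint64(uint32(imm))<<32
}

// ldMapByIdx returns both slots of "dst = map at fd_array[idx]".
// The second slot carries the upper 32 bits of the immediate (0 here).
func ldMapByIdx(dst uint8, idx int32) [2]uint64 {
	return [2]uint64{
		encodeInsn(opLdImm64, dst, bpfPseudoMapIdx, 0, idx),
		encodeInsn(0, 0, 0, 0, 0),
	}
}

func main() {
	// r1 = map at fd_array[2]; these bytes do not depend on the real FD.
	insn := ldMapByIdx(1, 2)
	fmt.Printf("%#x %#x\n", insn[0], insn[1])
}
```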

Re 3: Mostly up to you, but I would prefer integrating early to parallelize the work and shake out the interface details (you provide a trivial implementation early, but with the final interface, and then we improve and integrate in parallel).

Re 4: Looks reasonable to me. But I don't know whether different map types have significantly different interfaces. If they do (i.e., program validation would fail too often with the wrong map type), then it makes sense.

thatjiaozi commented 1 year ago

Point 3 sounds good to me. Let's shake out the interface details ASAP, and then we can split the work.

I'll sync with @meadori on this and submit a PR soon.

thatjiaozi commented 1 year ago

Also, I have to admit I had no idea how to use fd_array in eBPF, but I just figured it out. The documentation is not very helpful, but this comment (https://github.com/libbpf/libbpf/blob/master/include/uapi/linux/bpf.h#L1185) gave me all the clues I needed.

After some local hacking I managed to get it working with Buzzer. I think that on the Buzzer side we should ditch the hardcoded map_fd values entirely in favor of fd_array, as it looks far more flexible.
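The switch from hardcoded map_fd values to fd_array can be sketched in Go as a rewrite pass, assuming programs are `[]uint64` in the little-endian `struct bpf_insn` packing: every `BPF_LD_IMM64` with `src_reg = BPF_PSEUDO_MAP_FD` (1) gets `src_reg = BPF_PSEUDO_MAP_IDX` (5) and an `imm` that indexes into a deduplicated fd_array. `toFdArray` is a hypothetical name, not part of Buzzer.

```go
package main

import "fmt"

const (
	opLdImm64       = 0x18 // BPF_LD | BPF_IMM | BPF_DW
	bpfPseudoMapFD  = 1    // imm holds a map FD
	bpfPseudoMapIdx = 5    // imm holds an index into fd_array
)

// toFdArray rewrites map references that embed FDs so they use fd_array
// indices instead, returning the new program and the fd_array to pass in
// bpf_attr. Each distinct FD gets one slot, so references to the same map
// share an index. Other LD_IMM64 forms (e.g. BPF_PSEUDO_MAP_VALUE) are kept.
func toFdArray(insns []uint64) ([]uint64, []int32) {
	out := append([]uint64(nil), insns...) // do not modify the input
	var fdArray []int32
	slot := map[int32]int32{}
	for i := 0; i < len(out); i++ {
		insn := out[i]
		if insn&0xff != opLdImm64 {
			continue
		}
		if (insn>>12)&0x0f == bpfPseudoMapFD {
			fd := int32(uint32(insn >> 32))
			idx, ok := slot[fd]
			if !ok {
				idx = int32(len(fdArray))
				slot[fd] = idx
				fdArray = append(fdArray, fd)
			}
			// Keep code, dst_reg and off; replace src_reg and imm.
			out[i] = insn&0xffff0fff | bpfPseudoMapIdx<<12 | uint64(uint32(idx))<<32
		}
		i++ // BPF_LD_IMM64 occupies two slots; skip the second one
	}
	return out, fdArray
}

func main() {
	// r1 = map fd 7; r2 = map fd 9; r3 = map fd 7; exit
	prog := []uint64{0x700001118, 0, 0x900001218, 0, 0x700001318, 0, 0x95}
	rewritten, fdArray := toFdArray(prog)
	fmt.Println(fdArray)                // [7 9]
	fmt.Printf("%#x\n", rewritten[4]) // 0x5318
}
```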