Open dvyukov opened 1 year ago
Hey @dvyukov
Thanks for opening this, let's pursue this idea!
I was chatting offline with @meadori about how we could achieve this and here are some initial thoughts:
From a Buzzer and eBPF generation perspective, the eBPF program generation is already kind of self-contained, so we think it would make sense to export it as a Go module/library that could then be imported and used in syzkaller. While we considered the option of fully integrating Buzzer into syzkaller and deprecating some other features/logic, we think there is a path forward where both fuzzers share a common "core" and coexist in harmony :)
Thanks for your prototype! I reviewed it quickly and it overall looks good to me. I am still learning how to use the fd_arrays in eBPF but I am confident it would not pose a problem. I can't come up with anything else that would be needed as part of that interface.
On the Buzzer side, the eBPF generation library needs a bit of work: we would like to increase unit test coverage, refactor it into an easier-to-use interface, etc. I can certainly squeeze in the work needed for a syzkaller-friendly interface as part of that.
Regarding map types: perhaps instead of returning the number of map fds the program uses from fd_array, we could return an array of map types, where each position in the array corresponds to a map fd and each value is the expected map type for that fd?
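To make the map-types idea concrete, here is a minimal sketch of what such a return value could look like. All names here (`Program`, `MapTypes`, `NumMaps`, `describeSlots`) are hypothetical and not part of Buzzer or syzkaller; only the numeric values of the two map types are taken from the kernel's `enum bpf_map_type`.

```go
package main

import "fmt"

// MapType mirrors the kernel's enum bpf_map_type (only two values shown).
type MapType uint32

const (
	MapTypeHash  MapType = 1 // BPF_MAP_TYPE_HASH
	MapTypeArray MapType = 2 // BPF_MAP_TYPE_ARRAY
)

// Program is a hypothetical result of the generator.
// MapTypes[i] is the map type expected at fd_array[i].
type Program struct {
	Insns    []byte
	MapTypes []MapType
}

// NumMaps recovers the "number of fds" from the earlier proposal,
// so the richer interface loses no information.
func (p Program) NumMaps() int { return len(p.MapTypes) }

// describeSlots lists what the caller would need to create per slot.
func describeSlots(p Program) []string {
	out := make([]string, 0, len(p.MapTypes))
	for i, t := range p.MapTypes {
		out = append(out, fmt.Sprintf("fd_array[%d]: map type %d", i, t))
	}
	return out
}

func main() {
	p := Program{MapTypes: []MapType{MapTypeArray, MapTypeHash}}
	for _, s := range describeSlots(p) {
		fmt.Println(s)
	}
}
```

The count of fds is still available as the slice length, so a caller that does not care about map types can ignore the values.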
@meadori is there anything else you think I am missing from here?
Thanks again for opening this and pursuing this idea!
@thatjiaozi you covered everything that we talked about. At first glance, the integration steps seem reasonable. I will look more at the specifics this week.
Re 1: totally fine with me.
Re 2: AFAIU when using fd_array, the program refers to maps by index (0, 1, 2, ...), and the actual map FDs are supplied in a separate array. This allows the program to be constant regardless of actual FD values.
Re 3: Mostly up to you, but I would prefer earlier integration to parallelize the work and shake out our interface details (you provide a trivial implementation early but with the final interface, and then we improve and integrate in parallel).
Re 4: Looks reasonable to me. But I don't know if different map types have significantly different interfaces or not. If they do (program validation will fail too often with wrong map type), then it makes sense.
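The fd_array indexing from Re 2 could be sketched roughly as below. This is only an illustration: `ldMapByIndex` and `resolve` are hypothetical helpers, `resolve` merely imitates the lookup the kernel performs at load time, and the byte layout assumes a little-endian host. The opcode and `BPF_PSEUDO_MAP_IDX` values come from the uapi `bpf.h` header.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const (
	opLdImm64    = 0x18 // BPF_LD | BPF_DW | BPF_IMM
	pseudoMapIdx = 5    // BPF_PSEUDO_MAP_IDX: imm is an index into fd_array
)

// ldMapByIndex encodes the two-slot (16 byte) instruction
// "dst = map referenced by fd_array[idx]".
func ldMapByIndex(dst uint8, idx int32) []byte {
	insn := make([]byte, 16)
	insn[0] = opLdImm64
	insn[1] = pseudoMapIdx<<4 | dst&0x0f // src_reg in high nibble, dst_reg in low
	binary.LittleEndian.PutUint32(insn[4:8], uint32(idx))
	return insn
}

// resolve imitates the load-time lookup: index in the program -> real fd.
func resolve(insn []byte, fdArray []int32) (int32, error) {
	if len(insn) < 16 || insn[0] != opLdImm64 || insn[1]>>4 != pseudoMapIdx {
		return 0, fmt.Errorf("not a map-by-index load")
	}
	idx := int32(binary.LittleEndian.Uint32(insn[4:8]))
	if idx < 0 || int(idx) >= len(fdArray) {
		return 0, fmt.Errorf("index %d outside fd_array of length %d", idx, len(fdArray))
	}
	return fdArray[idx], nil
}

func main() {
	prog := ldMapByIndex(1, 0) // generated once, offline

	// The same bytes work with whatever fds exist at run time.
	for _, fds := range [][]int32{{7, 9}, {42}} {
		fd, err := resolve(prog, fds)
		fmt.Println(fd, err)
	}
}
```

The point is that `prog` never changes: only the array passed alongside it does, which is what lets the program be generated before any map exists.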
Sounds good to me on point 3. Let's shake out the details of the interface ASAP and then we can split the work.
I'll sync with @meadori on this and submit a PR soon.
Also, I have to admit I had no idea how to use fd_array in eBPF, but I just figured it out. The documentation is not very helpful, but this comment (https://github.com/libbpf/libbpf/blob/master/include/uapi/linux/bpf.h#L1185) gave me all the clues I needed.
After some local hacking I managed to get it to work with Buzzer. I think, on the Buzzer side, we should entirely ditch the hardcoded map_fd values in favor of fd_arrays, as they look way more flexible.
syzkaller is a coverage-guided OS kernel fuzzer. It can generate BPF programs of low quality and could benefit from a high-quality BPF generator. It won't be as efficient as Buzzer in stressing the BPF subsystem itself, but it will uncover bugs in more complex interactions between BPF programs and other kernel subsystems.
I have a prototype for integration, which you can mostly ignore except for the proposed interface between syzkaller and Buzzer:
Buzzer and syzkaller have some model mismatch, as syzkaller generates and mutates programs offline. So it's not possible to embed actual map fds into the program (they don't exist yet, and we are not even running on the target machine). So I restricted it to using only bpf_attr::fd_array, which syzkaller will fill with the requested number of fds later. It is possible to use fds embedded in the program, but that will require some form of ELF-like relocations (Buzzer will need to say which offsets in the program should contain fds and which of these refer to the same/different maps). I decided to leave this out for now.
Any other ideas on how to design the interface? Anything I've missed? Do we need attach type here? Or map types? Or anything to make it possible to call C functions from the BPF program?
CC @a-nogikh @tarasmadan