google / syzkaller

syzkaller is an unsupervised coverage-guided kernel fuzzer
Apache License 2.0
5.4k stars 1.23k forks source link

syzlang: support referencing sibling non-nested structures #1884

Open rodionov opened 4 years ago

rodionov commented 4 years ago

Hello,

Current implementation of syzlang doesn't allow referencing a sibling non-nested structures used in syscall definitions. For example, in two syscall definitions below -- ioctl$usb_gadget_accessory_register_hid and ioctl$usb_gadget_accessory_hid_report_desc -- it would be desirable to define structure field control_request_register_hid.report_desc_len as len[control_request_hid_report, int16] (so that the first syscall ioctl$usb_gadget_accessory_register_hid reports length of the HID report descriptor which is sent in the second syscall ioctl$usb_gadget_accessory_hid_report_desc).

ioctl$usb_gadget_accessory_register_hid(fd acc_ep, cmd const[USBDEVFS_CONTROL], arg ptr[in, control_request_register_hid])
control_request_register_hid {
    request_type    const[0x40, int8]
    request     const[0x36, int8]
    hid_id      flags[hiddev_id, int16]
        # setting report_desc_len field to len[control_request_report]
        # causes a syzlang syntax error
    report_desc_len int16 
    length      const[0x0, int16]
    data        const[0x0, intptr]
}

ioctl$usb_gadget_accessory_hid_report_desc(fd acc_ep, cmd const[USBDEVFS_CONTROL], arg ptr[in, control_request_hid_report_desc])
control_request_hid_report {
    data    array[int8]
}
control_request_hid_report_desc {
    request_type    const[0x40, int8]
    request     const[0x38, int8]
    hid_id      flags[hiddev_id, int16]
    index       const[0x0, int16]
    length      len[data, int16]
    data        ptr[in, control_request_hid_report]
}

Thank you,

dvyukov commented 4 years ago

A program may contain any number of ioctl$usb_gadget_accessory_register_hid and any number of ioctl$usb_gadget_accessory_hid_report_desc syscalls, including 0. How should it behave if the program contains no ioctl$usb_gadget_accessory_hid_report_desc syscalls? Or 2 ioctl$usb_gadget_accessory_hid_report_desc syscalls? Or 2 usb_gadget_accessory_register_hid and 1 usb_gadget_accessory_hid_report_desc? Should it depend on the order the syscalls?

It's also unclear how kernel handles this b/c it has the same problem -- the second syscall may not follow, or 3 of them happen at the same time. Does it really accept data length in one syscalls and the buffer itself in another? @xairy do you know anything about this interface?

rodionov commented 4 years ago

Syscall ioctl$usb_gadget_accessory_hid_report_desc should be executed after a matchingioctl$usb_gadget_accessory_register_hid. The calls are matching if fields control_request_hid_report_desc.length and control_request_hid_report_desc.hid_idare exactly the same as fields control_request_register_hid.report_desc_len and control_request_register_hid.hid_id correspondingly. If this condition doesn't hold or the syscalls are executed in the reverse order the interesting kernel code won't be executed.

Thus, if we have 0 ioctl$usb_gadget_accessory_hid_report_desc calls then the field control_request_register_hid.report_desc_len can be any value.

There could be an arbitrary number of ioctl$usb_gadget_accessory_register_hid calls with arbitrary values of hid_id and report_desc_len. However, to get coverage of the target kernel code there should be at least one matching ioctl$usb_gadget_accessory_hid_report_desc call in the right order.

Here is what happens internally, upon the first syscall (hid device registration) the kernel allocates a memory buffer to hold the report descriptor for a give hid_id. Then, at the send syscall (sending hid report descriptor) the buffer is filled with the actual report descriptor. If the second syscall happens before executing the first one, the kernel does nothing, as it cannot find the allocated buffer for this hid_id.

dvyukov commented 4 years ago

This may be quite hard to implement and it may increase overall systematic implementation complexity. Since it's the only use case for this so far, I would suggest to find some creative workaround. E.g. maybe using flags type with some limited set of values in both syscalls will make matching sizes probable enough.

xairy commented 4 years ago

There's a similar (same?) use case for this in vusb.txt also when dealing with HID descriptors (link).

dvyukov commented 4 years ago

Will using flags for the len help there?

If bcdHID int16 should match across 2 calls, it can make sense to restrict the range of values to few.

xairy commented 4 years ago

It's not the bcdHID field that should match, it's the length of the HID report descriptor (usb_hid_class_descriptor_report->wDescriptorLength should match the length of hid_descriptor_report in case of vusb.txt). We could use a few fixed lengths for report descriptors and strip/extend hid_descriptor_report to match those in C code, but it's not an ideal solution.

I think this case can actually be generalized and be implemented as a part of the resource infrastructure. For that we'll need:

  1. Allow resources of arbitrary types (structures, etc.).
  2. Introduce another type of resources that are randomly generated based on the descriptions (not static, nor returned by any syscalls).

In this case we can then declare hid_descriptor_report as a resource, which syscalls can refer to.

dvyukov commented 4 years ago

It's not the bcdHID field that should match, it's the length of the HID report descriptor

https://github.com/google/syzkaller/issues/1884#issuecomment-652039270 says that both length and hid_id need to match. What is hid_id in our descriptions? Does it need to match as well?

dvyukov commented 4 years ago

and strip/extend hid_descriptor_report to match those in C code

Potentially we can avoid this if we make it of some maximum size. At least in @rodionov descriptions it is indirect via a pointer and kernel does not really know how much memory we allocated there (except for the length that we pass separately).

xairy commented 4 years ago

#1884 (comment) says that both length and hid_id need to match. What is hid_id in our descriptions? Does it need to match as well?

For vusb.txt: there are no HID IDs. For usb_gadget_accessory: (assuming that we implement the scheme described above) I guess we could make hid_id another "random" resource along with hid_descriptor_report.