zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.01k stars 6.16k forks source link

USB core has a data race during write on many platforms #71299

Open khoek opened 2 months ago

khoek commented 2 months ago

The function hid_int_ep_write() has a data race where the TX data may change underneath the USB driver. For example, hid_int_ep_write() calls usb_write() directly passing the user's data buffer. Many USB drivers do not copy the buffer before using it e.g. nrfx and stm32, while e.g. rpi_pico, native_posix, and kinetis do. This means that if the caller modifies the passed data buffer before the TX is complete the host may observe the new contents.

The problem was identified on the nrfx platform (a Kinesis Adv360 keyboard running ZMK), and was observed to very reliably cause keys to be dropped. On wireshark two TX messages were recorded with each keypress (corresponding to key press and key release), but both messages contained the same contents (no keys pressed). Only a key press-release combination generated by a macro-like mode of ZMK is fast enough to trigger the bug. My proposed fix to ZMK, the PR https://github.com/zmkfirmware/zmk/pull/2257, simply copies the data into a temporary buffer before the call and immediately fixed the problem as tested on real hardware.

I am confident this is bug in Zephyr because all but one of the samples which call hid_int_ep_write(), namely usb/hid-mouse, usb/hid-cdc, and sensor/fxos8700-hid all do not protect the data buffer after calling hid_int_ep_write(), e.g. by modifying it immediately after the call, or by leaving it on the stack and returning allowing for corruption. (Note that the semaphore in sensor/fxos8700-hid is not protecting the data buffer.) The only sample which does not have this issue (usb/hid) is because it is too simple and uses a constant buffer.

In order to fix the problem either the HID function hid_int_ep_write() or the USB core function usb_write() (or the underlying driver usb_write() immediately calls) need to copy the data buffer. I am happy to implement a fix in any case but seek confirmation of what the intended API is and thus which approach to take.

Thank you very much in advance for your help.

github-actions[bot] commented 2 months ago

Hi @khoek! We appreciate you submitting your first issue for our open-source project. 🌟

Even though I'm a bot, I can assure you that the whole community is genuinely grateful for your time and effort. 🤖💙

khoek commented 2 months ago

Actually, while stm32 doesn't copy the buffer, it does wait until the transmission has been received by the host before returning. Other platforms I am checking appear to copy the buffer. I suppose this is just a bug in nrfx.

jfischer-no commented 2 months ago

I am confident this is bug in Zephyr because all but one of the samples which call hid_int_ep_write(), namely usb/hid-mouse, usb/hid-cdc, and sensor/fxos8700-hid all do not protect the data buffer after calling hid_int_ep_write(), e.g. by modifying it immediately after the call, or by leaving it on the stack and returning allowing for corruption. (Note that the semaphore in sensor/fxos8700-hid is not protecting the data buffer.) The only sample which does not have this issue (usb/hid) is because it is too simple and uses a constant buffer.

Yes, all HID samples are sloppy with report buffers. At least hid and hid-mouse should be fixed. We should consider removing hid-cdc and fxos8700-hid samples as they do not provide much benefit. fxos8700 is actually a generic sample and can be reworked into a generic input driver.

73098 removes usb/hid-cdc

73210 removes fxos8700-hid