zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.03k stars 6.17k forks source link

Use after free in stm32 usb_dc_stm32_isr() leads to data corruption #61977

Open arbrauns opened 10 months ago

arbrauns commented 10 months ago

Describe the bug Under enough USB traffic load, the implementation of usb_dc_ep_write() in the stm32 USB driver can cause data corruption. This is because the data pointer it accepts is stored away until a TX FIFO empty interrupt occurs, yet usb_dc_ep_write() returns immediately. If the pointer passed in to usb_write() is stack-allocated, it will point to gargage by the time the FIFO empty interrupt occurs.

The exact sequence of events is as follows:

  1. usb_dc_ep_write() is called (e.g. by mass storage class) with a stack data pointer 0x200561ac

    (gdb) bt
    #0  usb_dc_ep_write (ep=ep@entry=129 '\201', data=data@entry=0x200561ac <z_interrupt_stacks+1836> "", data_len=data_len@entry=8, ret_bytes=ret_bytes@entry=0x0) at $PRJDIR/zephyr/drivers/usb/device/usb_dc_stm32.c:783
    #1  0x0801092c in usb_write (ep=<optimized out>, data=data@entry=0x200561ac <z_interrupt_stacks+1836> "", data_len=data_len@entry=8, bytes_ret=bytes_ret@entry=0x0) at $PRJDIR/zephyr/subsys/usb/device/usb_device.c:1392
    #2  0x080122fe in write (buf=buf@entry=0x200561ac <z_interrupt_stacks+1836> "", size=size@entry=8) at $PRJDIR/zephyr/subsys/usb/device/class/msc.c:262
    #3  0x08012510 in readCapacity () at $PRJDIR/zephyr/subsys/usb/device/class/msc.c:391
    #4  0x08013270 in CBWDecode (buf=buf@entry=0x20056234 <z_interrupt_stacks+1972> "", size=<optimized out>) at $PRJDIR/zephyr/subsys/usb/device/class/msc.c:555
    #5  0x08013e76 in mass_storage_bulk_out (ep=<optimized out>, ep_status=<optimized out>) at $PRJDIR/zephyr/subsys/usb/device/class/msc.c:719
    #6  0x0806549a in HAL_PCD_DataOutStageCallback (hpcd=<optimized out>, epnum=<optimized out>) at $PRJDIR/zephyr/drivers/usb/device/usb_dc_stm32.c:1096
    #7  0x080967f6 in PCD_EP_OutXfrComplete_int (hpcd=hpcd@entry=0x20049160 <usb_dc_stm32_state>, epnum=epnum@entry=1) at $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_pcd.c:2217
    #8  0x08097036 in HAL_PCD_IRQHandler (hpcd=hpcd@entry=0x20049160 <usb_dc_stm32_state>) at $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_pcd.c:1039
    #9  0x08064028 in usb_dc_stm32_isr (arg=<optimized out>) at $PRJDIR/zephyr/drivers/usb/device/usb_dc_stm32.c:197
    #10 0x08020cfe in _isr_wrapper () at $PRJDIR/zephyr/arch/arm/core/aarch32/isr_wrapper.S:259
    #11 <signal handler called>
    #12 arch_cpu_idle () at $PRJDIR/zephyr/arch/arm/core/aarch32/cpu_idle.S:127
    #13 0x080a3d22 in k_cpu_idle () at $PRJDIR/zephyr/include/zephyr/kernel.h:5749
    #14 idle (unused1=<optimized out>, unused2=<optimized out>, unused3=<optimized out>) at $PRJDIR/zephyr/kernel/idle.c:88
    #15 0x08009b02 in z_thread_entry (entry=0x80a3cf5 <idle>, p1=0x2002d1e0 <_kernel>, p2=0x0, p3=0x0) at $PRJDIR/zephyr/lib/os/thread_entry.c:36
    #16 0xaaaaaaaa in ?? ()
    
    (gdb) x/8xb 0x200561ac
    0x200561ac <z_interrupt_stacks+1836>:    0x00    0x00    0x00    0xbf    0x00    0x00    0x02    0x00
  2. usb_dc_ep_write() returns successfully without actually having written anything, the data pointer is merely stored in the HAL struct:

    (gdb) fin
    Run till exit from #0  usb_dc_ep_write (ep=ep@entry=129 '\201', data=data@entry=0x200561ac <z_interrupt_stacks+1836> "", data_len=data_len@entry=8, ret_bytes=ret_bytes@entry=0x0) at $PRJDIR/zephyr/drivers/usb/device/usb_dc_stm32.c:783
    0x0801092c in usb_write (ep=<optimized out>, data=data@entry=0x200561ac <z_interrupt_stacks+1836> "", data_len=data_len@entry=8, bytes_ret=bytes_ret@entry=0x0) at $PRJDIR/zephyr/subsys/usb/device/usb_device.c:1392
    1392         ret = usb_dc_ep_write(ep, data, data_len, bytes_ret);
    Value returned is $12 = 0
    
    (gdb) p usb_dc_stm32_state.pcd.IN_ep[1].xfer_buff
    $13 = (uint8_t *) 0x200561ac <z_interrupt_stacks+1836> ""
  3. Data on stack is overwritten (by some arbitrary thread, this is just for illustration)

    (gdb) watch *0x200561ac
    Hardware watchpoint 10: *0x200561ac
    (gdb) c
    Continuing.
    
    Thread 29 hit Hardware watchpoint 10: *0x200561ac
    
    Old value = -1090519040
    New value = 134627885
    0x08096b6c in HAL_PCD_EP_Receive (hpcd=hpcd@entry=0x20049160 <usb_dc_stm32_state>, ep_addr=ep_addr@entry=1 '\001', pBuf=pBuf@entry=0x200498bc <usb_dc_stm32_state+1884> "USBC\003", len=64) at $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_pcd.c:1759
    1759 {
  4. TX FIFO Empty interrupt occurs, reads corrupted data and sends it to host

    (gdb) b USB_WritePacket
    Breakpoint 11 at 0x8099be8: file $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_ll_usb.c, line 1100.
    (gdb) c
    Continuing.
    
    Thread 29 hit Breakpoint 11, USB_WritePacket (USBx=USBx@entry=0x50000000, src=0x200561ac <z_interrupt_stacks+1836> "-B\006\b", ch_ep_num=ch_ep_num@entry=1 '\001', len=len@entry=8, dma=0 '\000') at $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_ll_usb.c:1100
    1100 {
    
    (gdb) bt
    #0  USB_WritePacket (USBx=USBx@entry=0x50000000, src=0x200561ac <z_interrupt_stacks+1836> "-B\006\b", ch_ep_num=ch_ep_num@entry=1 '\001', len=len@entry=8, dma=0 '\000') at $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_ll_usb.c:1100
    #1  0x08096474 in PCD_WriteEmptyTxFifo (hpcd=hpcd@entry=0x20049160 <usb_dc_stm32_state>, epnum=epnum@entry=1) at $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_pcd.c:2097
    #2  0x0809720c in HAL_PCD_IRQHandler (hpcd=hpcd@entry=0x20049160 <usb_dc_stm32_state>) at $PRJDIR/modules/hal/stm32/stm32cube/stm32f7xx/drivers/src/stm32f7xx_hal_pcd.c:1166
    #3  0x08064028 in usb_dc_stm32_isr (arg=<optimized out>) at $PRJDIR/zephyr/drivers/usb/device/usb_dc_stm32.c:197
    #4  0x08020cfe in _isr_wrapper () at $PRJDIR/zephyr/arch/arm/core/aarch32/isr_wrapper.S:259
    #5  <signal handler called>
    #6  arch_cpu_idle () at $PRJDIR/zephyr/arch/arm/core/aarch32/cpu_idle.S:127
    #7  0x080a3d22 in k_cpu_idle () at $PRJDIR/zephyr/include/zephyr/kernel.h:5749
    #8  idle (unused1=<optimized out>, unused2=<optimized out>, unused3=<optimized out>) at $PRJDIR/zephyr/kernel/idle.c:88
    #9  0x08009b02 in z_thread_entry (entry=0x80a3cf5 <idle>, p1=0x2002d1e0 <_kernel>, p2=0x0, p3=0x0) at $PRJDIR/zephyr/lib/os/thread_entry.c:36
    #10 0xaaaaaaaa in ?? ()
    
    (gdb) x/8xb src
    0x200561ac <z_interrupt_stacks+1836>:    0x2d    0x42    0x06    0x08    0x00    0x00    0x02    0x00

This makes it impossible for me to use mass storage on STM32 USB_OTG_FS, since the SCSI packets are consistently corrupted.

To Reproduce The subsys/usb/mass sample might work, but since this is related to USB host timing, I wouldn't bet on reproduction being easily possible. I think the cause of the issue should be clear enough, I can of course test proposed fixes on my setup.

Expected behavior USB data is not corrupted.

Impact USB mass storage does not work at all, other USB protocols might have undetected integrity problems.

Logs and console output This Wireshark capture shows the corrupted data in packet 83 (should be 000000bf 00000200, is 2d420608 00000200, mirroring debugger output above): zephyr_usb_corruption.pcapng.gz

Environment (please complete the following information):

carlescufi commented 10 months ago

@arbrauns I know this might not be what you need, but there is an experimental USB next stack that has much better support for USB Mass Storage. It also supports STM32 already: https://github.com/zephyrproject-rtos/zephyr/blob/main/drivers/usb/udc/udc_stm32.c

CC @jfischer-no @tmon-nordic

arbrauns commented 10 months ago

Ah, maybe I'll try my hand at backporting that to 3.4.0, but that sounds like quite a bit of work.

carlescufi commented 10 months ago

Ah, maybe I'll try my hand at backporting that to 3.4.0, but that sounds like quite a bit of work.

The -next stack was already present in 3.4.0, it's just the STM32 driver you'd need to backport.

arbrauns commented 10 months ago

It's also still experimental, I'd like to avoid that in production applications.

github-actions[bot] commented 7 months ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.

erwango commented 5 months ago

@arbrauns Sorry to coming back this late on this but is the problem still valid ? If yes: What is the target you're using ? Any specific configuration used ? Any information that could help us to analyse would be helpfull.

erwango commented 5 months ago

Unfortunately, we're not able to reproduce the issue. A dedicated application to ease reproduction would be required if problem is still valid

arbrauns commented 5 months ago

I can still reproduce this on 3.4.0, I don't have the time to create a standalone reproducer at the moment. Is the description in the OP not enough to understand the underlying problem, and to figure out if it can still occur? Specifically, that usb_dc_ep_write() passes its data argument to HAL_PCD_EP_Transmit(), and then returns successfully while the buffer is still in the process of being sent?

arbrauns commented 5 months ago

It's a race condition largely influenced by the USB host the device is plugged into, so I'm not confident it's easily reproducible at all.

arbrauns commented 4 months ago

I can still reproduce this on 3.6.0/latest main (ad58b39f411d75c263aa213f69d19e0deac463cd), although not with a standalone example due to interrupt timing. The crux of the issue continues to be visible in this stack trace:

(gdb) bt
#0  usb_dc_ep_write (ep=ep@entry=129 '\201', data=data@entry=0x2001a87c <z_interrupt_stacks+1916> "", data_len=data_len@entry=8, ret_bytes=ret_bytes@entry=0x0) at ZEPHYR_ROOT/zephyr/drivers/usb/device/usb_dc_stm32.c:817
#1  0x08008c50 in usb_write (ep=<optimized out>, data=0x2001a87c <z_interrupt_stacks+1916> "", data_len=data_len@entry=8, bytes_ret=bytes_ret@entry=0x0) at ZEPHYR_ROOT/zephyr/subsys/usb/device/usb_device.c:1392
#2  0x080027e2 in write (buf=<optimized out>, size=<optimized out>) at ZEPHYR_ROOT/zephyr/subsys/usb/device/class/msc.c:321
#3  0x08002ac6 in modeSense6 () at ZEPHYR_ROOT/zephyr/subsys/usb/device/class/msc.c:411
#4  CBWDecode (size=<optimized out>, buf=0x2001a890 <z_interrupt_stacks+1936> "USBC\003") at ZEPHYR_ROOT/zephyr/subsys/usb/device/class/msc.c:602
#5  mass_storage_bulk_out (ep=<optimized out>, ep_status=<optimized out>) at ZEPHYR_ROOT/zephyr/subsys/usb/device/class/msc.c:778
#6  0x080052b8 in PCD_EP_OutXfrComplete_int (epnum=1, hpcd=0x20000ed0 <usb_dc_stm32_state>) at ZEPHYR_ROOT/modules/hal/stm32/stm32cube/stm32f2xx/drivers/src/stm32f2xx_hal_pcd.c:2087
#7  HAL_PCD_IRQHandler (hpcd=0x20000ed0 <usb_dc_stm32_state>) at ZEPHYR_ROOT/modules/hal/stm32/stm32cube/stm32f2xx/drivers/src/stm32f2xx_hal_pcd.c:960
#8  0x080037f2 in _isr_wrapper () at ZEPHYR_ROOT/zephyr/arch/arm/core/cortex_m/isr_wrapper.S:128

usb_dc_ep_write() is called in an interrupt context, with a data pointer that points to the interrupt stack. Instead of copying that data somewhere safe, or blocking until it's been sent, it simply stores the data pointer (to later be read out by another interrupt) and returns, eventually ending the ISR. If any interrupt occurs between the ISR calling usb_dc_ep_write() and the ISR reading the stored pointer, and fills up the interrupt stack sufficiently, the data is corrupted before it can be sent out.

github-actions[bot] commented 1 month ago

This issue has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this issue will automatically be closed in 14 days. Note, that you can always re-open a closed issue at any time.