hathach / tinyusb

An open source cross-platform USB stack for embedded system
https://www.tinyusb.org
MIT License
5k stars 1.05k forks source link

Order of irq handling matters? #207

Closed xobs closed 4 years ago

xobs commented 4 years ago

It appears as though the order that irq events are handled matters, and this order is undocumented.

I'm porting tinyusb to a new platform, and right now it's running slowly. This architecture can handle IN and OUT endpoints at the same time, and as a result its possible to handle both inside the isr (provided it's running slowly enough).

As a result, I get odd behavior in examples such as the MSC example. The following sequence of events appears to trigger the issue:

  1. The Device sends a Status Transport response (IN). The packet transmits while the Device resumes execution.
  2. The host sends a Command Transport request (OUT).
  3. The host begins sending Data Transport requests (IN).

My ISR and processes OUT packets first, since I thought OUT and IN endpoints were separate and order doesn't matter. However, it appears to matter at least in this case.

During (2),the Device queues 512 bytes of data to be sent to the Host. This blocks, since the IN packet from (1) hasn't been finished. Then, execution continues and we process IN packets. The Device sees that the 13 byte transfer from (1) has finished, so it calls xfer_complete.

The odd behavior is that tinyusb will now queue 499 bytes to be sent, on top of the 512 bytes it's already sent.

Does the order matter? Should that be documented anywhere? I notice that many drivers handle OUT packets last, but is that merely coincidence or is it a requirement?

I will post traces and logs as soon as my home Internet connection gets fixed.

xobs commented 4 years ago

More information:

Here's a gdb trace of the last 7 queued transfer sizes and endpoints, along with the order in which xfer_complete() was called:

(gdb) p xfer_log_offset
$38 = 52 '4'
(gdb) p queue_log_offset
$39 = 8 '\b'
(gdb) p/x xfer_log[45]@7
$40 = {{
    ep_num = 0x83,
    size = 0x200
  }, {
    ep_num = 0x83,
    size = 0xd
  }, {
    ep_num = 0x3,
    size = 0x1f
  }, {
    ep_num = 0x83,
    size = 0x200
  }, {
    ep_num = 0x3,
    size = 0x1f
  }, {
    ep_num = 0x83,
    size = 0xd
  }, {
    ep_num = 0x83,
    size = 0x200
  }}
(gdb) p/x queue_log[0]@7
$41 = {{
    ep_num = 0x83,
    size = 0x200
  }, {
    ep_num = 0x83,
    size = 0xd
  }, {
    ep_num = 0x3,
    size = 0x1f
  }, {
    ep_num = 0x83,
    size = 0x200
  }, {
    ep_num = 0x83,
    size = 0xd
  }, {
    ep_num = 0x3,
    size = 0x1f
  }, {
    ep_num = 0x83,
    size = 0x200
  }}
(gdb)
xobs commented 4 years ago

Additionally, here is a trace:

image

The last thing tinyusb tried to do was transfer 499 bytes.

hathach commented 4 years ago

The order of dcd_xfer_complete is always important and it should respect the order of actual packet transferred in the bus. Host won't send the next scsi command (2) before it receive the status of current command (1). Why would your port call the dcd_xfer_complete for the (2) before (1) while the actual event on bus is (1) then (2).

xobs commented 4 years ago

The IRQ is shared by both IN and OUT transfers, so in the ISR it first polls https://rm.fomu.im/usb.html#usb-out-ev-pending to see if there's an OUT event pending, and then it polls https://rm.fomu.im/usb.html#usb-in-ev-pending to see if there's an IN event. In this case, the OUT event is short enough that both bits are set when the ISR runs, so it doesn't know which came first.

Perhaps the correct solution is to add a bit to the hardware to determine which packet came first -- IN or OUT -- and process that one first.

pigrew commented 4 years ago

This sounds like how STM32 FSDEV operates. In it, all of the EP have separate single-packet buffers. There is no way to know which packet was transmitted first across endpoints, through there is some extra logic for EP0 relating to setup packets. There is no way to know if a IN or OUT packet came first. The class driver must be written to know in what order to read/write the endpoints. I've noticed that the USBTMC class seems to issue BULK IN requests before the BULK OUT transfer requesting the data. (the BULK IN is NAKed until the BULK OUT is processed).

In the interrupt, one should process SETUP/EP0 first (since it may ask for an EP to be halted), and then move onto the various other EP.

(I'm not familiar with MSC, so I'm not sure what the issue here is, or if comments are helpful)

hathach commented 4 years ago

The IRQ is shared by both IN and OUT transfers, so in the ISR it first polls https://rm.fomu.im/usb.html#usb-out-ev-pending to see if there's an OUT event pending, and then it polls https://rm.fomu.im/usb.html#usb-in-ev-pending to see if there's an IN event. In this case, the OUT event is short enough that both bits are set when the ISR runs, so it doesn't know which came first.

Perhaps the correct solution is to add a bit to the hardware to determine which packet came first -- IN or OUT -- and process that one first.

Indeed, it could be an issue if ISR isn't running fast enough. This is particular with MSC class driver, at the end of data phase, it will dcd_xfer an status phase (13 bytes), then dcd_xfer the next command phase right away. https://github.com/hathach/tinyusb/blob/master/src/class/msc/msc_device.c#L554 https://github.com/hathach/tinyusb/blob/master/src/class/msc/msc_device.c#L571

Yeah, IRQ may not run immediately depending on the mcu and rtos, maybe something is running in critical context. So this is a valid scenario, I will make a change to msc driver on separate for you to test with.

@pigrew yeah control normally has it own way to handle setup and/or status. This happens since IN + OUT combine to have an sequence of SCSI command. Most of the time the IN & OUT are relatively Independence to each other. This only happens since msc class driver queue out transfer right after an IN without waiting for its completion. Your involvement is always helpful :)

xobs commented 4 years ago

Thanks for the feedback! I added a register to indicate which type of packet should be handled next, and that seems to solve the problem with the current setup: https://rm.fomu.im/usb.html#usb-next-ev

hathach commented 4 years ago

@xobs wow, you are designing an FGPA hardware, I was thinking you are porting the stack to an mcu. Awesome :smile:

xobs commented 4 years ago

Turns out tinyusb is a great way to find all sorts of implementation bugs. But it's always a question of where the bug lies -- is it in the FPGA hardware, the DCD driver, or in tinyusb? This is the first issue I've found in tinyusb so far, and it actually ended up being a hardware issue!

This is also a riscv port, which turned out to not be the difficult part at all! https://github.com/xobs/tinyusb/tree/valentyusb-eptri

hathach commented 4 years ago

@xobs this is not really entirely hardware issue, tinyusb shouldn't rely on ISR running fast enough to work. Like @pigrew mentioned, most of the DCD only has register indicating which endpoint is complete, and not their order of completion. If the ISR function latency is long enough ( 1ms for Fullspeed, 125us for Highspeed), there is no way to tell the order. TinyUSB aims to support all ports, so It shouldn't rely much on hardware :).

This is also a riscv port, which turned out to not be the difficult part at all! https://github.com/xobs/tinyusb/tree/valentyusb-eptri

This is great to hear, which risv and/or fpga board you are running on, I may try to find one to have some tests as well on these platforms.

xobs commented 4 years ago

This is running on a Vexriscv CPU in an ICE40UP5k inside a Fomu. I'm using Fomu EVT so I can attach a debugger to the softcore to debug the CPU -- normally we run the debugger over USB, but that doesn't work so well when the USB core isn't stable!

hathach commented 4 years ago

This is running on a Vexriscv CPU in an ICE40UP5k inside a Fomu. I'm using Fomu EVT so I can attach a debugger to the softcore to debug the CPU -- normally we run the debugger over USB, but that doesn't work so well when the USB core isn't stable!

sound like It needs a bit of learning curve to work on these platforms :D. Thanks for using tinyUSB, I am glad to hear you didn't find that many bugs from the project lol :laughing:

hathach commented 4 years ago

@xobs Let's give this a try to see if this patch #208 fixes your issue

xobs commented 4 years ago

With the addition of https://github.com/im-tomu/valentyusb/commit/2f240d9c8a5c9aa0adfe96799b7df47942f2c5c0 (and https://github.com/im-tomu/valentyusb/commit/d615069b45535b10464b7ade725de1aa9ab25e11 after doing some more thinking), I'm able to use the msc example without issue without #208, even if I'm running unoptimized debug code from extremely slow flash.

However, #208 seems like a good idea, especially the unit tests, so I think you should merge it anyway.

hathach commented 4 years ago

@xobs thanks for your feedback. The unit test take more time to write than the actual code. The code changes is easy enough, but I take the chance to write unit test since those are nice to have. Plan to write more unit tests in the future when having time.