hathach / tinyusb

An open source cross-platform USB stack for embedded system
https://www.tinyusb.org
MIT License
5.02k stars 1.05k forks source link

Double Buffering #643

Closed MimitechIndustries closed 3 years ago

MimitechIndustries commented 3 years ago

I started to integrate/use tinyusb into my STM32 project but noticed that there was no double buffering support. This might not be a deal breaker, but common sense would say that performance would be impacted significantly due to lack of double buffering support and my requirement is to be able to saturate most of the available bandwidth of the bus while keeping microcontroller utilization very low.

For now, I am going to switch to this stack https://github.com/xtoolbox/TeenyUSB which I've used in the past for some basic keyboard/mouse host controller stuff and supports double buffering; however, when I get the chance I will benchmark both tinyusb and TeenyUSB together to evaluate if the double buffering that TeenyUSB offers provides any actual tangible benefit. I will post findings in this issue when I have them.

hathach commented 3 years ago

Please add specific family mcu to the issue title stm32 f4, h7 etc since stm32 has 2 usb controllers.

MimitechIndustries commented 3 years ago

Hmm, I was under the impression from looking at the source code that tinyusb stack in general isn't structured in a way to facilitate double buffering (except on one mcu family that has it built into hardware or something like that, I don't remember).

I guess my question is -- do you mean you want to add f4 or h7, etc. to the title so that double buffering support is worked on for one specific mcu? I was thinking the feature request would be to bring double buffering support as a whole to tinyusb to any/all mcus that support it.

cr1901 commented 3 years ago

I was under the impression from looking at the source code that tinyusb stack in general isn't structured in a way to facilitate double buffering

Same. I was under the impression that tinyusb single buffers by design to keep the packet-handling, well, tiny :).

When I wrote the MSP430 backend, I explicitly used single buffering even tho the hardware supports double. I've not run into any problems in practice, but me eventually doing a benchmark would still prob be nice.

hathach commented 3 years ago

I guess my question is -- do you mean you want to add f4 or h7, etc. to the title so that double buffering support is worked on for one specific mcu? I was thinking the feature request would be to bring double buffering support as a whole to tinyusb to any/all mcus that support it.

ah, I see, I thought you want to implement the hardware double buffering for some of the STM32 MCU, look like you want to have USBD-level double buffering. So far there isn't a need to do so, and we don't really need it in system-wide manner. Only a few bandwidth hungry class require that such as MSC, vendor and/or video. Sure when needed, we could implement it.

MimitechIndustries commented 3 years ago

OK sounds good 👍

Also, one of the other things I noticed is that for full speed bulk, it sets the max packet size to 64 and seems to mandate the use of that value; however, in my case I send 44 byte packets since one frame of data for me is 44 bytes. When I send 44 bytes on a 64 MPS endpoint, the host USB on the other end doesn't ask for more data in a timely manner because it says "oh look you didn't even use your max allotted packet size so I guess I can stop bothering you for a bit" (I have looked at this on my USB packet analyzer). This causes throughput to slow down very badly. If I set the endpoint's MPS to 44, my throughout gets much higher since the host immediately asks for the next bulk packet since it thinks there's more data since I'm hitting the max packet size. Without double buffering, this next packet isn't necessarily there fast enough and it goes into a NAK loop for 3-4 iterations while the next packet's data is copied into the USB packet data memory segment. I could re-package my data to fit the 64-byte MPS but it would be a lot of additional, unnecessary data copying/shuffling in memory and add latency to the data being output. Right now I have a simple ring buffer of 44 byte bulk packets that I just pump as fast as I can. I wanted to get a bit of feedback on this -- should tinyusb support less-than-max MPS bulk endpoints? I personally think this makes sense since I have seen good success with this technique on all USB stacks I have interacted with. Let me know if opening a feature request for this makes sense.

kkitayam commented 3 years ago

dcd_edpt_xfer() supports a transaction from 0 up to 65535. If you want to transfer 44 bytes at one transaction, you can specify 44 in dcd_edpt_xfer(). If you want to change the max packet size, you can change it with USB descriptor epsize.

I think that packet-level double buffering is the job of DCD, not USBD. The KL25Z implementation supports packet-level double buffering for a transaction larger than MPS (=64) bytes. Unfortunately, I don't have USB packet analyzer, so I haven't checked if NAK is decreased, but I think it is.

If we need transaction-level double buffering, I think it's better that the USBD class driver or application supports it.

hathach commented 3 years ago

The specs only allow Bulk for FS to be 8, 16, 32 and 64. Unfortunately 44 is out of supported values. Look like I need to make an PR to relax the constraint for bulk.

image

hathach commented 3 years ago

@kkitayam thanks, I feel the same way as well, since most recent MCU all support DMA, the double buffering is mainly meant for raw bandwidth usage of the hardware. On software side, FIFO (byte copy or DMA) can be added according to the need of the class driver e.g cdc use normal fifo, audio will make use of DMA FIFO. Though, I am still open to discussion and all PRs.

MimitechIndustries commented 3 years ago

Interesting, so 44 is technically not a supported value by the official USB spec -- glad Windows happily allows it though :) -- I'll have to test my application on other OSes more thoroughly since I intend on continuing to use 44 for my application. I think it's sort of a silly constraint especially when you consider how the USB protocol works (don't call us, we'll call you).

hathach commented 3 years ago

@MimitechIndustries it is up to you to decide if it is worth the compatibility break should future OS enforce such as check. I have corrected the bulk size check for FS endpoint but also leave it open for you by not enforcing the 8,16,32,64 check since most people will go to 64 anyway. https://github.com/hathach/tinyusb/pull/647 . Though it may be forced in the future.

Should you have any further information regarding the bulk ep size please open an discussion or issue, we should keep this issue focused on "double buffering", thanks.

MimitechIndustries commented 3 years ago

Sounds good, thank you very much.

todbot commented 3 years ago

Windows is usually very strict about USB compliance (compared to Linux and MacOS). If you want to see if your out-of-spec device works, I'd recommend trying it there.

PanRe commented 3 years ago

As for the STM32F4 which has a hardware USB FIFO, there is not double buffering possible. Once the hardware FIFO was loaded and scheduled for transmit or receive, hardware buffer is locked and you can not load any more into it. IMHO double buffering is only necessary if you need linear buffers e.g. for signal processing etc. In any other case i would suggest a ring buffer like the tu_fifo, this buffer is also capable of being loaded/read from by DMA. Some functions supporting this are not merged but are on their way into master.

hathach commented 3 years ago

after revising the code, I think software double buffering is implemented in the form of fifo. Hardware double buffering on dcd is currently not implemented, since it often needs to use 2 endpoint e.g 1 IN + 1 OUT for an double buffer OUT. currently stack does not support dcd specific option. I will close this issue for now, feel free to open new issue when there is new information.