hathach / tinyusb

An open source cross-platform USB stack for embedded system
https://www.tinyusb.org
MIT License
4.98k stars 1.05k forks source link

Better dcd synopsys fifo allocation for ISO #540

Closed hathach closed 5 months ago

hathach commented 3 years ago

Is your feature request related to a problem? Please describe.

current synopsys EP FIFO allocation scheme does not support high ISO packet size e.g 192 in #538 . It assumes all FS endpoints packet only has up to 64 bytes. However FS ISO can have up to 1023 bytes.
https://github.com/hathach/tinyusb/blob/master/src/portable/st/synopsys/dcd_synopsys.c#L216

We need a better scheme not only resolve this FIFO OUT, but also open/close/re-open ISO endpoints with different packet size e.g 100, 200, 300

For a quick walkaround, we can increase the FIFO OUT to let's say 256 when AUDIO class is enabled e.g

// largest_epsize is the largest of all ep packet size used in the configuration descriptors
static inline uint16_t allocate_grxfsiz(uint16_t largest_epsize)
{
  return 15 + 2 x (largest_epsize/4) + 2*EP_MAX;
}

  //   Therefore GRXFSIZ = 13 + 1 + 1 + 2 x (Largest-EPsize/4) + 2 x EPOUTnum
  //   - FullSpeed (64 Bytes ): GRXFSIZ = 15 + 2 x  16 + 2 x EP_MAX = 47  + 2 x EP_MAX
  //   - Highspeed (512 bytes): GRXFSIZ = 15 + 2 x 128 + 2 x EP_MAX = 271 + 2 x EP_MAX
  //
  //   NOTE: Largest-EPsize & EPOUTnum is actual used endpoints in configuration. Since DCD has no knowledge
  //   of the overall picture yet. We will use the worst scenario: largest possible + EP_MAX
  //
  //   FIXME: for Isochronous, largest EP size can be 1023/1024 for FS/HS respectively. In addition if multiple ISO
  //   are enabled at least "2 x (Largest-EPsize/4) + 1" are recommended.  Maybe provide a macro for application to
  //   overwrite this.

#if TUD_OPT_HIGH_SPEED
  _allocated_fifo_words = allocate_grxfsiz(512);
#else
#if CFG_TUD_AUDIO
  // FIXME walkround to increase OUT FIFO limit to able to support up to ISO packet size of 256 
  _allocated_fifo_words =  allocate_grxfsiz(256);
#else
 _allocated_fifo_words =  allocate_grxfsiz(64);
#endif 
#endif

Originally posted by @hathach in https://github.com/hathach/tinyusb/pull/538#issuecomment-709766967

PanRe commented 3 years ago

In fact we completely forgot this point last time we discussed about the reverted FIFO allocation in dcd_synopsis! We only had IN EPs or the TX part in mind...

I do not want to be annoying but IMHO you need to know upfront the required size of the RX FIFO since you can not change it easily once USB transfers are in progress. Either you fix it statically as you proposed above or you parse through the descriptors and find out how much you need. For the latter option we already have tested code available - the reverted part ^^

How about the following idea:

This way, we could realize your desired separation such that dcd_synopsis has nothing to do with the descriptors but only gets the information of how big the EPs are and also other portable drivers could use the EP size report if required. I think this way we also make life easier for the user since he only needs to take care of the descriptors and does not need to look elsewhere. Plus you have all the remaining benefits we discussed last time. What do you think?

hathach commented 3 years ago

@PanRe yeah, actually we don't forgot it, we just don't have time to work on it just yet (as well as bunch of other issues :smiley: ).

  • We shift the parsing of the descriptors from dcd_synobsis into usbd.c (put get_ep_size_report() into usbd.c)
  • We change dcd_alloc_mem_for_conf() to be called with the generated EP size report i.e. dcd_alloc_mem_for_conf(uint8_t rhport, ep_sz_tt_report_t rep)

Thank you for your suggestion, I will definitely need it when looking at the issue later on. I still prefer to solve this at usbd, if we solve it at usbd, we solve it for all.

PanRe commented 3 years ago

I had an idea how to solve this issue! Not an overall solution but a step closer to it. If we would allocate all TX FIFOs not starting directly behind the RX FIFO but from the top (like heap and stack), the common RX buffer can always be extended at any time without problems! This comes in handy in case an ISO OUT EP is opened with a size bigger than 64 bytes (the current standard size of the RX FIFO) because now the RX FIFO can be extended immediately! The size of the RX buffer is thus not required to be known up front.

So instead of allocating the FIFOs like this:

Start-> || --- RX FIFO --> | --- TX-FIFO-0 --> | --- TX-FIFO-1 --> | --- ... --- | --- TX-FIFO-n --> | --- FREE SPACE --- || <-End

do it this way

Start-> || --- RX FIFO --> | --- FREE SPACE --- | --- TX-FIFO-n --> | --- ... --- | --- TX-FIFO-1 --> | --- TX-FIFO-0 -->|| <-End

As you illustrated before, all ISO EPs will be opened latest i.e. their FIFO will be next to the free space. In case only one ISO OUT and ISO IN EP is used, no fragmentation will thus occur. If you have more than one IN ISO EP, you may need defragmentation. But how to do this is an open question anyway.

Advantage of this approach - the DCD does not need to know any EP sizes at initialization time rather only when EPs are opened. So you can keep your desired project structure of not mixing any descriptor stuff into the DCD and you can solve the problem occuring if an OUT ISO EP needs to be opened greater than 64 bytes.

What do you think of this?

hathach commented 3 years ago

Hihi, I am off for a couple of days. Will check it out afterwards :)

hathach commented 3 years ago

@PanRe hihi sorry for late response, I have taken a couple of days off and struggle to catch up with day work. This is a good idea, though I still don't quite understand its advantage. Since ISO are opened last, it is all come down to ISOs endpoints only with sub-space of FIFO. For only-one ISO config, there should be no problem at all with fragmentation.

PS: I really need to work on this asap, it has been on-pending for quite some time :(

PanRe commented 3 years ago

The problem occurs if you want to open an OUT-EP bigger than the standard size of the RX FIFO. This can occur easily since maximum ISO size is 1023 byte for FS or 1024 byte for HS. The standard RX size is set lower than this value. So you need to extend its size at run time or you need to know this before you allocate the RX FIFO size (when a new configuration is set up). Knowing it before is not possible if you do not want to parse the descriptors.

hathach commented 3 years ago

The problem occurs if you want to open an OUT-EP bigger than the standard size of the RX FIFO. This can occur easily since maximum ISO size is 1023 byte for FS or 1024 byte for HS. The standard RX size is set lower than this value. So you need to extend its size at run time or you need to know this before you allocate the RX FIFO size (when a new configuration is set up). Knowing it before is not possible if you do not want to parse the descriptors.

Yeah, you're spot on with out ISO, I am too focus on the IN. This is really a great idea. The fifo allocation scheme is a bit difficult to get one-fit-all scenarios now. Eg there is configuration with only cdc and msc, and application may want to have the most fifo portion for the msc bulk e.g 512 to maximize the throughput. Maybe We should consider to have a hook for application to define itself. Though I haven't got time to get back to focus on this.

Please proceed with your great suggestion if you could manage your time. I will try my best to catch up.

Note: I am off for vacation for the whole week. It may take a bit of time for me to respond, please be patient:)

HiFiPhile commented 3 years ago

We need also take care of DCDs who do automatic allocation, for example SAME70.

It allocate fifo sequentially, however if one EP fifo is freed/allocated, the data of next EP fifos are lost, as their fifo address changes: image

At the moment I don't free the fifo in dcd_edpt_close, so in next dcd_edpt_open call fifo can't become bigger.

It could be nice to allocate the maximum size at beginning.

hathach commented 3 years ago

yeah, I will try to revise the endpoint API, I think we may move to dcd_edpt_iso_open() with dcd_edpt_iso_set_size() instead of closing and reopen. The driver could specifically open the iso with largest packet then set_size to 0 as disabled state and/or set_size() to other if needed.

PanRe commented 3 years ago

Maybe it is not the best approach to skip closing EPs for ISO stuff completely! This would prevent users to reduce the required EP size (by reducing the sample rate or the resolution) dynamically e.g. when some other EP needs to be opened. The UAC2 spec specifically mentions this kind of flexibility. I am quiet aware this is rarely used but right now it is possible to do so.

Can the DCDs of such devices not take care of this special requirements on their own? I admit there are some ideas required how to do this... In the current synopsis for STM32, the fact that ISO EPs are the last ones opened is exploited (all other EPs are opened at initialization time, ISO EPs are opened when host issues a set_interface request). I further placed the IN EPs at the top and OUT EPs at the bottom of the stack (just like the classical heap and stack arrangement). This way, when opening IN and OUT EPs they do not interfere with each other and no gaps emerge. This works as long as only one IN and OUT EP is used... Could something like that apply here too? However, allocating all at the beginning is not possible with the current driver structure! I once tried to do this with the STM32 synopsis and pushed a commit where the DCD had acccess to the descriptors in order to parse for the maximum required space of the individual EPs. It then allocated the stack accordingly, so when EPs were opened or closed nothing had to be changed for the stack allocations. But @hathach preferred the DCD to not be connected with the descriptors. They were supposed to be clearly separated, so we skipped it.

hathach commented 3 years ago

need time to revise this, please be patient :)

mingpepe commented 1 year ago

Is there any update or plan?

PanRe commented 1 year ago

Currently not that i am aware of! Are you interested in extending the driver?

mingpepe commented 1 year ago

I am working on another device which has similar issue. As long as the device needs to dynamically allocate memory for endpoints, the device driver may have this issue.

If we solved it in certain driver, other drivers also need to solve similar issue. For example, for rp2040 current solution is deallocating all the memory until all endpoints closed, which may be an issue for composite device.

Hence, I think we need a simple memory allocator independent of driver. Then each driver can use it to solve this issue.

Briefly described API

void allocator_init(address, length);
int allocate(length);  
void deallocate(address);
PanRe commented 1 year ago

Currently this is a tricky question. The problem is, that the drivers are not informed about how many EPs in total exist for the current configuration and how big they are. This was a design question some time ago and it was decided that the drivers shall not come in contact with the configuration stuff. Hence, the driver must be able to allocate the space solely from the information given by the dcd_edpt_open() function time by time. This results in a fragmentation problem e.g. for the synopsis driver. The SAME70 devices suffers different issues as described above. Since usually only ISO EPs are opened and closed, there was not a big need to change that since not many people were asking for it.

Do you have any ideas how to overcome these problems for one particular MCU so far?

mingpepe commented 1 year ago

I will send a PR later, it's much clear to see the code.

I agree that driver should not know about configuration. As I state previously, driver can tell the allocator how much RAM it has, the just call the allocator to get the buffer, then free the buffer while closing.

mingpepe commented 1 year ago

1683

hathach commented 1 year ago

in general, I don't like dynamic memory since it is error-prone. I plan to add max_iso_size to the dcd_edpt_open for ISO to help with reserving the max buffer without having to re-allocated when switching interface.

PanRe commented 1 year ago

There should be no need to allocate memory dynamically, this functionality is already provided by the FIFOs! Or do i miss here something? Buffering data via FIFOs, for instance, is done in the audio driver under the hood. Furthermore, it seems your solution only aims at rp2040? At least for the dwc2 driver we have the problem of USB hardware FIFO fragmentation... your approach would not solve this issue...

mingpepe commented 1 year ago

@hathach the approach to add max_iso_size to dcd_edpt_open means driver may reserve the memory for particular endpoint? But for UVC class, one interface may have multiple settings for different resolution and frame rate, hence need different packet size. If the high-framerate setting was set first, then switch to low-framerate setting (two setting has different max packet size), the driver may still reserve the bigger one? If so, other endpoints (like vendor device or other class) may not have theoretically much memory to use (some memory is reserved by driver, but actually not used).

mingpepe commented 1 year ago

@PanRe, I have traced the code for dwc2 driver but not test actually, just thought the close endpoint order depends on open endpoint order can be better.

Copy comments from dcd_synopsys.c

// "USB Data FIFOs" section in reference manual
// Peripheral FIFO architecture
//
// --------------- 320 or 1024 ( 1280 or 4096 bytes )
// | IN FIFO 0   |
// --------------- (320 or 1024) - 16
// | IN FIFO 1   |
// --------------- (320 or 1024) - 16 - x
// |   . . . .   |
// --------------- (320 or 1024) - 16 - x - y - ... - z
// | IN FIFO MAX |
// ---------------
// |    FREE     |
// --------------- GRXFSIZ
// | OUT FIFO    |
// | ( Shared )  |
// --------------- 0

Let's see TX FIFO only, the driver use _allocated_fifo_words_tx to track the current memory usage and it is the only information to driver, which cause user can only close the last opened endpoint. The idea is to use the allocator to replace _allocated_fifo_words_tx, hence user can close OUT endpoint in arbitrary order.

hathach commented 1 year ago

If the high-framerate setting was set first, then switch to low-framerate setting (two setting has different max packet size), the driver may still reserve the bigger one? If so, other endpoints (like vendor device or other class) may not have theoretically much memory to use (some memory is reserved by driver, but actually not used).

For now, I think it should reserve highest memory for ISO, since host can switch to highest bandwidth at any time so we don't have to deal with run-time memory issue. It is truth that this will take a portion of memory from other endpoint usage, but we will come back to this later when it becomes a practical issue.

hence user can close OUT endpoint in arbitrary order.

As mentioned above, I don't like dynamic allocator. Can you explain why would you need to close endpoint in arbitrary order ? So far only ISO endpoint needs to change its packet size while in configured.

PanRe commented 1 year ago

Closing of OUT EPs is not an issue since the memory of out EPs is shared for all OUT EPs. It is about the IN EPs (the TX buffers).You can close any IN EP any time, however, the allocated memory of the respective EP closed might be located in the middle of other allocated areas. The problem now is, that all TX buffers still existing would need to be relocated s.t. no gaps occur. However, doing so in the middle of ISO transfers would mean, that you need to wait until the TX buffer to be relocated is empty. That would be possible but it is error prone and somebody would need the time to debug and test it thoroughly.The approach of hathach is more pragmatic and would work for other USB hardware types as well. The problem described above holds for the dwc2 driver but there are many other USB hardware types as well (rp2040, SAM, etc.)The problem of opening and closing EPs only concerns ISO EPs and the problem of dynamic memory allocation only comes up for multiple ISO EPs. There is usually a small fraction of applications concerned with that. We should try using the easy solution first and if there really is somebody who finds this approach unsatisfactory for his application, we should extend it (then we also have somebody who can test everything) ;)

mingpepe commented 1 year ago

For my application, I need to report as a HID device(interrupt ep) and UVC device(iso ep) and use the 2 devices independently. The following flow cause problem: 1) open HID device -> allocate memory for interrupt ep 2) open UVC device -> allocate memory for iso ep 3) close UVC device -> as current rp2040 driver, since there are still eps other than ep0 opened(interrupt ep for HID) no memory will be deallocated.

Then repeat step 2) and 3) can cause out of bus memory.

PanRe commented 1 year ago

Well without knowing too many details of the RP2040 hardware, is it not possible to remember which value next_buffer_ptr had at opening an ISO EP and at closing to reset it to the saved value? This would be a quick fix i know. It would work for only one ISO EP currently. How about starting this way?

mingpepe commented 1 year ago

Only memorize the value next_buffer_ptr for ISO EP is not enough as HID may open and close after UVC. To solve my problem in this way, 1 pointer is needed for each EP to record the buffer address. But the driver may not know how many EP exists but can only prepare the maximum EP number of pointer (16 pointers). This would be a quick fix, but I do not think it is a good way.

PanRe commented 1 year ago

For a single active configuration descriptor, HID will not be closed but only opened once the configuation is setup by tinyUSB. This means in generall, that control, bulk, and interrupt EPs are usually (according to the USB protocol) not to be closed. This only happens if a new configuration descriptor is setup and in this case, all EPs are closed. ISO EPs to the contrary are setup from a command sent from the host. This can only happen once tinyUSB already setup all the remaining EPs. This means, the corresponding buffer location of ISO EPs will always be located after the control, bulk, and interrupt EP buffers. Therefore, remembering next_buffer_ptr only could be enough in case of a single ISO EP. Or take an array of the length of the maximum number of EPs possible and save the ISO buffer locations there. Multiple closing and opening ISO EPs is another part to solve, for which @hathach would like to use the information of how big a buffer needs to be in total, as we already discussed.

hathach commented 1 year ago

thanks @PanRe for detail explanation, indeed, so far for all practical usages, only ISO endpoint changes its size while in configured mode. for all other endpoints CBI (control bulk interrupt). @mingpepe your current issue with open/close ISO when switching alt interface can be resolved in much easier way in rp2040 driver. The upcoming max iso size when opening an iso can definitely help with this.

mingpepe commented 1 year ago

@hathach @PanRe It's indeed much easier. I misunderstand that all endpoints may open and close. Thanks.

HiFiPhile commented 1 year ago

I'm working on the stm32_fsdev driver in order to add ISO and FIFO transfer support. It has the same fifo allocation limitation as dwc2, currently the allocation size is decided when the first time EP is opened, it can't be reopened with a bigger size.

I plan to add max_iso_size to the dcd_edpt_open for ISO to help with reserving the max buffer without having to re-allocated when switching interface.

I think it's a good idea.

PanRe commented 1 year ago

Thats great news! I also think it's a good idea!! Great you work on it!

hathach commented 1 year ago

I'm working on the stm32_fsdev driver in order to add ISO and FIFO transfer support. It has the same fifo allocation limitation as dwc2, currently the allocation size is decided when the first time EP is opened, it can't be reopened with a bigger size.

I plan to add max_iso_size to the dcd_edpt_open for ISO to help with reserving the max buffer without having to re-allocated when switching interface.

I think it's a good idea.

Please do, I think the api is simple enough, but I never get the time to do anything. It should just be just a pair of APIs

bool dcd_edpt_iso_open(rhport, ep_addr, largest_packet_size);
bool dcd_edpt_iso_activate(rhport, desc);

The open() is now used only for reserving space, it should be called within parsing configuration as other endpoint but does not activate/enable endpoint yet. class driver must scan for all alt interface to find out the the largest packet size.

Activate() is called when set interface, activate() with size=0 will deactivate the endpoint (normally alt=0). That is what come up in my head. Feel free to update those on the way. You can use weak or empty implementation for other dcd. I will take care of the whole upgrade later on.

PS: usbd can be a thin/alias to dcd one