OHCI: MMIO coherency/sync issue

Operating System

Windows 11

Board

OHCI

Firmware

class/hid_host/c portable/ohci/ohci.c

What happened ?

After random lengths of time OHCI transfers will not fire the completion interrupt. I traced this to garbage being in hcca.done_head.

The garbage was similar to the address of the TD but was partically corrupt or missing. The root cause was writing to td_head directly here https://github.com/hathach/tinyusb/blob/044f4d1801a24634a519b73dec6f2e1dc326169d/src/portable/ohci/ohci.c#L443

I can see exactly why we are doing this; which to prevent having to allocate dummy TDs for every ED, but the OHCI spec implies you cannot do this.

Section 4.6 of OHCI has this; I have a feeling I am getting the coherency/sync issue described here.

Software queues to the list by using the value of TailP to obtain the physical address of the last
TD queued to the ED. Since the TD pointed to by TailP is not accessed by the HC, the Host
Controller Driver can initialize that TD and link at least one other to it without creating a
coherency or synchronization problem.
Software may not alter in any way any of the
TDs it has queued prior to the one pointed to by TailP until the Host Controller completes
processing of the TD or the Host Controller Driver ensures that queue processing for the ED has
been halted.

I have found two fixes below:

Stop ED while we modify it to queue the transfer, but this is not an ideal solution waiting 1ms every transaction; although maybe not too bad and a sacrifice that can be made to Tiny USB to keep it small. This would be okay for interrupt and bulk ultimately and isochronous can be handled differently (I am currently adding it)
```
+p_ed->skip = 1;
+osal_task_delay(1); // Wait one frame to ensure OHCI is not processing ED
p_ed->td_head.address |= (uint32_t) _phys_addr(p_gtd); // Write it
+p_ed->skip = 0;
```
The 'correct' way and I can confirm it resolved the issue was by creating a dummy TD that is linked to the ED head and tail on hcd_open. Then in hcd_edpt_xfer I populated this TD, allocate a new dummy then update the ED tail to pointer to the new TD which triggers the transfer. This is exactly what the OHCI recomends and the issue no longer occurs. The big problem is ofcourse this basically doubles the memory used by TDs.

I was not able to replicate with control transfers, as I think we use the control list filled bit to trigger the transfer and only every have one control transfer at a time.

How to reproduce ?

Its pretty random and took usually a minute or two to occur, Just i had interrupt transferings firing every millisecond so activitely was high.

https://github.com/hathach/tinyusb/blob/044f4d1801a24634a519b73dec6f2e1dc326169d/src/portable/ohci/ohci.c#L585

This assert picked it up fairly quickly

static ohci_td_item_t* list_reverse(ohci_td_item_t* td_head)
{
  ohci_td_item_t* td_reverse_head = NULL;
  while(td_head != NULL)
  {
      td_head = _virt_addr(td_head);
+     // Check if the td address is atleast within the ohci data struct. If not something is really wrong!
+     TU_ASSERT((uintptr_t)&ohci_data < (uintptr_t)td_head  < ((uintptr_t)(&ohci_data) + sizeof(ohci_data)));
      uint32_t next = td_head->next;

Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)

Already root caused this problem.

Its possibly it only occurs on certain hardware depending how caching and clocks are handled.

Screenshots

No response

I have checked existing issues, dicussion and documentation

[X] I confirm I have checked existing issues, dicussion and documentation.

hathach / tinyusb