hathach / tinyusb

An open source cross-platform USB stack for embedded system
https://www.tinyusb.org
MIT License
4.96k stars 1.04k forks source link

check usbnet example with all ports #289

Open hathach opened 4 years ago

hathach commented 4 years ago

Describe the bug RNDIS example PR #287 may not work with all ports, following is port that work

Ports that tested and not work

Ports that haven't tested yet

pigrew commented 4 years ago

It looks like the INITIALIZE_CMPLT request should return 52 bytes? Perhaps Windows doesn't care if the returned data is longer than expected.

The class driver control response code always returns 128 bytes:

https://github.com/hathach/tinyusb/blob/093b1381f2892994c25de7927544882a7e7e9d1c/src/class/net/net_device.c#L67

https://github.com/hathach/tinyusb/blob/093b1381f2892994c25de7927544882a7e7e9d1c/src/class/net/net_device.c#L193-L207

It trying to send exactly 128 bytes, usbd_control should append a zero-length-packet, but isn't. Could the SAMD and NUC automatically append a zero-length packet in hardware/driver, but the USB core code is broken in this regard?

For RNDIS, it looks like the message length can be extracted from byte 4 through 7 of the message itself.

(For the moment, I'm not going to do more debugging, so feel free to write patch(es) for this issue. It'd also be a good idea to do a bit more error checking, such as verifying bRequestCode in the control handlers.)

majbthrd commented 4 years ago

No, not exactly 128 bytes. The intent was to transfer request->wLength, but no more than 128. tu_control_xfer() calls tu_min() to cap all transfers to request->wLength.

pigrew commented 4 years ago

@majbthrd : Based hathach's log, the SETUP packet has a wLength is 1025, so it wouldn't be truncated. My guess is that the host always sets wLength to 1025, so it'll always send 128 bytes (tu_min(1025,128)).

Regardless, we should figure out why the ZLP is not being sent.

majbthrd commented 4 years ago

I've submitted a PR that should address the STM32 RNDIS issue, and I've regression tested it with SAMD21 and NUC126. It may be worth re-trying on the other targets mentioned above.

hathach commented 4 years ago

@pigrew @majbthrd great analysis, I think the host expect the usbd to return ZLP packet to indicate the end of transfer. I think this is a great chance for us to fix this ZLP behavior of usbd. It is not easy to get an reproducible issue for us to work on. This will prevent similar issue down the road. @pigrew already did ZLP response previously for MSP430 port, maybe it is merged there, I will try to look it up.

hathach commented 4 years ago

This commit https://github.com/hathach/tinyusb/commit/c8247f0907040eb86c301e8bff8bb322fc9cb566 fix ZLP issue for nrf52840. For STM32F072, it is little bit difference, the device did response with ZLP correctly, however STALL all request afterwards. (capture screenshot updated in 1st post)

hathach commented 4 years ago

304 and #305 fixed the issue for STM32 FSDev, will need to do more tests with other mcu :)

pigrew commented 4 years ago

STM32F407 still does not work (with the latest master).

After transferring the USB string 4, the following transaction happens:

SOF 976
SOF 977
SETUP ADDR 29 EP 0
DATA0 [ 21 00 00 00 00 00 18 00 ]
ACK
OUT ADDR 29 EP 0
DATA1 [ 02 00 00 00 18 00 00 00 02 00 00   00 01 00 00 00 00 00 00 00 00 40 00 00 ]
NAK
OUT ADDR 29 EP 0
DATA1 [ 02 00 00 00 18 00 00 00 02 00 00   00 01 00 00 00 00 00 00 00 00 40 00 00 ]
ACK
IN ADDR 29 EP 0
NAK
IN ADDR 29 EP 0
NAK

The transaction maintains NAK until Windows gives up.

hathach commented 4 years ago

@pigrew yeah, there is a different issue with stm32f4, I will check it out later when doing more tests with the rest of mcus as well.

majbthrd commented 4 years ago

Just adding to what @pigrew already said...

The problem transfer happens with EP0. A host-to-device class request is handled, and this results in a tud_control_xfer() (./src/class/net/net_device.c:334).

A valid transaction should like this this:

rndisref

and here is what the STM32F4/dcd_synopsys driver appears to be doing:

stm32f4

ADDITIONAL:

The tud_control_xfer() on ./src/class/net/net_device.c:334 succeeds, resulting in a callback to netd_control_complete() on ./src/class/net/net_device.c:223.

That ultimately results in a call to netd_report on ./src/class/net/net_device.c:116, which calls usbd_edpt_xfer() to send a RNDIS notification via EP1 IN. If I comment out the usbd_edpt_xfer() call, the control transfer shown above will succeed and the board will function in Linux. However, Windows will not tolerate this missing notification message (part of of the RNDIS protocol).

majbthrd commented 4 years ago

OK, @duempel, whilst waiting for someone to volunteer to decipher and fix the dcd_synopsys driver, changing the notify EP from EP1 to EP3 causes the driver to work as expected, allowing net_lwip_webserver to run on the STM32F4(07)DISCOVERY.

To do this, change ./src/examples/device/net_lwip_webserver/src/usb_descriptors.c from:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x81, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

to:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x83, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),
hathach commented 4 years ago

It is useful to have the stack LOG as well, what is captured on the bus is not necessary what the DCD detect if we believe there is bug within DCD e.g the above setup is not detected and submit to the stack --> this will help to narrow down the issue.

OK, @duempel, whilst waiting for someone to volunteer to decipher and fix the dcd_synopsys driver, changing the notify EP from EP1 to EP3 causes the driver to work as expected, allowing net_lwip_webserver to run on the STM32F4(07)DISCOVERY.

To do this, change ./src/examples/device/net_lwip_webserver/src/usb_descriptors.c from:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x81, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

to:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x83, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

but HOW :confused: :confused:

duempel commented 4 years ago

Thanks @majbthrd this is just a great start into new day. I tested your suggestion and it worked great using my STM32F105.

I need to do some hardware changes to get a logger running. Then I can have a deeper look inside for testing. It's confusing to me that working perfect on EP3 while working horribly bad on EP1.

hathach commented 4 years ago

@duempel I would highly suggest you to use one of the stm32f4 discovery board when trouble shooting and analyzing issue. That is to make sure we are all testing the same hardware and anything you encounter is reproducible by everyone.

majbthrd commented 4 years ago

FYI: another standalone STM32F4(07)DISCO workaround that enables net_lwip_webserver is to use the dcd_int_handler as a poll routine instead of an interrupt handler:

In ./src/portable/st/synopsys/dcd_synopsis.c, comment out the interrupt enable:

void dcd_int_enable (uint8_t rhport)
{
  (void) rhport;
//  NVIC_EnableIRQ(OTG_FS_IRQn);
}

In ./src/device/usbd.c, add a call to dcd_int_handler():

void tud_task (void)
{
  dcd_int_handler(0);
  // Skip if stack is not initialized

et voila, everything works.

majbthrd commented 4 years ago

That this modification causes things to work with the STM32F4(07)DISCO is bizarre.

By adding this to transmit_packet() in ./src/portable/st/synopsys/dcd_synopsis.c:

if (1==fifo_num)
{
  (* tx_fifo) = 1;
  (* tx_fifo) = 0;
  return;
}

to run prior to the normal copy to TX FIFO routine, it works under Linux. The same EP1 transaction also works in Windows, but things go off the rails later on with a EP2 transfer.

The exact same values are being written by the normal routine; they just happen a bit faster in the above.

Without the above modification, the EP1 TXFE sub-interrupt just fires over and over and over again.

hathach commented 4 years ago

OK, @duempel, whilst waiting for someone to volunteer to decipher and fix the dcd_synopsys driver, changing the notify EP from EP1 to EP3 causes the driver to work as expected, allowing net_lwip_webserver to run on the STM32F4(07)DISCOVERY.

To do this, change ./src/examples/device/net_lwip_webserver/src/usb_descriptors.c from:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x81, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

to:

  // Interface number, string index, EP notification address and size, EP data address (out, in) and size.
  TUD_RNDIS_DESCRIPTOR(ITF_NUM_CDC, STRID_INTERFACE, 0x83, 8, EPNUM_CDC, 0x80 | EPNUM_CDC, CFG_TUD_NET_ENDPOINT_SIZE),

just tested the changing EP1 to EP3 and it works as you said. It is probably a bitmask mis-calculation or something. Checking it now, hmmm.

hathach commented 4 years ago

PR #380 should fix the issue with stm32f4, there is a detail explanation for the root cause there. Though it doesn't explain why changing from EP1 IN to EP3 IN could help with the issue though. Probably a hw race condition or something.

majbthrd commented 4 years ago

@hathach, well done! I've done a quick check with net_lwip_webserver operation using the STM32F4(07)DISCO with Linux, Windows, and macOS, and everything seems functional.

majbthrd commented 4 years ago

@hathach, I just noticed that NUC505 is listed at the top of this thread as having not been tested yet. I've been using it locally for a month or two, so I believe it can be put in the working list.

hathach commented 4 years ago

@hathach, I just noticed that NUC505 is listed at the top of this thread as having not been tested yet. I've been using it locally for a month or two, so I believe it can be put in the working list.

ah thanks, I have updated the 1st post. NUC505 is too painful to flash, I already forgot how the dip switches work :sweat_smile: This lwip_webserver so far is really useful to spot issues on several dcds so far.

duempel commented 4 years ago

@duempel I would highly suggest you to use one of the stm32f4 discovery board when trouble shooting and analyzing issue. That is to make sure we are all testing the same hardware and anything you encounter is reproducible by everyone.

Yes, I absolutely agree with you. It's just that I don't have access to any of the designated boards right now. Of course I will do my best not to arbitrarily publish any issues caused by a particular hardware configuration. Since we know that usbnet's problems were related to the Synopsys driver, it is important information to also see that the different workarounds lead to the same system behavior.

PR #380 should fix the issue with stm32f4, there is a detail explanation for the root cause there. Though it doesn't explain why changing from EP1 IN to EP3 IN could help with the issue though. Probably a hw race condition or something.

Thanks @hathach , you have done a great job on this PR. It's working fine here.

majbthrd commented 3 years ago

With #531, the iMX RT1010 now works, and I suspect the same minor tweak to the stack size will remedy others.

hathach commented 3 years ago

thanks @majbthrd I have updated the 1st post. I will pull out other boards to test with in the weekend.

hathach commented 3 years ago

@majbthrd i just pull-out the rt1020-evk and rt1060-evk to test with

Would you mind testing your 1010 with Linux machine, since Linux use ECM and Windows use RNDIS, maybe there is a bit of issue somewhere. I am short on time to troubleshoot this for now, but if you could confirm the issue, I will update the 1st post that will make it easier to address this later on.

majbthrd commented 3 years ago

@hathach , that's so strange, as I used Linux to test the RT1010 port. This weekend, I'll take a more in depth look.

hathach commented 3 years ago

@hathach , that's so strange, as I used Linux to test the RT1010 port. This weekend, I'll take a more in depth look.

That is weird 🤔🤔

majbthrd commented 3 years ago

@hathach, I think the stack size change that I suggested was unnecessary. I measured peak stack usage at 684 bytes, and that is below the original, default of 1024 bytes. (In turns out my initial problems with running TinyUSB on the IMXRT1010-EVK had a lot more to do with the flaky built-in DAP on the evaluation board.)

I tested the IMXRT1010-EVK using net_lwip_webserver with the original stack size of 0x400 with Windows, Linux, and MacOS. I saw no issue.

@hathach, your mention of ECM under Linux and Windows under VM makes me wonder if there were not other factors involved. As net_lwip_webserver is presently written, Linux uses RNDIS (not ECM) because RNDIS is the default configuration.

What makes net_lwip_webserver more demanding than the other device examples is that it has multiple configurations. The default (1st config) is RNDIS and the 2nd config is ECM.

As you know, TinyUSB does not support dynamically switching between multiple configurations. TinyUSB will only work if the host chooses a non-zero configuration once. Looking back on past discussions (i.e. #345 and #340), it was decided that adding dynamic switching would depend on future work on adding dcd_edpt_close() to all the drivers.

Sure enough, any attempt to dynamically switch between configurations (using usb_modeswitch in Linux) causes problems (USB traffic stops after a USB SET_CONFIG 0 then SET_CONFIG 2), but I confirmed this is true of any TinyUSB target, not just IMXRT1010.

If there is doubt as to whether ECM works in Linux, you could confirm this, if you wanted to, by temporarily swapping around the CONFIG_ID_RNDIS and CONFIG_ID_ECM values in ./examples/device/net_lwip_webserver/src/usb_descriptors.c This would cause ECM to be selected by default.

hathach commented 3 years ago

@hathach, I think the stack size change that I suggested was unnecessary. I measured peak stack usage at 684 bytes, and that is below the original, default of 1024 bytes. (In turns out my initial problems with running TinyUSB on the IMXRT1010-EVK had a lot more to do with the flaky built-in DAP on the evaluation board.)

Ah right, daplink is bit slow partly since it is external qspi flash, I have updated the flash-pyocd target, it should be easier to flash.

I tested the IMXRT1010-EVK using net_lwip_webserver with the original stack size of 0x400 with Windows, Linux, and MacOS. I saw no issue.

Ooops, I will try to do more test, maybe it is some OS setting, btw which Linux distro you are testing with, I am on ubuntu 20.04

@hathach, your mention of ECM under Linux and Windows under VM makes me wonder if there were not other factors involved. As net_lwip_webserver is presently written, Linux uses RNDIS (not ECM) because RNDIS is the default configuration.

Ah right, I kind of forgot, which Linux pick, I mixed up, macOS is the one that pick ECM :)

If there is doubt as to whether ECM works in Linux, you could confirm this, if you wanted to, by temporarily swapping around the CONFIG_ID_RNDIS and CONFIG_ID_ECM values in ./examples/device/net_lwip_webserver/src/usb_descriptors.c This would cause ECM to be selected by default.

I have no doubt on the ECM driver, it works with other MCUs just fine, probably some issue with dcd driver of rt10xx though, since you test it successfully. I will look at other angle, btw I don't have the rt1010 evk, only has has other 1020-1064. So my test is probably not exact. Thank you very much for testing it again, I will carry more testing with my local machine more.

majbthrd commented 3 years ago

Ooops, I will try to do more test, maybe it is some OS setting, btw which Linux distro you are testing with, I am on ubuntu 20.04

I'm testing against: Ubuntu 18.04, Ubuntu 14.04, Windows 7, Windows 10, macOS High Sierra 10.13.16

I'm compiling with: gcc 9.2.1 (ARM 2019-q4)

Thanks for all your hard work, @hathach !

hathach commented 3 years ago

Ooops, I will try to do more test, maybe it is some OS setting, btw which Linux distro you are testing with, I am on ubuntu 20.04

I'm testing against: Ubuntu 18.04, Ubuntu 14.04, Windows 7, Windows 10, macOS High Sierra 10.13.16

I'm compiling with: gcc 9.2.1 (ARM 2019-q4)

Thanks for all your hard work, @hathach !

thanks for your os info. I have updated the 1st post, I will test it out later on. Have to switch on other works. But I think it is probably something minor :)