Closed rgrr closed 6 months ago
Hello rgrr,
The same thing has been observed on my platform. RNDIS performance is fairly reasonable, But NCM is 10 times slower then RNDIS. My Prime suspect is the way NTB is handled
Also there is a bug in the implementation which I filed sometime ago but still haven't got any response yet.
Hello joshikeval3131,
to be honest, I'm more or less a novice to (Tiny)USB from developer side. Do you have some pointers where and how to start debugging the NCM code? Of course I have found ncm_device.c et al, but even adding some debug output still generates more confusion than clarity.
New evidence for the NCM problem.
I have done some tests with the iperf server provided in the lwIP examples and recorded with Wireshark.
Once I have done the tests with ECM and once with NCM. iperf command line on client side (Linux) is
for MSS in 90 93 100 150 200 255 256 300 400 500 511 512 600 700 800 900 1000 1100 1200 1300 1400 1450 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l 22 -M $MSS; sleep 10; done
Wiresharks statistics graphs are as follows.
Good case is with ECM, although with MSS=1460/1500 I find it suspicious:
Bad case is with NCM, small MSS are generating a lot of errors within the stream:
For analysis I have attached a sequence with ECM/NCM iperf-90-93.zip
Does anyone has any pointers where to search (ok, I know it is most likely in ncm_device)
Hmmm, does anybody read this?
I want to make a proposal: the ncm_device driver seems to be very buggy, e.g. it operates on incoming frames while the previous one is still processed, also it does no ZLP insertion where it should insert one. These bugs are the cause for the above described hickups.
I have implemented a simple version of the driver, which allows only one ethernet frame on reception/transmission.
This would put the focus on reliability, not so much on performance.
If one of the maintainers agree, I will file a PR on this.
If one wants to check the code, it can be found here: https://github.com/rgrr/yapicoprobe/blob/feature/systemview-ncm-debugging/src/net/tinyusb/ncm_device_simple.c
Current image with ncm_device_simple...
Hmmm, does anybody read this?
I want to make a proposal: the ncm_device driver seems to be very buggy, e.g. it operates on incoming frames while the previous one is still processed, also it does no ZLP insertion where it should insert one. These bugs are the cause for the above described hickups.
I have implemented a simple version of the driver, which allows only one ethernet frame on reception/transmission.
This would put the focus on reliability, not so much on performance.
If one of the maintainers agree, I will file a PR on this.
If one wants to check the code, it can be found here: https://github.com/rgrr/yapicoprobe/blob/feature/systemview-ncm-debugging/src/net/tinyusb/ncm_device_simple.c
Hello rgrr,
Thanks for finding out these bugs I'll test this on my device and let you know.
Hmmm, does anybody read this?
I want to make a proposal: the ncm_device driver seems to be very buggy, e.g. it operates on incoming frames while the previous one is still processed, also it does no ZLP insertion where it should insert one. These bugs are the cause for the above described hickups.
I have implemented a simple version of the driver, which allows only one ethernet frame on reception/transmission.
This would put the focus on reliability, not so much on performance.
If one of the maintainers agree, I will file a PR on this.
If one wants to check the code, it can be found here: https://github.com/rgrr/yapicoprobe/blob/feature/systemview-ncm-debugging/src/net/tinyusb/ncm_device_simple.c
Thanks for this analysis
Note, that there is a pull request #2227 about this. I have done a complete rewrite in the meantime.
rgrr,
Kudos and thanks for writing a very clear concise documentation.
I was able to see that the NCM bandwidth increased by a factor or 2 :smile: .
However I observed that as the segment size kept increasing the bandwidth was decreasing. For segment size > 800 my iperf application crashed.
Currently trying to understand and analyze your changes in the driver.
... However I observed that as the segment size kept increasing the bandwidth was decreasing. For segment size > 800 my iperf application crashed. ...
Do you have some kind of available test application? What is the exact iperf command line?
Yup I am using standard lwip webserver example from TinyUSB and iperf application from lwip-contrib/examples/lwiperf
I am executing following command. for MSS in 100 200 400 800 1200 1450 1500; do iperf -c 192.168.7.1 -e -i 1 -M $MSS -l 8192 -P 1; sleep 2; done
Here are the logs that I am getting.
Iperf logs
WARNING: attempt to set TCP maximum segment size to 100, but got 536
------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 848162
Write buffer size: 8.00 KByte
TCP window size: 75.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.7.2 port 55872 connected with 192.168.7.1 port 5001 (ct=1.51 ms)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
[ 3] 0.0000-1.0000 sec 2.03 MBytes 17.0 Mbits/sec 260/0 0 2K/640 us 3328.00
[ 3] 1.0000-2.0000 sec 2.01 MBytes 16.8 Mbits/sec 257/0 0 2K/613 us 3434.49
[ 3] 2.0000-3.0000 sec 2.05 MBytes 17.2 Mbits/sec 262/0 0 2K/685 us 3133.29
[ 3] 3.0000-4.0000 sec 1.98 MBytes 16.6 Mbits/sec 253/0 0 2K/650 us 3188.58
[ 3] 4.0000-5.0000 sec 2.04 MBytes 17.1 Mbits/sec 261/0 0 2K/653 us 3274.29
[ 3] 5.0000-6.0000 sec 2.03 MBytes 17.0 Mbits/sec 260/0 0 2K/664 us 3207.71
[ 3] 6.0000-7.0000 sec 1.97 MBytes 16.5 Mbits/sec 252/0 0 2K/598 us 3452.15
[ 3] 7.0000-8.0000 sec 2.04 MBytes 17.1 Mbits/sec 261/0 0 2K/666 us 3210.38
[ 3] 8.0000-9.0000 sec 1.98 MBytes 16.6 Mbits/sec 253/0 0 2K/655 us 3164.24
[ 3] 9.0000-10.0000 sec 2.00 MBytes 16.8 Mbits/sec 256/0 0 2K/695 us 3017.48
[ 3] 0.0000-10.0029 sec 20.1 MBytes 16.9 Mbits/sec 2575/0 0 -1K/651 us 3239.38
WARNING: attempt to set TCP maximum segment size to 200, but got 536
------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 848196
Write buffer size: 8.00 KByte
TCP window size: 40.0 KByte (default)
------------------------------------------------------------
[ 3] local 192.168.7.2 port 34966 connected with 192.168.7.1 port 5001 (ct=1.67 ms)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT NetPwr
[ 3] 0.0000-1.0000 sec 2.51 MBytes 21.0 Mbits/sec 321/0 0 3K/506 us 5196.90
[ 3] 1.0000-2.0000 sec 2.53 MBytes 21.2 Mbits/sec 324/0 0 3K/458 us 5795.21
[ 3] 2.0000-3.0000 sec 2.38 MBytes 19.9 Mbits/sec 304/0 0 3K/472 us 5276.20
[ 3] 3.0000-4.0000 sec 3.05 MBytes 25.6 Mbits/sec 390/0 0 3K/455 us 7021.71
[ 3] 4.0000-5.0000 sec 1.98 MBytes 16.6 Mbits/sec 253/0 0 3K/448 us 4626.29
[ 3] 5.0000-6.0000 sec 636 KBytes 5.21 Mbits/sec 81/0 0 3K/690 us 943.73
[ 3] 6.0000-7.0000 sec 0.00 Bytes 0.00 bits/sec 0/2 3 0K/657 us 0.00
[ 3] 7.0000-8.0000 sec 0.00 Bytes 0.00 bits/sec 0/2 0 0K/657 us 0.00
[ 3] 8.0000-9.0000 sec 0.00 Bytes 0.00 bits/sec 0/2 1 0K/657 us 0.00
[ 3] 9.0000-10.0000 sec 0.00 Bytes 0.00 bits/sec 0/2 0 0K/657 us 0.00
[ 3] 0.0000-10.4118 sec 13.1 MBytes 10.5 Mbits/sec 1673/8 4 -1K/565 us 2327.65
WARNING: attempt to set TCP maximum segment size to 400, but got 536
connect failed: Operation now in progress
WARNING: attempt to set TCP maximum segment size to 800, but got 536
TinyUSB NCM Driver logs
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(II) tud_network_can_xmit: request blocked
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
(EE) ill nth16 length: 61464
(EE) VALIDATION FAILED. WHAT CAN WE DO IN THIS CASE?
ooopssss... and the example is running with ECM/RNDIS?
"request blocked" is not bad, but the "VALIDATION FAILED" messages are fatal. Possible reasons that come to my mind:
What I also do not understand is "WARNING: attempt to set TCP maximum segment size to 100, but got 536". That also indicates that something is really wrong.
It is possible, that I have created a dependency of the driver with my glue code to TinyUSB. The suspicious part looks like:
static void context_tinyusb_linkoutput(void *param)
/**
* Put \a xmt_buff into TinyUSB.
*
* Context: TinyUSB
*/
{
if ( !tud_network_can_xmit(xmt_buff_len)) {
//printf("context_tinyusb_linkoutput: sleep\n");
vTaskDelay(pdMS_TO_TICKS(5));
taskDISABLE_INTERRUPTS();
usbd_defer_func(context_tinyusb_linkoutput, NULL, false); // put yourself at end of TinyUSB event queue
taskENABLE_INTERRUPTS();
}
else {
tud_network_xmit(xmt_buff, xmt_buff_len);
}
} // context_tinyusb_linkoutput
static err_t linkoutput_fn(struct netif *netif, struct pbuf *p)
/**
* called by lwIP to transmit data to TinyUSB
*
* Context: lwIP
*/
{
if ( !tud_ready()) {
return ERR_USE;
}
if (xmt_buff_len != 0) {
return ERR_USE;
}
// copy data into temp buffer
xmt_buff_len = pbuf_copy_partial(p, xmt_buff, p->tot_len, 0);
assert(xmt_buff_len < sizeof(xmt_buff));
taskDISABLE_INTERRUPTS();
usbd_defer_func(context_tinyusb_linkoutput, NULL, false);
taskENABLE_INTERRUPTS();
return ERR_OK;
} // linkoutput_fn
This glue code is driving me nuts, because actually there is no legal (and correct) way to make a function call in TinyUSB context. The above is my best bet. But as said: perhaps the first usbd_xfer_func() perhaps introduces something unintended.
I will look deeper into that.
Ok, first check shows, that with the lwip-webserver example it is running as well. Little bit slow, but anyway.
You can find the sources here: https://github.com/rgrr/tinyusb/tree/feature/standard-webserver/examples/device/net_lwip_webserver
rgrr,
I have confirmed that , lwip-webserver is also running fine on my platform.
"WARNING: attempt to set TCP maximum segment size to 100, but got 536". That also indicates that something is really wrong.
I suspect that this definitely indicates some problem. Also it works for smaller segment sizes but as I increase the segment size the bandwidth decreases and later point driver just crashes.
:( I need to thoroughly understand the code changes that you made and the specific problems it is fixing in the original ncm driver. After this only I will be able to better deduce what is causing this problem. Is it the TinyUSB ? or class driver ? Or my device specific dcd.c implementation.
:( I need to thoroughly understand the code changes that you made and the specific problems it is fixing in the original ncm driver. After this only I will be able to better deduce what is causing this problem. Is it the TinyUSB ? or class driver ? Or my device specific dcd.c implementation.
Please consider ncm_device as a rewrite. I have dropped more or less the original and started fresh but used the original as a template for some things.
Problem with the original ncm_device was, that it reused internal buffers already for reception while transmission of its contents to lwIP was still running.
I have taken a reference of glue logic from https://github.com/rgrr/yapicoprobe/blob/feature/minor-cleanup/src/net/net_glue.c
Implemented the same.
As I increased the TinyUSB buffer size, the test case worked for all segment size. :smiley:
iperf command used:
for LEN in 40000 16000 1400 1024 800 200 80 22 8 2 1; do for MSS in 90 93 100 150 200 255 256 300 400 500 511 512 600 700 800 900 1000 1100 1200 1300 1400 1450 1459 1460 1500; do iperf -c 192.168.14.1 -e -i 1 -l $LEN -M $MSS; sleep 2; done; done
Wireshark analysis
Also please find the logs attached.
Observation -:
Note-: I am using the NCM driver in NO OS setting. ncm_iperf_log_file.txt
Actually I have no idea what the bottleneck could be. Have you checked the wireshark log if you see the expected? Perhaps your iperf behaves not as you want.
Your iperf output:
Client connecting to 192.168.7.1, TCP port 5001 with pid 1792756
Write buffer size: 100 Byte
TCP window size: 40.0 KByte (default)
Mine (2.1.9):
Client connecting to 192.168.14.1, TCP port 5001 with pid 18915 (1 flows)
Write buffer size: 40000 Byte
MSS req size 600 bytes (per TCP_MAXSEG)
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)
For debugging purposes: could you please change
INFO_OUT("(II) tud_network_can_xmit: request blocked\n");
so that it is printing. I had the case under windows, that after several days, the driver seemed to be in a state were it did output the above message very often. Don't know what the reason was, perhaps the glue code which actually is not clean in switching to TinyUSB context. Currently I do not know about a way to do this reliably.
Actually I have no idea what the bottleneck could be. Have you checked the wireshark log if you see the expected? Perhaps your iperf behaves not as you want.
Tried with iperf 2.1.9 same result
Also with little clean up https://github.com/hathach/tinyusb/pull/2227 should be okay to merge :+1:
Do you see any other blockers ? And also do you see any scope of improvement upon this ?
Expectations with glue logic: there should be an official call to put an event into the TinyUSB event queue so that one get into the context of TinyUSB. lwIP API offers tcpip_try_callback() or tcpip_callback(). Currently one has to use usbd_defer_func() which is actually TinyUSB private and also does just block USB interrupts.
Clean up: will follow soon.
Currently I cannot see any further improvement. Perhaps wider use of the net_device will open some new wishes/issues.
One point for improvement could be to change the webserver example to use NCM by default.
This would improve compatibility over the several OSes, because NCM is supported by Linux, MacOS (AFAIK), Win11 (AFAIK) and Win10 (with minor driver installation).
With ECM or RNDIS I had no luck under Win10, at least on my machine.
One more thing (out of topic): how about adding iperf to the webserver example to get a reference example for performance measurements?
Out of curiosity: have you ever compared with ECM/RNDIS? And have you done a bidirectional test with e.g.
for MSS in 100 800 1450; do iperf -c 192.168.14.1 -e -i 1 -M $MSS -l 8192 -r; sleep 2; done
?
((to me) surprisingly server-to-client does not work for small packets)
This would improve compatibility over the several OSes, because NCM is supported by Linux, MacOS (AFAIK), Win11 (AFAIK) and Win10 (with minor driver installation).
As far as I know RNDIS is windows standard offering and Linux also supports RNDIS by default. Where as in NCM extra driver installation is required. Thus it is okay to keep RNDIS as default.
: how about adding iperf to the webserver example to get a reference example for performance measurements?
lwip Iperf will be a good to have example. :+1: I think we should add it.
have you ever compared with ECM/RNDIS? And have you done a bidirectional test with e.g. for MSS in 100 800 1450; do iperf -c 192.168.14.1 -e -i 1 -M $MSS -l 8192 -r; sleep 2; done?
No haven't done Bidirectional test. When I tried I showed following error message.
------------------------------------------------------------
Server listening on TCP port 5001 with pid 2735107
Read buffer size: 8.00 KByte (Dist bin width=1.00 KByte)
MSS req size 100 bytes (per TCP_MAXSEG)
TCP window size: 128 KByte (default)
------------------------------------------------------------
ERROR: expected reverse connect did not occur
I suspect that in default lwip iperf example there is no RX logic. Can you confirm the same ? Can you point me towards RX capable iperf example if you have any ?
Thanks
yes, that is a valid point: RNDIS is correctly detected by Win10. But there are some drawbacks: RNDIS changes somehow routing of my Linux desktop (Debian) and also RNDIS works only under Win10 if it is the only class of the USB device. So actually not a generic solution.
Nevertheless as an example RNDIS should do it.
But: those examples aren't just examples. To my understanding they are also some kind of compilation test. So if all examples go for RNDIS, the NCM device will not be compile tested which to my understanding is "bad".
Concerning iperf: I'm observing the same behavior but just for MSS=100. MSS=800,1450 are working. So actually the iperf example is capable of doing transmission. But still there must be somewhere a bug in the iperf example.
I will rework my PRs #2227 and #2242 accordingly.
@Keval-Joshi-TI : "funny" thing, if you limit in lwiperf_tcp_client_send_more() (in lwiperf.c) the txlen_max to 700 (instead of TCP_MSS) then iperf -c 192.168.14.1 -e -i 1 -M 100 -l 8192 -r
succeeds.
Don't know yet, if this indicates a bug in lwiperf, lwIP, TinyUSB or the device driver (but ECM shows the same behavior).
Comparing ECM / NCM shows that NCM is factor 4 better in direction device -> host.
PS: actual problem is, that tcp_output() does not create a lwiperf_tcp_client_sent() callback
@rgrr : I am stuck at more basic problem.
Even with your suggested change int the lwip example DEV-> HOST transfer is not happening in my case :(
------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 3025046 (1 flows)
Write buffer size: 8192 Byte
MSS req size 100 bytes (per TCP_MAXSEG)
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
------------------------------------------------------------
Server listening on TCP port 5001 with pid 3025046
Read buffer size: 8.00 KByte (Dist bin width=1.00 KByte)
MSS req size 100 bytes (per TCP_MAXSEG)
TCP window size: 128 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.7.2%usb0 port 57712 connected with 192.168.7.1 port 5001 (sock=3) (icwnd/mss/irtt=0/100/138) (ct=0.20 ms) on 2023-09-05 14:11:30.762 (IST)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT(var) NetPwr
[ 1] 0.00-1.00 sec 1.08 MBytes 9.04 Mbits/sec 138/0 0 2K/1021(137) us 1107
[ 1] 1.00-2.00 sec 1.06 MBytes 8.91 Mbits/sec 136/0 0 2K/1040(141) us 1071
[ 1] 2.00-3.00 sec 1.06 MBytes 8.91 Mbits/sec 136/0 0 2K/1033(146) us 1079
[ 1] 3.00-4.00 sec 1.06 MBytes 8.91 Mbits/sec 136/0 0 2K/979(124) us 1138
[ 1] 4.00-5.00 sec 1.07 MBytes 8.98 Mbits/sec 137/0 0 2K/993(204) us 1130
[ 1] 5.00-6.00 sec 1.06 MBytes 8.91 Mbits/sec 136/0 0 2K/995(132) us 1120
[ 1] 6.00-7.00 sec 1.05 MBytes 8.85 Mbits/sec 135/0 0 2K/1003(137) us 1103
[ 1] 7.00-8.00 sec 1.06 MBytes 8.91 Mbits/sec 136/0 0 2K/933(131) us 1194
[ 1] 8.00-9.00 sec 1.06 MBytes 8.91 Mbits/sec 136/0 0 2K/999(136) us 1115
[ 1] 9.00-10.00 sec 1.06 MBytes 8.91 Mbits/sec 136/0 0 2K/1043(139) us 1068
[ 1] 0.00-10.02 sec 10.6 MBytes 8.92 Mbits/sec 1363/0 0 2K/1021(133) us 1092
ERROR: expected reverse connect did not occur
-> Moreover I am trying to run lwip + tinyUSB in freeRTOS i.e LWIP with NO_SYS = 0 where lwip task has lower priority. Somehow lwip shows timeout when I try NCM data transfer. I took your lwipopts for reference from https://github.com/rgrr/yapicoprobe/blob/feature/minor-cleanup/include/lwipopts.h. Somehow this combination is not functional for me as of now.
Quick question on glue logic:
-> The part where you call tcpip_try_callback_block which essentially schedules a function call which puts received ethernet packet into lwip context , looks all okay to me.
-> But the one where you are calling the defer function looks little bit missplaced. In baremetal scenario I tried with and without defer function, its identical for me. i.e the approach taken by default lwip_webserver example.
Can you please why we need to put context_tinyusb_linkoutput call in the tud task event queue ?. Can lwip directly handle it when it calls linkoutput_fn ?
Perhaps our lwIP configuration is different. Essential parts of mine:
#define TCP_MSS (1500 - 20 - 20) // MTU minus header sizes (best value til now)
#define TCP_SND_BUF (8 * TCP_MSS) // good tuning
#define TCP_SND_QUEUELEN 16
#define TCP_SNDQUEUELOWAT (TCP_SND_QUEUELEN / 2)
#define MEMP_NUM_TCP_SEG 32
#define MEM_SIZE 40000
Concerning the glue logic: this is also a little bit historically grown. I struggled around with lwIP, NCM, FreeRTOS and did not get it flying. So I made a lot of testing. One of them was to assure that the NCM device is always called from TinyUSB context. After many tests and attempts to repair NCM, I decided to rewrite it. First approach was ncm_device_simple in my repo which handles only one datagram in each direction. Compatibility even with Win10 was ways better, because with NCM Win10 allows other USB classes in parallel, with RNDIS it does not. Second step was to reimplement NCM device.
Special case in the glue logic is context_tinyusb_linkoutput(): the idea is to put the deferred call at the end of the TinyUSB queue, so not using a simple while loop. But I made a quick test and it seems (for the iperf testcase) that the while loop is also ok:
while ( !tud_network_can_xmit(xmt_buff_len)) {
//printf("context_tinyusb_linkoutput: sleep\n");
vTaskDelay(pdMS_TO_TICKS(5));
}
tud_network_xmit(xmt_buff, xmt_buff_len);
But I did not test if this context switching is actually required. For me this is the cleaner way although the glue logic looks a little bit more cumbersome.
gosh... sometimes it is more complicated... I have to correct myself: the above loop does not work, the one with the usbd_defer_func() is the correct one. And this is also true for ECM/RNDIS: they do fine with the defer loop but not with the simple loop from above. I will investigate this further if there is time, but I guess the simple loop blocks TinyUSB while it actually wants something to do. The problems appears for ECM immediately, for NCM (because of better buffering) for very small "-M" in the iperf invocation.
Concerning lwiperf: the problem with the sender operations lies within the condition of the do-while-loop in lwiperf_tcp_client_send_more(). The condition txlen >= (TCP_MSS / 2)
is too limiting, because for -M 100
the call to tcp_write() does not succeed and thus there is no data in the buffer which stops transmission.
Replacing the above condition with txlen != 0
solves this and lwiperf sends data correctly.
@joshikeval3131: Could you please test this as well?
@rgrr :
Concerning lwiperf: the problem with the sender operations lies within the condition of the do-while-loop in lwiperf_tcp_client_send_more(). The condition txlen >= (TCP_MSS / 2) is too limiting, because for -M 100 the call to tcp_write() does not succeed and thus there is no data in the buffer which stops transmission. Replacing the above condition with txlen != 0 solves this and lwiperf sends data correctly.
Even with above mentioned change rx is not working in my case. :( Getting following error.
ERROR: expected reverse connect did not occur
Let me check my lwip-stack version
Converted my reference application to work with FreeRTOS. & Surprisingly I am getting better results with RNDIS on FreeRTOS compared to baremetal.
At max TX iperf hit 49Mbps bandwidth.
------------------------------------------------------------
Client connecting to 192.168.7.1, TCP port 5001 with pid 3470278 (1 flows)
Write buffer size: 8192 Byte
MSS req size 1450 bytes (per TCP_MAXSEG)
TOS set to 0x0 (Nagle on)
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 1] local 192.168.7.2%usb0 port 60214 connected with 192.168.7.1 port 5001 (sock=3) (icwnd/mss/irtt=14/1450/169) (ct=0.23 ms) on 2023-09-07 16:17:05.693 (IST)
[ ID] Interval Transfer Bandwidth Write/Err Rtry Cwnd/RTT(var) NetPwr
[ 1] 0.00-1.00 sec 5.84 MBytes 49.0 Mbits/sec 747/0 0 14K/956(7) us 6401
[ 1] 1.00-2.00 sec 5.80 MBytes 48.7 Mbits/sec 743/0 0 14K/953(6) us 6387
[ 1] 2.00-3.00 sec 5.78 MBytes 48.5 Mbits/sec 740/0 0 14K/989(74) us 6130
[ 1] 3.00-4.00 sec 5.80 MBytes 48.6 Mbits/sec 742/0 0 14K/953(4) us 6378
[ 1] 4.00-5.00 sec 5.80 MBytes 48.6 Mbits/sec 742/0 0 14K/1013(111) us 6000
[ 1] 5.00-6.00 sec 5.80 MBytes 48.6 Mbits/sec 742/0 0 14K/958(7) us 6345
However the story so far with NCM is not good. With OS migration I can see some performance increase for smaller packet size i.e MSS = 100, 200. For MSS=400 & above the NCM driver crashes. with following debug message.
(II) tud_network_can_xmit: request blocked
@rgrr :
NCM driver worked for me for all segment sizes.
When lwip_task priority == tud_task priority it works well and as expected.
I am trying to get iperf -r working on my platform.
My device is a Raspi Pico (RP2040), i.e. dual-core Cortex-M0. TinyUSB has higher priority than lwIP. I will test, if it matters which process has higher priority (both processes are limited to one core).
lwiperf: I inserted several printf() for debugging. They pointed very fast to the loop in lwiperf_tcp_client_send_more().
What is your current glue code? Especially if you are using TinyUSB I would use mine ;-)
Just since I'm also using the NCM device and didn't experience any problem with my RP2040-based project up to now: @rgrr: You are using an RTOS? I'm going bare-metal. In case you want to do measurements using my setup (you just need to enable iperf but you know how to do that ;): https://github.com/OpenLightingProject/rp2040-dmxsun
@kripton : does it mean that you are using the new NCM device?
It did not do any tests with your project although I like your work. I stopped when "make" asked for npm :-/
Nevertheless your glue code points me into the direction why the simple loop from above did not work: the call to tud_task() was missing.
Corrected version looks like this:
static void context_tinyusb_linkoutput(void *param)
/**
* Put \a xmt_buff into TinyUSB.
*
* Context: TinyUSB
*/
{
while ( !tud_network_can_xmit(xmt_buff_len)) {
printf("context_tinyusb_linkoutput: sleep\n");
vTaskDelay(pdMS_TO_TICKS(1));
tud_task();
}
tud_network_xmit(xmt_buff, xmt_buff_len);
} // context_tinyusb_linkoutput
@kripton : does it mean that you are using the new NCM device?
No, i'm using the original, first version and I did notnface any problems with it.
It did not do any tests with your project although I like your work. I stopped when "make" asked for npm :-/
True, dmxsun is compiling a React web app, Sorry, I know it causes additional effort.
Nevertheless your glue code points me into the direction why the simple loop from above did not work: the call to tud_task() was missing.
So does it solve your problems with the original NCM device or your new one?
@kripton : does it mean that you are using the new NCM device?
No, i'm using the original, first version and I did notnface any problems with it.
You will stumble into problems if the transmitted packets are coming fast and in different sizes. My initial application was SystemView.
Nevertheless your glue code points me into the direction why the simple loop from above did not work: the call to tud_task() was missing.
So does it solve your problems with the original NCM device or your new one?
I'm actually not the one having problems with it ;-) I'm just trying to track and understand the problems @joshikeval3131 has. The simple loop was just a reaction to his complaint about an additional usb_defer call. Unfortunately wrong in the first attempt.
@Keval-Joshi-TI
@rgrr :
NCM driver worked for me for all segment sizes.
When lwip_task priority == tud_task priority it works well and as expected.
Made some tests with priority: for my device tud_task priority > lwip_task priority is mandatory. Otherwise especially target->host transfer collapses.
@rgrr:
Thanks for conformation on this
Iperf Rx is finally functional for me
On my Linux machine due to some firewall issue iperf -r was not working
I am getting good results with the changes you suggested in lwiperf_tcp_client_send_more() i.e ( txlen != 0 ). :D
@kripton:
In my case so far I am getting much better performance with the new driver compare to older one. In bare metal I can see significant difference and in FreeRTOS there is notable difference.
Note -:
In both the case I am using the glue logic which @rgrr has proposed.
TCP- TX & RX bandwidth is around 50Mbps.
This has been closed by #2227
Operating System
Linux
Board
Raspi Pico RP2040
Firmware
My probe project (https://github.com/rgrr/yapicoprobe/tree/feature/systemview-rndis) uses lwIP / TinyUSB to provide a network interface for SystemView (https://www.segger.com/products/development-tools/systemview/).
The probe is running on a Pico (RP2040), the connected target board is an nRF52840. The nRF52840 generates the SystemView events (https://github.com/rgrr/playground/tree/feature/xray/tools/SystemView), the events are collected from the target via RTT by the RP2040. SystemView itself is connected via TCP to the RP2040 and collects the generated data.
For the underlying Ethernet over USB protocol one can select between ECM/RNDIS/NCM in CMakeLists.txt of the probe firmware.
The probe is using the raw lwIP interface.
For performance testing there is the example iperf server contained in the firmware.
What happened ?
if one selects ECM/RNDIS as underlying protocol, the event throughput is around 30000 events/s, if one selects NCM, the event throughput is < 1000 events/s.
It seems that network traffic is stuttering, i.e. data transfer (traced in the sent-event) pauses and then transfer bursts are happening.
Data is fed from the application to lwIP/TinyUSB the same in all three cases.
What is surprising is, that iperf throughput shows almost the same values for all three cases. So I'm wondering what's the difference between the iperf server and my SystemView server.
Is it because SystemView traffic is almost unidirectional and NCM needs somehow a trigger to start transfer?
How to reproduce ?
Complicated... and I'm actually not sure if this is a bug in TinyUSB or a configuration/handling problem on my side.
To reproduce one needs a Pico and a target board. The required firmware/setup is described above.
I'm really willing to dig deep into TinyUSB, so I'm actually looking for some pointers where to go.
Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)
On the target (nRF52840) the same firmware is running. The generated events lead to a data volume of < 20KByte/s.
ECM case
ECM transfers the data continuously, every ~170ms data transfer is happening.
NCM case
Here are long pauses between data transfer bursts
Screenshots
No response
I have checked existing issues, dicussion and documentation