Closed showier-drastic closed 3 weeks ago
I think the current situation on reception handling is confusing and probably wrong. tud_network_recv_renew
seems to be called twice in a single packet reception, the first time by tud_network_recv_renew_r
which is in turn called by netd_xfer_cb
, and another time by user logic when it finished processing the current packet. This also means recv_try_to_start_new_reception
is called twice in a single reception.
OK, I understand that we may process multiple NCM "subpackets" in one callback, and that's why we need to repeatedly call tud_network_recv_renew
.
So, I propose a solution to this issue to add re-entrance detection logic to tud_network_recv_renew
function.
void tud_network_recv_renew(void) {
TU_LOG_DRV("tud_network_recv_renew()\n");
static bool processing = false;
static bool should_process = false;
should_process = true;
if (processing) {
TU_LOG_DRV("Re-entrant into tud_network_recv_renew, will process later\n");
return;
}
while (should_process) {
should_process = false;
processing = true;
// If the current function is called within recv_transfer_datagram_to_glue_logic,
// should_process will become true, and the loop will run again
// Otherwise the loop will not run again
recv_transfer_datagram_to_glue_logic();
processing = false;
}
recv_try_to_start_new_reception(ncm_interface.rhport);
} // tud_network_recv_renew
Hmmm... not getting the point. Do you think the problem is in the NCM driver or in the glue code?
Perhaps my own glue code could be of help?: https://github.com/rgrr/yapicoprobe/blob/master/src/net/net_glue.c
I think it's in the NCM driver, which cause compatibility issues with older codes that calls tud_network_recv_renew
directly within tud_network_recv_cb
. And this can be fixed by my function above.
ok... understood. Good point. Loop above seems to be too complicated for my eyes. Is something like this working:
void tud_network_recv_renew(void) {
TU_LOG_DRV("tud_network_recv_renew()\n");
static bool should_process = false; // should go into ncm_interface_t
if (should_process) {
TU_LOG_DRV("Re-entrant into tud_network_recv_renew, will process later\n");
return;
}
should_process = true;
while (should_process) {
should_process = false;
// If the current function is called within recv_transfer_datagram_to_glue_logic,
// should_process will become true, and the loop will run again
// Otherwise the loop will not run again
recv_transfer_datagram_to_glue_logic();
}
recv_try_to_start_new_reception(ncm_interface.rhport);
} // tud_network_recv_renew
Or will it end up again in an endless loop?
No, this will not work, because when you call recv_transfer_datagram_to_glue_logic()
, should_process
is false, so the if (should_process)
block won't be executed. Keep in mind that the second call to tud_network_recv_renew
is inside recv_transfer_datagram_to_glue_logic
.
Hmmm... you are right (perhaps I need some time to rethink it ;-))
Perhaps a simple semaphore could do?:
void tud_network_recv_renew(void) {
static bool sema = false;
if ( !sema) {
sema = true;
TU_LOG_DRV("tud_network_recv_renew()\n");
recv_transfer_datagram_to_glue_logic();
sema = false;
}
recv_try_to_start_new_reception(ncm_interface.rhport);
} // tud_network_recv_renew
This will not work when there are 3 ntb
s handled in one call to netd_xfer_cb
.
I think either a loop or a recursion is needed in case tud_network_recv_renew
is called directly within tud_network_recv_cb
. One call to recv_transfer_datagram_to_glue_logic
only process on ntb
. As we need to process multiple ntb
s, it's obvious that multiple calls to recv_transfer_datagram_to_glue_logic
is needed. We don't want recursions (because that might cause stack overflow on MCUs), so this means we need loops.
you are right (again)... perhaps it could be solved by letting recv_transfer_datagram_to_glue_logic() return an indication that some work has been done (or that there is some work still waiting)?
Yes, but I think should_process
in my code is this indication.
got that, but to be honest I'm not happy with two variables for this logic.
Could you please check PR #2713
Operating System
Linux
Board
Custom ESP32 S3
Firmware
My own, but this is not relevant - will explain below
What happened ?
The new USB NCM driver - introduced in #2227 - is causing a stack overflow.
It causes the problem like this:
tud_network_recv_renew will call into recv_transfer_datagram_to_glue_logic.
recv_transfer_datagram_to_glue_logic
will in turn calltud_network_recv_cb
if there's data - and when the cb called, therecv_glue_ntb
is not cleared yet.The problem is that some
tud_network_recv_cb
, like the one in esp_tinyusb, will unconditionally calltud_network_recv_renew
. So the whole thing happens again and again, causing infinite recursion.So, it's either Espressif will need to change their code ( @ESP-Coco ) and documentation added in TinyUSB to prohibit calling
tud_network_recv_renew
directly withintud_network_recv_cb
, or the logic withinrecv_transfer_datagram_to_glue_logic
need to be changed to prevent this infinite recursion.@HiFiPhile @rgrr could I ask for your opinions on this?
How to reproduce ?
If you are not using ESP32, call
tud_network_recv_renew
directly withintud_network_recv_cb
like the ESP code mentioned above.If you are using ESP32 with ESP-IDF, you can try to replace the
src/class/net
folder of thetinyusb
component to the latest master branch of tinyusb.Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)
1.txt
Screenshots
No response
I have checked existing issues, dicussion and documentation