Closed pgreenland closed 1 year ago
you should not need to add sys_check_timeouts
as lwIP in NO_SYS=0 mode has its own timer/task... (maybe not everything is configured correctly tho!?)
have you tried rebooting your AP?
Just to take it out of the equation - can you disable wifi power management?
Something like...
cyw43_wifi_pm(&cyw43_state, CYW43_DEFAULT_PM & ~0xf);
@kilograham Thanks for the fast reply.
Checked again this morning, same problem, a tactical reboot of the AP when nobody was around solved it immediately.
Strange that I've seen this with two separate APs now though. The one I've most recently been having issues with contains a "MediaTek MT7622 802.11bgn", which seems generally well behaved until it met the Pico :-P.
Seems it still requires my little tweak though, here's a boot of the iperf demo without it:
.Starting FreeRTOS on core 0:
dhcp_release_and_stop()
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(30958aea)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip
failed to connect.
Looks like the DHCP discover is a single shot, stop and declare failure.
Here's the output with my hacky patch reapplied:
Starting FreeRTOS on core 0:
dhcp_release_and_stop()
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(840a7388)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip
dhcp_fine_tmr(): request timeout
dhcp_timeout()
dhcp_timeout(): restarting discovery
dhcp_discover()
transaction id xid(840a7388)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 4000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_OFFER received in DHCP_STATE_SELECTING state
dhcp_handle_offer(netif=2000102C) w00
dhcp_handle_offer(): server 0xfe01a8c0
dhcp_handle_offer(): offer for 0xe201a8c0
dhcp_select(netif=2000102C) w00
transaction id xid(840a7388)
dhcp_select: REQUESTING
dhcp_select(): set request timeout 2000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_ACK received
dhcp_bind(netif=2000102C) w00
dhcp_bind(): t0 renewal timer 86400 secs
dhcp_bind(): set request timeout 86400000 msecs
dhcp_bind(): t1 renewal timer 43200 secs
dhcp_bind(): set request timeout 43200000 msecs
dhcp_bind(): t2 rebind timer 75600 secs
dhcp_bind(): set request timeout 75600000 msecs
dhcp_bind(): IP: 0xe201a8c0 SN: 0x00ffffff GW: 0xfe01a8c0
connect status: link up
Connected.
Ready, running iperf server at 192.168.1.226
blink_task starts
Second DHCP request got a response and the show goes on.
My hack came about from comparing the cyw43_arch_threadsafe_background and cyw43_arch_freertos versions of the file, although having found the definition of sys_check_timeouts, it hints like you said that it should be called from sys_timeouts_mbox_fetch (tcpip_timeouts_mbox_fetch), which in turn is called from tcpip_thread.
I went down the rabbit hole when I realised the timer "cyw43_sleep_timer" is created but never started, calling the function periodic_worker which contains the sys_check_timeouts call in the non-freertos implementation.
Removing the patch and poking around it looks like the tcpip thread gets blocked waiting on the tcpip mutex inside tcpip_timeouts_mbox_fetch, with the main task currently holding it. It stays blocked until after the wifi connection times out and so doesn't run any of the DHCP timers etc.
Adding a few print statements:
.Starting FreeRTOS on core 0:
TCP IP THREAD MAIN LOOP <- Entered the tcpip stack task loop
tcpip_timeouts_mbox_fetch <- called tcpip_timeouts_mbox_fetch
tcpip_timeouts_mbox_fetch sleeptime = 497 <- sleep for 497 ms
dhcp_release_and_stop()
call sys_arch_mbox_fetch <- sys_arch_mbox_fetch called with sleep time
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
sys_arch_mbox_fetch res = 4294967295 <- returned following timeout
locking tcpip core <- about to call lock before getting stuck with another task holding it, doesn't appear to recover
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(cbb7a676)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip
The same debug with my hack in:
.Starting FreeRTOS on core 0:
TCP IP THREAD MAIN LOOP
tcpip_timeouts_mbox_fetch
tcpip_timeouts_mbox_fetch sleeptime = 497
dhcp_release_and_stop()
call sys_arch_mbox_fetch
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
sys_arch_mbox_fetch res = 4294967295
locking tcpip core
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(f5a9dc39)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip
dhcp_fine_tmr(): request timeout
dhcp_timeout()
dhcp_timeout(): restarting discovery
dhcp_discover()
transaction id xid(f5a9dc39)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 4000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_OFFER received in DHCP_STATE_SELECTING state
dhcp_handle_offer(netif=2000102C) w00
dhcp_handle_offer(): server 0xfe01a8c0
dhcp_handle_offer(): offer for 0xe201a8c0
dhcp_select(netif=2000102C) w00
transaction id xid(f5a9dc39)
dhcp_select: REQUESTING
dhcp_select(): set request timeout 2000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_ACK received
dhcp_bind(netif=2000102C) w00
dhcp_bind(): t0 renewal timer 86400 secs
dhcp_bind(): set request timeout 86400000 msecs
dhcp_bind(): t1 renewal timer 43200 secs
dhcp_bind(): set request timeout 43200000 msecs
dhcp_bind(): t2 rebind timer 75600 secs
dhcp_bind(): set request timeout 75600000 msecs
dhcp_bind(): IP: 0xe201a8c0 SN: 0x00ffffff GW: 0xfe01a8c0
connect status: link up
Connected.
Ready, running iperf server at 192.168.1.226
locked tcpip core <- lock acquired by tcpip thread
tcpip_timeouts_mbox_fetch sleeptime = 267
Seems that the lock holding up the timer processing is only acquired after the connection has been completed....doh.
@peterharperuk Gave that power management option a try, adding it immediately after the cyw43_arch_enable_sta_mode(); call in the freertos iperf demo. No change in getting it connected but it has got the ping times down from the usual 200 to 300ms with occasional timeout down to a steady 2ms. I approve this change :-).
It seems the pico goes through three stages of working with this AP. After an AP reboot it's fine for some number of connections. Then for a period it completes DHCP receiving an IP, before stopping responding to packets. It'll do this several times, until finally it fails to complete DHCP at all until the AP is rebooted again. Each time I can can see requests arriving from the pico on the AP's wireless interface, and the replies from the DHCP server supposedly going out on the wifi interface.
Made some progress.
@kilograham Reverted my changes and got closer to why I'm not seeing the DHCP timer and retries etc.
Seems in the iperf example (and my code derived from it), the main task which calls the wifi connection route is a higher priority (2) than the tcpip thread (1) which executes the timers.
Inside cyw43_arch_wifi_connect_until best_effort_wfe_or_timeout is called, waiting on the next event, presumably from cyw43_task which executes the sev instruction.
Either increasing the tcpip task priority above that of the main task, or using cyw43_arch_wifi_connect_async and reproducing the link monitoring code with a vTaskDelay inserted allows the tcpip task some time to run and perform the retries.
Without knowing exactly how the wfe / sve interact, I'd say the tcpip task was being starved of processor time and my previous printf based mutex analysis was garbage.
AP wise, still having issues - rather than restarting the AP I've got away with restarting hostapd on the AP before restarting the Pico, which allows it to connect every time.
Possibly a bug in hostapd? - Logs all look clean, pico appears to handshake fine even if it doesn't receive any packets on occasions where it fails.
ah yes; WFE
should not be being used here on FreeRTOS :-( ... I guess I was mostly testing with SMP which likely avoids the starvation problem... probably worth adding a separate issue for the WFE thing
You can (and may have) just stick a short vTaskDelay in there as a workaround for now
I looked at sticking it in the loop inside cyw43_arch_wifi_connect_until but it's platform independent, so instead pulled out the bits I needed into main and made use of cyw43_arch_wifi_connect_async with a delay in the loop. No SDK changes to track that way too :-P.
In main:
if (cyw43_arch_wifi_connect_async(WIFI_SSID, WIFI_PASSWORD, CYW43_AUTH_WPA2_AES_PSK))
{
printf("failed to request connect.\n");
exit(1);
}
printf("Waiting for connection....\n");
int status = CYW43_LINK_UP + 1;
while(status >= 0 && status != CYW43_LINK_UP) {
int new_status = cyw43_tcpip_link_status(&cyw43_state, CYW43_ITF_STA);
if (new_status != status) {
status = new_status;
printf("connect status: %s\n", status_name(status));
}
vTaskDelay(pdMS_TO_TICKS(100));
}
printf("Connected!.\n");
Shamelessly coping cyw43_tcpip_link_status along side it.
It's not pretty....but it'll do for now.
If you've got an idea of how it should be fixed properly for FreeRTOS use, I'm happy to prepare and submit a PR.
Have raised #917 to cover this issue.
closing this as #917 was fixed in 1.5.0
Hi,
I've just started evaluating the Pico W for use in a future project but have been seeing some very strange things when trying to use the Wi-Fi.
Built the SDK examples without any issues and flashed using another Pico as a programmer / debugger, all going well.
First test with the iperf freertos server example, flashing "picow_freertos_iperf_server_sys.elf" went well. I was able to connect and achieve around 25Mbits/s.
Started working on my own application based on FreeRTOS and a stripped down version of the iPerf server application.
Started to experience the occasional Wi-Fi connection issue, often the first connection after a long period of activity would go well. The pi is able to receive traffic, responding to pings etc.
After a few programming / reboot cycles things start to go wrong, for example DHCP completes but no further traffic is received by the Pi.
It seems as if the module isn't able to receive Ethernet packets anymore, but is able to send them.
I reverted back to the pico examples to try to work out what's going on, using a debug build of the iperf example built with:
When attempting connection to the Wi-Fi network the following is seen on the serial console:
On the access point running OpenWRT, a packet capture on the Wi-Fi interface shows both DHCP requests and responses from the DHCP server, which appear to not reach the Pi
When running my application, on occasions when DHCP is successful, if I try sending simple like a UDP packet, I see the ARP requests to and responses from the remote machine, however the Pi never seems to receive the responses.
Tried a different AP, same result, even tried a different Pi.
Made a single small change to the FreeRTOS implementation as it seems the DHCP timer isn't running, which I believe means means no DHCP request retries (how I spotted the problem) and no periodic refresh of lease.
Ignore the stack size increase, which was added to resolve a stack overflow when enabling as much debug as I could.
Have tried both with and without the patch, no change in failed case.
Any pointers would / things to check would be appreciated.
Thanks,
Phil