raspberrypi / pico-sdk

BSD 3-Clause "New" or "Revised" License
3.69k stars 918 forks source link

Pico W strange issue sending but not receiving network packets #915

Closed pgreenland closed 1 year ago

pgreenland commented 2 years ago

Hi,

I've just started evaluating the Pico W for use in a future project but have been seeing some very strange things when trying to use the Wi-Fi.

Built the SDK examples without any issues and flashed using another Pico as a programmer / debugger, all going well.

First test with the iperf freertos server example, flashing "picow_freertos_iperf_server_sys.elf" went well. I was able to connect and achieve around 25Mbits/s.

Started working on my own application based on FreeRTOS and a stripped down version of the iPerf server application.

Started to experience the occasional Wi-Fi connection issue, often the first connection after a long period of activity would go well. The pi is able to receive traffic, responding to pings etc.

After a few programming / reboot cycles things start to go wrong, for example DHCP completes but no further traffic is received by the Pi.

It seems as if the module isn't able to receive Ethernet packets anymore, but is able to send them.

I reverted back to the pico examples to try to work out what's going on, using a debug build of the iperf example built with:

cd pico-examples
mkdir build
cd build
export PICO_TOOLCHAIN_PATH=/Applications/ARM/bin
export PICO_SDK_PATH=/Users/phil/Repos/pico/pico-sdk
export PICO_BOARD=pico_w
export FREERTOS_KERNEL_PATH=/Users/phil/Repos/pico/FreeRTOS-Kernel
cmake -DWIFI_SSID="Redacted" -DWIFI_PASSWORD="Redacted" -DCMAKE_BUILD_TYPE=Debug ..
cd pico_w/freertos/iperf
make -j8

When attempting connection to the Wi-Fi network the following is seen on the serial console:

Starting FreeRTOS on core 0:
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
Connecting to WiFi...
connect status: joining
connect status: no ip
failed to connect.

On the access point running OpenWRT, a packet capture on the Wi-Fi interface shows both DHCP requests and responses from the DHCP server, which appear to not reach the Pi

tcpdump: listening on wlan0, link-type EN10MB (Ethernet), capture size 262144 bytes
22:33:53.349795 28:cd:c1:00:b6:e9 > ff:ff:ff:ff:ff:ff Null Unnumbered, xid, Flags [Response], length 6: 01 00
22:33:53.349928 28:cd:c1:00:b6:e9 > ff:ff:ff:ff:ff:ff Null Unnumbered, xid, Flags [Response], length 6: 01 00
22:36:29.239763 28:cd:c1:00:b6:e9 > ff:ff:ff:ff:ff:ff Null Unnumbered, xid, Flags [Response], length 6: 01 00
22:36:29.239898 28:cd:c1:00:b6:e9 > ff:ff:ff:ff:ff:ff Null Unnumbered, xid, Flags [Response], length 6: 01 00
22:36:30.865037 IP (tos 0x0, ttl 255, id 1, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x6054778a, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:36:30.865116 IP (tos 0x0, ttl 255, id 1, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x6054778a, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:36:30.867062 IP (tos 0xc0, ttl 64, id 17021, offset 0, flags [none], proto UDP (17), length 328)
    192.168.1.254.67 > 192.168.1.226.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x6054778a, Flags [none] (0x0000)
      Your-IP 192.168.1.226
      Server-IP 192.168.1.254
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Offer
        Server-ID Option 54, length 4: 192.168.1.254
        Lease-Time Option 51, length 4: 86400
        RN Option 58, length 4: 43200
        RB Option 59, length 4: 75600
        Subnet-Mask Option 1, length 4: 255.255.255.0
        BR Option 28, length 4: 192.168.1.255
        Domain-Name-Server Option 6, length 4: 192.168.1.254
        Default-Gateway Option 3, length 4: 192.168.1.254
22:36:34.860690 IP (tos 0x0, ttl 255, id 2, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x6054778a, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
tcpdump: listening on wlan0, link-type EN10MB (Ethernet), capture size 262144 bytes
22:37:48.349776 28:cd:c1:00:b6:e9 > ff:ff:ff:ff:ff:ff Null Unnumbered, xid, Flags [Response], length 6: 01 00
22:37:48.349915 28:cd:c1:00:b6:e9 > ff:ff:ff:ff:ff:ff Null Unnumbered, xid, Flags [Response], length 6: 01 00
22:37:49.937205 IP (tos 0x0, ttl 255, id 1, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x2f9dab00, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:37:49.937275 IP (tos 0x0, ttl 255, id 1, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x2f9dab00, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:37:49.939107 IP (tos 0xc0, ttl 64, id 17509, offset 0, flags [none], proto UDP (17), length 328)
    192.168.1.254.67 > 192.168.1.226.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x2f9dab00, Flags [none] (0x0000)
      Your-IP 192.168.1.226
      Server-IP 192.168.1.254
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Offer
        Server-ID Option 54, length 4: 192.168.1.254
        Lease-Time Option 51, length 4: 86400
        RN Option 58, length 4: 43200
        RB Option 59, length 4: 75600
        Subnet-Mask Option 1, length 4: 255.255.255.0
        BR Option 28, length 4: 192.168.1.255
        Domain-Name-Server Option 6, length 4: 192.168.1.254
        Default-Gateway Option 3, length 4: 192.168.1.254
22:37:49.939306 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.226 tell 192.168.1.254, length 46
22:37:50.941206 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.226 tell 192.168.1.254, length 46
22:37:51.943249 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.226 tell 192.168.1.254, length 46
22:37:53.937335 IP (tos 0x0, ttl 255, id 2, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x2f9dab00, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:37:53.937390 IP (tos 0x0, ttl 255, id 2, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x2f9dab00, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:37:53.940396 IP (tos 0xc0, ttl 64, id 18562, offset 0, flags [none], proto UDP (17), length 328)
    192.168.1.254.67 > 192.168.1.226.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x2f9dab00, Flags [none] (0x0000)
      Your-IP 192.168.1.226
      Server-IP 192.168.1.254
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Offer
        Server-ID Option 54, length 4: 192.168.1.254
        Lease-Time Option 51, length 4: 86400
        RN Option 58, length 4: 43200
        RB Option 59, length 4: 75600
        Subnet-Mask Option 1, length 4: 255.255.255.0
        BR Option 28, length 4: 192.168.1.255
        Domain-Name-Server Option 6, length 4: 192.168.1.254
        Default-Gateway Option 3, length 4: 192.168.1.254
22:37:53.940590 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.226 tell 192.168.1.254, length 46
22:37:54.943490 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.226 tell 192.168.1.254, length 46
22:37:55.945474 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.226 tell 192.168.1.254, length 46
22:38:01.936588 IP (tos 0x0, ttl 255, id 3, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x2f9dab00, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:38:01.936653 IP (tos 0x0, ttl 255, id 3, offset 0, flags [none], proto UDP (17), length 336)
    0.0.0.0.68 > 255.255.255.255.67: [udp sum ok] BOOTP/DHCP, Request from 28:cd:c1:00:b6:e9, length 308, xid 0x2f9dab00, Flags [none] (0x0000)
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Discover
        MSZ Option 57, length 2: 1500
        Parameter-Request Option 55, length 4: 
          Subnet-Mask, Default-Gateway, BR, Domain-Name-Server
22:38:01.939666 IP (tos 0xc0, ttl 64, id 19252, offset 0, flags [none], proto UDP (17), length 328)
    192.168.1.254.67 > 192.168.1.226.68: [udp sum ok] BOOTP/DHCP, Reply, length 300, xid 0x2f9dab00, Flags [none] (0x0000)
      Your-IP 192.168.1.226
      Server-IP 192.168.1.254
      Client-Ethernet-Address 28:cd:c1:00:b6:e9
      Vendor-rfc1048 Extensions
        Magic Cookie 0x63825363
        DHCP-Message Option 53, length 1: Offer
        Server-ID Option 54, length 4: 192.168.1.254
        Lease-Time Option 51, length 4: 86400
        RN Option 58, length 4: 43200
        RB Option 59, length 4: 75600
        Subnet-Mask Option 1, length 4: 255.255.255.0
        BR Option 28, length 4: 192.168.1.255
        Domain-Name-Server Option 6, length 4: 192.168.1.254
        Default-Gateway Option 3, length 4: 192.168.1.254
^C
17 packets captured
17 packets received by filter
0 packets dropped by kernel

When running my application, on occasions when DHCP is successful, if I try sending simple like a UDP packet, I see the ARP requests to and responses from the remote machine, however the Pi never seems to receive the responses.

Tried a different AP, same result, even tried a different Pi.

Made a single small change to the FreeRTOS implementation as it seems the DHCP timer isn't running, which I believe means means no DHCP request retries (how I spotted the problem) and no periodic refresh of lease.

diff --git a/src/rp2_common/pico_cyw43_arch/cyw43_arch_freertos.c b/src/rp2_common/pico_cyw43_arch/cyw43_arch_freertos.c
index 88dde6e..84b3d99 100644
--- a/src/rp2_common/pico_cyw43_arch/cyw43_arch_freertos.c
+++ b/src/rp2_common/pico_cyw43_arch/cyw43_arch_freertos.c
@@ -61,6 +61,7 @@ static void periodic_worker(__unused TimerHandle_t handle)
 #endif

     CYW43_STAT_INC(LWIP_RUN_COUNT);
+    sys_check_timeouts();
     if (cyw43_poll) {
         if (cyw43_sleep > 0) {
             if (--cyw43_sleep == 0) {
@@ -137,12 +138,14 @@ int cyw43_arch_init(void) {
         return PICO_ERROR_GENERIC;
     }

+    xTimerStart(timer_handle, portMAX_DELAY);
+
     gpio_add_raw_irq_handler_with_order_priority(IO_IRQ_BANK0, gpio_irq_handler, CYW43_GPIO_IRQ_HANDLER_PRIORITY);
     gpio_set_irq_enabled(CYW43_PIN_WL_HOST_WAKE, GPIO_IRQ_LEVEL_HIGH, true);
     irq_set_enabled(IO_IRQ_BANK0, true);

     cyw43_task_should_exit = false;
-    xTaskCreate(cyw43_task, "CYW43 Worker", configMINIMAL_STACK_SIZE, NULL, CYW43_TASK_PRIORITY, &cyw43_task_handle);
+    xTaskCreate(cyw43_task, "CYW43 Worker", configMINIMAL_STACK_SIZE * 10, NULL, CYW43_TASK_PRIORITY, &cyw43_task_handle);
 #if configUSE_CORE_AFFINITY && configNUM_CORES > 1
     // the cyw43 task mus tbe on the same core so it can restore IRQs
     vTaskCoreAffinitySet(cyw43_task_handle, 1 << portGET_CORE_ID());
@@ -248,4 +251,4 @@ void cyw43_delay_us(uint32_t us) {
 void cyw43_arch_poll() {
 }

-#endif
\ No newline at end of file
+#endif

Ignore the stack size increase, which was added to resolve a stack overflow when enabling as much debug as I could.

Have tried both with and without the patch, no change in failed case.

Any pointers would / things to check would be appreciated.

Thanks,

Phil

kilograham commented 2 years ago

you should not need to add sys_check_timeouts as lwIP in NO_SYS=0 mode has its own timer/task... (maybe not everything is configured correctly tho!?)

have you tried rebooting your AP?

peterharperuk commented 2 years ago

Just to take it out of the equation - can you disable wifi power management?

Something like...

cyw43_wifi_pm(&cyw43_state, CYW43_DEFAULT_PM & ~0xf);

pgreenland commented 2 years ago

@kilograham Thanks for the fast reply.

Checked again this morning, same problem, a tactical reboot of the AP when nobody was around solved it immediately.

Strange that I've seen this with two separate APs now though. The one I've most recently been having issues with contains a "MediaTek MT7622 802.11bgn", which seems generally well behaved until it met the Pico :-P.

Seems it still requires my little tweak though, here's a boot of the iperf demo without it:

.Starting FreeRTOS on core 0:
dhcp_release_and_stop()
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(30958aea)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip
failed to connect.

Looks like the DHCP discover is a single shot, stop and declare failure.

Here's the output with my hacky patch reapplied:

Starting FreeRTOS on core 0:
dhcp_release_and_stop()
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(840a7388)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip
dhcp_fine_tmr(): request timeout
dhcp_timeout()
dhcp_timeout(): restarting discovery
dhcp_discover()
transaction id xid(840a7388)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 4000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_OFFER received in DHCP_STATE_SELECTING state
dhcp_handle_offer(netif=2000102C) w00
dhcp_handle_offer(): server 0xfe01a8c0
dhcp_handle_offer(): offer for 0xe201a8c0
dhcp_select(netif=2000102C) w00
transaction id xid(840a7388)
dhcp_select: REQUESTING
dhcp_select(): set request timeout 2000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_ACK received
dhcp_bind(netif=2000102C) w00
dhcp_bind(): t0 renewal timer 86400 secs
dhcp_bind(): set request timeout 86400000 msecs
dhcp_bind(): t1 renewal timer 43200 secs
dhcp_bind(): set request timeout 43200000 msecs
dhcp_bind(): t2 rebind timer 75600 secs
dhcp_bind(): set request timeout 75600000 msecs
dhcp_bind(): IP: 0xe201a8c0 SN: 0x00ffffff GW: 0xfe01a8c0
connect status: link up
Connected.

Ready, running iperf server at 192.168.1.226
blink_task starts

Second DHCP request got a response and the show goes on.

My hack came about from comparing the cyw43_arch_threadsafe_background and cyw43_arch_freertos versions of the file, although having found the definition of sys_check_timeouts, it hints like you said that it should be called from sys_timeouts_mbox_fetch (tcpip_timeouts_mbox_fetch), which in turn is called from tcpip_thread.

I went down the rabbit hole when I realised the timer "cyw43_sleep_timer" is created but never started, calling the function periodic_worker which contains the sys_check_timeouts call in the non-freertos implementation.

Removing the patch and poking around it looks like the tcpip thread gets blocked waiting on the tcpip mutex inside tcpip_timeouts_mbox_fetch, with the main task currently holding it. It stays blocked until after the wifi connection times out and so doesn't run any of the DHCP timers etc.

Adding a few print statements:

.Starting FreeRTOS on core 0:
TCP IP THREAD MAIN LOOP <- Entered the tcpip stack task loop
tcpip_timeouts_mbox_fetch <- called tcpip_timeouts_mbox_fetch
tcpip_timeouts_mbox_fetch sleeptime = 497 <- sleep for 497 ms
dhcp_release_and_stop()
call sys_arch_mbox_fetch <- sys_arch_mbox_fetch called with sleep time
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
sys_arch_mbox_fetch res = 4294967295 <- returned following timeout
locking tcpip core <- about to call lock before getting stuck with another task holding it, doesn't appear to recover
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(cbb7a676)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip

The same debug with my hack in:

.Starting FreeRTOS on core 0:
TCP IP THREAD MAIN LOOP
tcpip_timeouts_mbox_fetch
tcpip_timeouts_mbox_fetch sleeptime = 497
dhcp_release_and_stop()
call sys_arch_mbox_fetch
Version: 7.95.49 (2271bb6 CY) CRC: b7a28ef3 Date: Mon 2021-11-29 22:50:27 PST Ucode Ver: 1043.2162 FWID 01-c51d9400
sys_arch_mbox_fetch res = 4294967295
locking tcpip core
cyw43 loaded ok, mac 28:cd:c1:00:b6:e9
API: 12.2
Data: RaspberryPi.PicoW
Compiler: 1.29.4
ClmImport: 1.47.1
Customization: v5 22/06/24
Creation: 2022-06-24 06:55:08
dhcp_release_and_stop()
dhcp_start(netif=2000102C) w00
dhcp_start(): restarting DHCP configuration
dhcp_start(): starting DHCP configuration
Connecting to WiFi...
connect status: joining
dhcp_discover()
transaction id xid(f5a9dc39)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 2000 msecs
connect status: no ip
dhcp_fine_tmr(): request timeout
dhcp_timeout()
dhcp_timeout(): restarting discovery
dhcp_discover()
transaction id xid(f5a9dc39)
dhcp_discover: making request
dhcp_discover: sendto(DISCOVER, IP_ADDR_BROADCAST, LWIP_IANA_PORT_DHCP_SERVER)
dhcp_discover: deleting()
dhcp_discover: SELECTING
dhcp_discover(): set request timeout 4000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_OFFER received in DHCP_STATE_SELECTING state
dhcp_handle_offer(netif=2000102C) w00
dhcp_handle_offer(): server 0xfe01a8c0
dhcp_handle_offer(): offer for 0xe201a8c0
dhcp_select(netif=2000102C) w00
transaction id xid(f5a9dc39)
dhcp_select: REQUESTING
dhcp_select(): set request timeout 2000 msecs
dhcp_recv(pbuf = 2000A068) from DHCP server 192.168.1.254 port 67
pbuf->len = 300
pbuf->tot_len = 300
skipping option 28 in options
searching DHCP_OPTION_MESSAGE_TYPE
DHCP_ACK received
dhcp_bind(netif=2000102C) w00
dhcp_bind(): t0 renewal timer 86400 secs
dhcp_bind(): set request timeout 86400000 msecs
dhcp_bind(): t1 renewal timer 43200 secs
dhcp_bind(): set request timeout 43200000 msecs
dhcp_bind(): t2 rebind timer 75600 secs
dhcp_bind(): set request timeout 75600000 msecs
dhcp_bind(): IP: 0xe201a8c0 SN: 0x00ffffff GW: 0xfe01a8c0
connect status: link up
Connected.

Ready, running iperf server at 192.168.1.226
locked tcpip core <- lock acquired by tcpip thread
tcpip_timeouts_mbox_fetch sleeptime = 267

Seems that the lock holding up the timer processing is only acquired after the connection has been completed....doh.

pgreenland commented 2 years ago

@peterharperuk Gave that power management option a try, adding it immediately after the cyw43_arch_enable_sta_mode(); call in the freertos iperf demo. No change in getting it connected but it has got the ping times down from the usual 200 to 300ms with occasional timeout down to a steady 2ms. I approve this change :-).

It seems the pico goes through three stages of working with this AP. After an AP reboot it's fine for some number of connections. Then for a period it completes DHCP receiving an IP, before stopping responding to packets. It'll do this several times, until finally it fails to complete DHCP at all until the AP is rebooted again. Each time I can can see requests arriving from the pico on the AP's wireless interface, and the replies from the DHCP server supposedly going out on the wifi interface.

pgreenland commented 2 years ago

Made some progress.

@kilograham Reverted my changes and got closer to why I'm not seeing the DHCP timer and retries etc.

Seems in the iperf example (and my code derived from it), the main task which calls the wifi connection route is a higher priority (2) than the tcpip thread (1) which executes the timers.

Inside cyw43_arch_wifi_connect_until best_effort_wfe_or_timeout is called, waiting on the next event, presumably from cyw43_task which executes the sev instruction.

Either increasing the tcpip task priority above that of the main task, or using cyw43_arch_wifi_connect_async and reproducing the link monitoring code with a vTaskDelay inserted allows the tcpip task some time to run and perform the retries.

Without knowing exactly how the wfe / sve interact, I'd say the tcpip task was being starved of processor time and my previous printf based mutex analysis was garbage.

AP wise, still having issues - rather than restarting the AP I've got away with restarting hostapd on the AP before restarting the Pico, which allows it to connect every time.

Possibly a bug in hostapd? - Logs all look clean, pico appears to handshake fine even if it doesn't receive any packets on occasions where it fails.

kilograham commented 2 years ago

ah yes; WFE should not be being used here on FreeRTOS :-( ... I guess I was mostly testing with SMP which likely avoids the starvation problem... probably worth adding a separate issue for the WFE thing

kilograham commented 2 years ago

You can (and may have) just stick a short vTaskDelay in there as a workaround for now

pgreenland commented 2 years ago

I looked at sticking it in the loop inside cyw43_arch_wifi_connect_until but it's platform independent, so instead pulled out the bits I needed into main and made use of cyw43_arch_wifi_connect_async with a delay in the loop. No SDK changes to track that way too :-P.

In main:

    if (cyw43_arch_wifi_connect_async(WIFI_SSID, WIFI_PASSWORD, CYW43_AUTH_WPA2_AES_PSK))
    {
        printf("failed to request connect.\n");
        exit(1);
    }

    printf("Waiting for connection....\n");
    int status = CYW43_LINK_UP + 1;
    while(status >= 0 && status != CYW43_LINK_UP) {
        int new_status = cyw43_tcpip_link_status(&cyw43_state, CYW43_ITF_STA);
        if (new_status != status) {
            status = new_status;
            printf("connect status: %s\n", status_name(status));
        }
        vTaskDelay(pdMS_TO_TICKS(100));
    }
    printf("Connected!.\n");

Shamelessly coping cyw43_tcpip_link_status along side it.

It's not pretty....but it'll do for now.

If you've got an idea of how it should be fixed properly for FreeRTOS use, I'm happy to prepare and submit a PR.

Have raised #917 to cover this issue.

kilograham commented 1 year ago

closing this as #917 was fixed in 1.5.0