Open trombik opened 2 years ago
upstream issue: https://github.com/espressif/esp-idf/issues/9643
This issue can be resolved by rewriting wireguardif.
Can also confirm a similar crash exists for esp8266-rtos-sdk:
Guru Meditation Error: Core 0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC : 0x40263503 PS : 0x00000030 A0 : 0x4023621b A1 : 0x3ffeb9d0
0x40263503: peer_lookup_by_allowed_ip at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:74
0x4023621b: wireguardif_output at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:205
A2 : 0x00000000 A3 : 0x3ffeb9d0 A4 : 0x00000000 A5 : 0x00000000
A6 : 0x00000000 A7 : 0x00000000 A8 : 0x000000b0 A9 : 0x000000b0
A10 : 0x0a00080a A11 : 0x0000002f A12 : 0x3ffea660 A13 : 0x3fff33ec
A14 : 0x00009058 A15 : 0x3fff4e6c SAR : 0x0000001d EXCCAUSE: 0x0000001c
Backtrace: 0x40263503:0x3ffeb9d0 0x4023621b:0x3ffeb9d0 0x4024f120:0x3ffeb9f0 0x4024f180:0x3ffeba40 0x4024f1a8:0x3ffeba60 0x4024a61c:0x3ffeba80 0x4024b039:0x3ffebac0 0x40244809:0x3ffebb00 0x40244838:0x3ffebb40 0x40244928:0x3ffebb50 0x4024495f:0x3ffebb60 0x4024920e:0x3ffebb70 0x402495c4:0x3ffebb80 0x40235aa0:0x3ffebb90 0x4022231a:0x3ffebba0 0x40222368:0x3ffebbb0 0x402121eb:0x3ffebc00
0x40263503: peer_lookup_by_allowed_ip at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:74
0x4023621b: wireguardif_output at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:205
0x4024f120: ip4_output_if_opt_src at /opt/sdk/components/lwip/lwip/src/core/ipv4/ip4.c:1089
0x4024f180: ip4_output_if_opt at /opt/sdk/components/lwip/lwip/src/core/ipv4/ip4.c:895
0x4024f1a8: ip4_output_if at /opt/sdk/components/lwip/lwip/src/core/ipv4/ip4.c:868
0x4024a61c: tcp_output_control_segment at /opt/sdk/components/lwip/lwip/src/core/tcp_out.c:1951
0x4024b039: tcp_rst at /opt/sdk/components/lwip/lwip/src/core/tcp_out.c:2011
0x40244809: tcp_abandon at /opt/sdk/components/lwip/lwip/src/core/tcp.c:621
0x40244838: tcp_abort at /opt/sdk/components/lwip/lwip/src/core/tcp.c:641
0x40244928: tcp_netif_ip_addr_changed_pcblist at /opt/sdk/components/lwip/lwip/src/core/tcp.c:2411
0x4024495f: tcp_netif_ip_addr_changed at /opt/sdk/components/lwip/lwip/src/core/tcp.c:2430
0x4024920e: netif_do_ip_addr_changed at /opt/sdk/components/lwip/lwip/src/core/netif.c:458
0x402495c4: netif_remove at /opt/sdk/components/lwip/lwip/src/core/netif.c:779
0x40235aa0: esp_wireguard_disconnect at /project/examples/wol/components/esp_wireguard/src/esp_wireguard.c:315 (discriminator 15)
This can be reproduced by launching the WireGuard tunnel as in examples/demo, starting an HTTP server, closing it, and then trying to close WireGuard. The system then crashes with the panic shown above.
would you tell me which version or git tag of the SDK you are using?
I'm not entirely sure, but I think it's v3.4. I'm pulling the latest revision of this Docker image: https://hub.docker.com/r/mbenabda/esp8266-rtos-sdk (because I couldn't get the SDK to work on NixOS itself).
A further discovery: if any accept() call is made between the WireGuard context being loaded and closed, a similar crash occurs. I hope that narrows down the bug quite a bit.
The issue you are seeing is not the one tracked in this issue. Would you open a new issue?
sure
Is there any progress on the esp-idf v5.0 implementation?
I found out that the problem comes from the parameters ip_info and ip_info_old, which have the value 0 during the call to esp_netif_internal_dhcpc_cb(). However, I don't know how to set those parameters correctly in the netif interface structure.
On progress with the esp-idf v5.0 implementation: none. You are on your own to fix it, or support us.
I went a bit deeper to try to debug this issue, and found the following:
The crash happens because the callback esp_netif_internal_dhcpc_cb gets called after the call to netif_add, which is part of the initialization of esp_wireguard. The code for that callback is in esp-idf-v5.0/components/esp_netif/lwip/esp_netif_lwip.c
Here is the beginning:
static void esp_netif_internal_dhcpc_cb(struct netif *netif)
{
    esp_netif_t *esp_netif;
    ESP_LOGD(TAG, "%s lwip-netif:%p", __func__, netif);
    if (netif == NULL || (esp_netif = lwip_get_esp_netif(netif)) == NULL) {
        // internal pointer hasn't been configured yet (probably in the interface init_fn())
        return;
    }
    esp_netif_ip_info_t *ip_info = esp_netif->ip_info;
    esp_netif_ip_info_t *ip_info_old = esp_netif->ip_info_old;
The catch is that lwip_get_esp_netif(netif) should notice that this lwip netif struct does not have a corresponding esp_netif structure. That does not happen, because the check works like this (from esp_netif_lwip.c):
static inline esp_netif_t* lwip_get_esp_netif(struct netif *netif)
{
#if LWIP_ESP_NETIF_DATA
    return (esp_netif_t*)netif_get_client_data(netif, lwip_netif_client_id);
#else
    return (esp_netif_t*)netif->state;
#endif
}
So here we see that in the normal case (LWIP_ESP_NETIF_DATA is 0) this function simply returns the content of netif->state. esp_wireguard stores its own pointer there to hold some configuration, so it is not NULL. Because of that, esp_netif_internal_dhcpc_cb assumes it got a pointer to an esp_netif structure and tries to access the members ip_info and ip_info_old, which are not there, and the program panics.
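The mechanism described above can be sketched on a host machine with mock types. This is only an illustration under stated assumptions: every name prefixed with mock_ is hypothetical and is not part of lwIP, esp-netif, or esp_wireguard. It shows that a NULL check on an untyped state slot cannot tell an esp_netif pointer from a driver's private pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real lwIP or esp-netif types. */
struct mock_netif {
    void *state;                 /* single untyped slot, like netif->state */
};

struct mock_wg_device {          /* what a WireGuard-style driver might store */
    unsigned char public_key[32];
};

/* Mirrors the LWIP_ESP_NETIF_DATA == 0 branch: return the slot unchecked. */
static void *mock_get_esp_netif(struct mock_netif *netif)
{
    return netif->state;
}

/* Mirrors the guard at the top of the callback: 1 if it would carry on. */
static int mock_callback_would_proceed(struct mock_netif *netif)
{
    return netif != NULL && mock_get_esp_netif(netif) != NULL;
}

/* Returns 1 when the guard lets a foreign (driver-owned) pointer through. */
static int mock_guard_is_fooled(void)
{
    static struct mock_wg_device dev;
    struct mock_netif wg_netif = { .state = &dev };

    return mock_callback_would_proceed(&wg_netif)
        && mock_get_esp_netif(&wg_netif) == (void *)&dev;
}
```

Calling mock_guard_is_fooled() returns 1 in this sketch: the guard passes and the caller is handed the driver's pointer. Only an empty slot or a NULL netif stops the callback early.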
There is a simple fix to get esp_wireguard working on ESP-IDF v5.0.2: enable whatever sets LWIP_ESP_NETIF_DATA. With that macro set, esp-netif stops using netif->state to hold the pointer to the esp_netif structure associated with the lwip netif, and instead uses netif_get_client_data and netif_set_client_data, which store the pointer somewhere else.
LWIP_ESP_NETIF_DATA is defined in components/lwip/port/esp32/include/lwipopts.h:
#if defined(CONFIG_ESP_NETIF_BRIDGE_EN) || defined(CONFIG_LWIP_PPP_SUPPORT)
/*
* If special lwip interfaces (like bridge, ppp) enabled
* `netif->state` is used internally and we must store esp-netif ptr
* in `netif->client_data`
*/
#define LWIP_ESP_NETIF_DATA (1)
#else
#define LWIP_ESP_NETIF_DATA (0)
#endif
So if we activate CONFIG_ESP_NETIF_BRIDGE_EN or CONFIG_LWIP_PPP_SUPPORT in idf.py menuconfig, LWIP_ESP_NETIF_DATA becomes 1, which fixes the problem.
As a test, I enabled "Enable PPP support (new/experimental)" (option name: LWIP_PPP_SUPPORT), and it works.
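To see why switching to client data avoids the crash, here is a minimal host-side sketch of the idea. It assumes a simplified model in which each netif has a driver-owned state slot plus separate per-client slots. All mock_ names are hypothetical and only loosely modelled on netif_get_client_data and netif_set_client_data; this is not the lwIP implementation.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_CLIENT_SLOTS 2

/* Hypothetical model: driver state and per-client data live in separate slots. */
struct mock_netif {
    void *state;                           /* owned by the interface driver */
    void *client_data[MOCK_CLIENT_SLOTS];  /* NULL until a client stores something */
};

enum { MOCK_ESP_NETIF_ID = 0 };            /* slot reserved for the esp_netif owner */

static void mock_set_client_data(struct mock_netif *n, int id, void *data)
{
    n->client_data[id] = data;
}

static void *mock_get_client_data(struct mock_netif *n, int id)
{
    return n->client_data[id];
}

/* Mirrors the LWIP_ESP_NETIF_DATA == 1 branch: ignore state, read the client slot. */
static void *mock_get_esp_netif(struct mock_netif *n)
{
    return mock_get_client_data(n, MOCK_ESP_NETIF_ID);
}

/* Returns 1 when a driver-created netif correctly reports "no esp_netif attached". */
static int mock_driver_netif_has_no_esp_netif(void)
{
    static int wg_device;                  /* stands in for the driver's private data */
    struct mock_netif wg = { .state = &wg_device };  /* client_data is zero-initialised */

    return mock_get_esp_netif(&wg) == NULL;
}
```

In this model the lookup returns NULL for a netif that only a driver has touched, so a guard like the one in esp_netif_internal_dhcpc_cb would return early instead of dereferencing the driver's pointer.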
So in short, activate CONFIG_LWIP_PPP_SUPPORT in your project configuration in ESP-IDF-v5, recompile your project, and the wireguard tunnel will work as intended.
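If you want the application to report whether this workaround is active, a compile-time check along these lines may help. It is only a sketch: WG_NETIF_WORKAROUND_ENABLED and wg_netif_workaround_enabled are hypothetical names, and it assumes that sdkconfig.h leaves disabled boolean options undefined, the same assumption the lwipopts.h condition above relies on.

```c
#include <assert.h>

/* Pull in the project configuration when building inside an ESP-IDF project;
 * on a host build without sdkconfig.h this is skipped. */
#if defined(__has_include)
#  if __has_include("sdkconfig.h")
#    include "sdkconfig.h"
#  endif
#endif

/* Same condition that lwipopts.h uses to set LWIP_ESP_NETIF_DATA to 1. */
#if defined(CONFIG_ESP_NETIF_BRIDGE_EN) || defined(CONFIG_LWIP_PPP_SUPPORT)
#define WG_NETIF_WORKAROUND_ENABLED 1
#else
#define WG_NETIF_WORKAROUND_ENABLED 0
#endif

/* Hypothetical helper: 1 if either option is enabled, 0 otherwise. */
static int wg_netif_workaround_enabled(void)
{
    return WG_NETIF_WORKAROUND_ENABLED;
}
```

An application could log a warning at startup when wg_netif_workaround_enabled() returns 0, before it brings up the tunnel.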