trombik / esp_wireguard

WireGuard Implementation for ESP-IDF.
198 stars 36 forks

crash with esp-idf 5.x, or master #33

Open trombik opened 2 years ago

trombik commented 2 years ago
I (5659) demo: Initializing WireGuard.
I (5659) demo: Connecting to the peer.
I (5659) esp_wireguard: allowed_ip: 192.168.4.58
Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.

Core  0 register dump:
PC      : 0x400f8bd5  PS      : 0x00060630  A0      : 0x800f8ced  A1      : 0x3ffbaaf0  
0x400f8bd5: esp_netif_internal_dhcpc_cb at /usr/home/trombik/github/trombik/esp-idf/components/esp_netif/lwip/esp_netif_lwip.c:1111

A2      : 0x3ffb467c  A3      : 0x3ffbac88  A4      : 0x00000000  A5      : 0x3ffbacf0  
A6      : 0x0000000c  A7      : 0xff000000  A8      : 0x3a04a8c0  A9      : 0x00000000  
A10     : 0x00000004  A11     : 0x3f40d29c  A12     : 0x3f40d4a0  A13     : 0x00001625  
A14     : 0x3f40d29c  A15     : 0x3f40ded8  SAR     : 0x00000004  EXCCAUSE: 0x0000001c  
EXCVADDR: 0x00000000  LBEG    : 0x400014fd  LEND    : 0x4000150d  LCOUNT  : 0xfffffffc  

Backtrace: 0x400f8bd2:0x3ffbaaf0 0x400f8cea:0x3ffbab40 0x400e5ca3:0x3ffbab60 0x400e5d4e:0x3ffbab80 0x400e5e92:0x3ffbac00 0x400d7b34:0x3ffbac30 0x400d7dd4:0x3ffbacc0 0x400d76c6:0x3ffbacf0 0x400d78d8:0x3ffbad10 0x40154b3b:0x3ffbadb0 0x4008bca5:0x3ffbade0
0x400f8bd2: esp_netif_internal_dhcpc_cb at /usr/home/trombik/github/trombik/esp-idf/components/esp_netif/lwip/esp_netif_lwip.c:1108

0x400f8cea: netif_callback_fn at /usr/home/trombik/github/trombik/esp-idf/components/esp_netif/lwip/esp_netif_lwip.c:123

0x400e5ca3: netif_invoke_ext_callback at /usr/home/trombik/github/trombik/esp-idf/components/lwip/lwip/src/core/netif.c:1819

0x400e5d4e: netif_set_addr at /usr/home/trombik/github/trombik/esp-idf/components/lwip/lwip/src/core/netif.c:733

0x400e5e92: netif_add at /usr/home/trombik/github/trombik/esp-idf/components/lwip/lwip/src/core/netif.c:376

0x400d7b34: esp_wireguard_netif_create at /usr/home/trombik/github/trombik/esp_wireguard/examples/demo/components/esp_wireguard/src/esp_wireguard.c:180

0x400d7dd4: esp_wireguard_connect at /usr/home/trombik/github/trombik/esp_wireguard/examples/demo/components/esp_wireguard/src/esp_wireguard.c:235

0x400d76c6: wireguard_setup at /usr/home/trombik/github/trombik/esp_wireguard/examples/demo/main/main.c:79 (discriminator 13)

0x400d78d8: app_main at /usr/home/trombik/github/trombik/esp_wireguard/examples/demo/main/main.c:381

0x40154b3b: main_task at /usr/home/trombik/github/trombik/esp-idf/components/freertos/FreeRTOS-Kernel/portable/port_common.c:131 (discriminator 2)

0x4008bca5: vPortTaskWrapper at /usr/home/trombik/github/trombik/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:151

ELF file SHA256: f2a1169a2b5c9528

Rebooting...
trombik commented 2 years ago

upstream issue: https://github.com/espressif/esp-idf/issues/9643

trombik commented 2 years ago

This issue can be resolved by rewriting wireguardif.

nikp123 commented 1 year ago

I can also confirm that a similar crash occurs with esp8266-rtos-sdk:

Guru Meditation Error: Core  0 panic'ed (LoadProhibited). Exception was unhandled.
Core 0 register dump:
PC      : 0x40263503  PS      : 0x00000030  A0      : 0x4023621b  A1      : 0x3ffeb9d0  
0x40263503: peer_lookup_by_allowed_ip at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:74

0x4023621b: wireguardif_output at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:205

A2      : 0x00000000  A3      : 0x3ffeb9d0  A4      : 0x00000000  A5      : 0x00000000  
A6      : 0x00000000  A7      : 0x00000000  A8      : 0x000000b0  A9      : 0x000000b0  
A10     : 0x0a00080a  A11     : 0x0000002f  A12     : 0x3ffea660  A13     : 0x3fff33ec  
A14     : 0x00009058  A15     : 0x3fff4e6c  SAR     : 0x0000001d  EXCCAUSE: 0x0000001c  

Backtrace: 0x40263503:0x3ffeb9d0 0x4023621b:0x3ffeb9d0 0x4024f120:0x3ffeb9f0 0x4024f180:0x3ffeba40 0x4024f1a8:0x3ffeba60 0x4024a61c:0x3ffeba80 0x4024b039:0x3ffebac0 0x40244809:0x3ffebb00 0x40244838:0x3ffebb40 0x40244928:0x3ffebb50 0x4024495f:0x3ffebb60 0x4024920e:0x3ffebb70 0x402495c4:0x3ffebb80 0x40235aa0:0x3ffebb90 0x4022231a:0x3ffebba0 0x40222368:0x3ffebbb0 0x402121eb:0x3ffebc00 
0x40263503: peer_lookup_by_allowed_ip at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:74

0x4023621b: wireguardif_output at /project/examples/wol/components/esp_wireguard/src/wireguardif.c:205

0x4024f120: ip4_output_if_opt_src at /opt/sdk/components/lwip/lwip/src/core/ipv4/ip4.c:1089

0x4024f180: ip4_output_if_opt at /opt/sdk/components/lwip/lwip/src/core/ipv4/ip4.c:895

0x4024f1a8: ip4_output_if at /opt/sdk/components/lwip/lwip/src/core/ipv4/ip4.c:868

0x4024a61c: tcp_output_control_segment at /opt/sdk/components/lwip/lwip/src/core/tcp_out.c:1951

0x4024b039: tcp_rst at /opt/sdk/components/lwip/lwip/src/core/tcp_out.c:2011

0x40244809: tcp_abandon at /opt/sdk/components/lwip/lwip/src/core/tcp.c:621

0x40244838: tcp_abort at /opt/sdk/components/lwip/lwip/src/core/tcp.c:641

0x40244928: tcp_netif_ip_addr_changed_pcblist at /opt/sdk/components/lwip/lwip/src/core/tcp.c:2411

0x4024495f: tcp_netif_ip_addr_changed at /opt/sdk/components/lwip/lwip/src/core/tcp.c:2430

0x4024920e: netif_do_ip_addr_changed at /opt/sdk/components/lwip/lwip/src/core/netif.c:458

0x402495c4: netif_remove at /opt/sdk/components/lwip/lwip/src/core/netif.c:779

0x40235aa0: esp_wireguard_disconnect at /project/examples/wol/components/esp_wireguard/src/esp_wireguard.c:315 (discriminator 15)

It can be reproduced by starting the WireGuard tunnel as in examples/demo, also starting an HTTP server, stopping that server, and then disconnecting WireGuard. The system then crashes with the backtrace above.

trombik commented 1 year ago

would you tell me which version or git tag of the SDK you are using?

nikp123 commented 1 year ago

would you tell me which version or git tag of the SDK you are using?

I'm not entirely sure; I think it's v3.4. I'm pulling the latest revision of this Docker image: https://hub.docker.com/r/mbenabda/esp8266-rtos-sdk (because I couldn't get the SDK to work on NixOS itself).

nikp123 commented 1 year ago


A further discovery: if ANY accept() call runs between the WireGuard context being loaded and being closed, a similar crash occurs. I hope that narrows down the bug considerably.

trombik commented 1 year ago

The crash you are seeing is not the one in this issue. Would you open a new issue?

nikp123 commented 1 year ago

sure

Odysseusfr commented 1 year ago

Is there any progress on the esp-idf v5.0 implementation?

Odysseusfr commented 1 year ago

I found that the problem comes from ip_info and ip_info_old, which are 0 (NULL) when esp_netif_internal_dhcpc_cb() is called. However, I don't know how to set those fields correctly in the netif structure.

trombik commented 1 year ago

Is there any progress on the esp-idf v5.0 implementation?

None. You are on your own to fix it, or you can support us.

amaldo commented 1 year ago

I went a bit deeper to try to debug this issue, and found the following:

The crash happens because the callback esp_netif_internal_dhcpc_cb is invoked from inside netif_add (through netif_set_addr and the extended netif callback, as the backtrace shows), and netif_add is part of esp_wireguard's initialization. The code for that callback is in esp-idf-v5.0/components/esp_netif/lwip/esp_netif_lwip.c.

Here is the beginning:

static void esp_netif_internal_dhcpc_cb(struct netif *netif)
{           
    esp_netif_t *esp_netif;
    ESP_LOGD(TAG, "%s lwip-netif:%p", __func__, netif);
    if (netif == NULL || (esp_netif = lwip_get_esp_netif(netif)) == NULL) {
        // internal pointer hasn't been configured yet (probably in the interface init_fn())
        return;
    }

    esp_netif_ip_info_t *ip_info = esp_netif->ip_info;
    esp_netif_ip_info_t *ip_info_old = esp_netif->ip_info_old;

The catch is that lwip_get_esp_netif(netif) should notice that this lwIP netif struct has no corresponding esp_netif structure. It doesn't, because the lookup works like this (from esp_netif_lwip.c):

static inline esp_netif_t* lwip_get_esp_netif(struct netif *netif)
{
#if LWIP_ESP_NETIF_DATA
    return (esp_netif_t*)netif_get_client_data(netif, lwip_netif_client_id);
#else
    return (esp_netif_t*)netif->state;
#endif
}

So in the default case (LWIP_ESP_NETIF_DATA is 0), this function simply returns netif->state. esp_wireguard stores its own pointer there to hold its configuration, so the value is not NULL and the early-return check passes. esp_netif_internal_dhcpc_cb therefore assumes it received a pointer to an esp_netif structure and reads its ip_info and ip_info_old members, which are not really there (they come out as 0, as noted above); dereferencing them makes the program panic.

There is a simple fix to get esp_wireguard working on ESP-IDF v5.0.2: enable whatever sets LWIP_ESP_NETIF_DATA to 1. With that macro set, esp-netif stops using netif->state to hold the pointer from the lwIP netif to its associated esp_netif structure, and instead stores and reads it with netif_set_client_data and netif_get_client_data, which use a separate per-netif slot.

That macro is defined in components/lwip/port/esp32/include/lwipopts.h:

#if defined(CONFIG_ESP_NETIF_BRIDGE_EN) || defined(CONFIG_LWIP_PPP_SUPPORT)
/*
 * If special lwip interfaces (like bridge, ppp) enabled
 * `netif->state` is used internally and we must store esp-netif ptr
 * in `netif->client_data`
 */
#define LWIP_ESP_NETIF_DATA             (1)
#else
#define LWIP_ESP_NETIF_DATA             (0)
#endif

So enabling CONFIG_ESP_NETIF_BRIDGE_EN or CONFIG_LWIP_PPP_SUPPORT in idf.py menuconfig should fix the problem.

As a test, I enabled "Enable PPP support (new/experimental)" (name: LWIP_PPP_SUPPORT), and it works.

In short: enable CONFIG_LWIP_PPP_SUPPORT in your project configuration on ESP-IDF v5, rebuild the project, and the WireGuard tunnel will work as intended.