trombik / esp_wireguard

WireGuard Implementation for ESP-IDF.
Other
193 stars 34 forks source link

Crash on esp32s3 #50

Open eroom1966 opened 6 months ago

eroom1966 commented 6 months ago

Hi @trombik

many thanks for this repo. I wanted to mention that I had a crash which I was unable to reproduce in this code

static struct wireguard_peer *peer_lookup_by_allowed_ip(struct wireguard_device *device, const ip_addr_t *ipaddr) {
...
        tmp = &device->peers[x];
        if (tmp->valid) {

As far as I can see tmp was NULL, and hence there was no dereference to tmp->valid I have not been able to reproduce, but maybe this will give you an indication I was going to make a trivial fix to change to if (tmp && tmp->valid) { I will leave to you to decide

Thx Lee

CatoLynx commented 2 weeks ago

Chiming in here because I'm on the same path right now. This seems to occur when the wireguard interface is being shut down but a TCP connection is still active. The issue is not tmp being NULL, but rather device being NULL. lwip calls tcp_abort() which calls the wireguard interface's wireguardif_output function, which uses the passed netif's state member as the device. So somewhere in here, lwip or something else sets netif->state to NULL.

I'm still figuring out how to fix this. This rings a bell to another issue I had (and which was explained over here - See amaldo's comment on May 30, 2023) which also had to do with netif->state. Actually, looking at it again, this very crash is mentioned in there (#33) as well as in #38.

UPDATE: In esp_wireguard_disconnect, wireguardif_shutdown() is called before netif_remove(). wireguardif_shutdown() sets netif->state to NULL, which then causes this crash in case there is still an open connection when netif_remove() is called.

jefftharris commented 2 weeks ago

See the fix in #45. Also, we found this patch helps to gracefully disconnect the TCP sockets before Wireguard is shutdown:

diff --git a/components/esp_wireguard/src/esp_wireguard.c b/components/esp_wireguard/src/esp_wireguard.c
index 0b463be2..b37f632d 100644
--- a/components/esp_wireguard/src/esp_wireguard.c
+++ b/components/esp_wireguard/src/esp_wireguard.c
@@ -332,6 +332,10 @@ esp_err_t esp_wireguard_disconnect(wireguard_ctx_t *ctx)
         return ESP_OK;
     }

+    // Clear the IP address to gracefully disconnect any clients while the
+    // peers are still valid
+    netif_set_ipaddr(ctx->netif, IP4_ADDR_ANY4);
+
     lwip_err = wireguardif_disconnect(ctx->netif, peer_index);
     if (lwip_err != ERR_OK) {
         ESP_LOGW(TAG, "wireguardif_disconnect: peer_index: %" PRIu8 " err: %i", peer_index, lwip_err);
CatoLynx commented 2 weeks ago

Thanks for linking that issue, I forgot to check closed issues... Although I did come up with pretty much this exact fix myself, minus the netif_set_ipaddr call.