opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

wireguard: not fully configured after reboot #7148

Closed hakkabara closed 6 months ago

hakkabara commented 8 months ago

Describe the bug

Hello, I have a bug. I have a WireGuard tunnel that lets friends and family reach my services. I had not changed any rules for weeks and everything was fine. I just updated and rebooted, and afterwards my rules no longer applied. I have to add a new rule, delete it again, and apply; after that everything works as before.

In the screenshots, the green rules at the top are the ones that appear after adding a dummy rule and deleting it again. [Screenshots: Capture 1]

The WireGuard connection is also stable and connected. [Screenshot]

My aliases (see screenshot):

debiandocker: 192.168.189.3
httphttps: 80, 443
DNS: 53, 853

I use a VM running on proxmox

boomer41 commented 7 months ago

I see the same behavior with WireGuard interfaces. The problem started with some 23.x release and is still present in 24.1. :(

Just saving a rule without changing anything and reloading the firewall restores functionality.

Unfortunately, I have not found any log entries suggesting some kind of error.

fichtner commented 7 months ago

This has been known to happen for unknown reasons, unfortunately.

Can you post /tmp/rules.debug and /tmp/ifconfig.debug from after boot (broken state) and after manual rules apply (good state)?

Cheers, Franco

boomer41 commented 7 months ago

Can you post /tmp/rules.debug and /tmp/rules.ifconfig from after boot (broken state) and after manual rules apply (good state).

I have captured /tmp/rules.debug and /tmp/rules.limits. /tmp/rules.ifconfig does not exist on my 24.1. Additionally, I have captured the output of pfctl -vvs all in both scenarios.

Can I send you the logs via e-mail or such? Appending sensitive information on a public issue doesn't seem like the best idea :)
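
The capture requested above (debug files from the broken state after boot and from the good state after a manual apply) can be sketched as a small helper. This is only an illustration: the function name `snapshot_debug` and the default destination `/root/wg-debug` are assumptions, not part of OPNsense. The file names and the `pfctl -vvs all` call are the ones mentioned in this thread.

```shell
#!/bin/sh
# Sketch: copy the debug files named in this thread into a labelled
# snapshot directory. snapshot_debug and /root/wg-debug are hypothetical.

# snapshot_debug LABEL [SRC_DIR] [DEST_ROOT]
#   LABEL      e.g. "broken" (after boot) or "good" (after manual apply)
#   SRC_DIR    where the debug files live (default: /tmp)
#   DEST_ROOT  where snapshots are stored (default: /root/wg-debug)
snapshot_debug() {
    snap_src=${2:-/tmp}
    snap_dest=${3:-/root/wg-debug}/$1
    mkdir -p "$snap_dest" || return 1

    # Copy whichever of the requested files exist on this release.
    for snap_f in rules.debug rules.limits ifconfig.debug; do
        if [ -f "$snap_src/$snap_f" ]; then
            cp "$snap_src/$snap_f" "$snap_dest/$snap_f"
        fi
    done

    # Record the live pf state as well, if pfctl is available.
    if command -v pfctl >/dev/null 2>&1; then
        pfctl -vvs all > "$snap_dest/pfctl-vvs-all.txt" 2>&1 || true
    fi

    # Print the snapshot directory so the caller can find it.
    echo "$snap_dest"
}
```

Run `snapshot_debug broken` right after boot and `snapshot_debug good` after re-applying the rules; `diff -u` on the two `rules.debug` copies then shows whether the generated ruleset changed. As noted in the comment above, the files may contain sensitive information, so review them before sharing.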

fichtner commented 7 months ago

Sorry, the file is /tmp/ifconfig.debug.

You can send it to franco@opnsense.org

Thanks!

boomer41 commented 7 months ago

Sorry, the file is /tmp/ifconfig.debug.

Yep, that one works :)

You can send it to franco@opnsense.org

Done.

SeimusS commented 7 months ago

Hello,

I have the same issue as boomer41. If there is anything I can provide as well, let me know, Franco.

Regards, S.

Westie commented 7 months ago

I posted this in my thread on the forums; however, if anyone wants a quick emergency fix, have a look at this gist:

https://gist.github.com/Westie/5557cffd927dd32de93255e5ac4a22e0

Feel free to adjust as needed!

fichtner commented 7 months ago

It appears that the current code has issues with assigned wireguard instances and you can try this patch on 24.1 and above: 9e01b27

# opnsense-patch 9e01b27

I'm not 100% sure it solves the issues reported, but it will bring us to a more consistent outcome after reload which can reveal another underlying issue better.

Cheers, Franco

boomer41 commented 7 months ago

It appears that the current code has issues with assigned wireguard instances and you can try this patch on 24.1 and above: 9e01b27

# opnsense-patch 9e01b27

I'm not 100% sure it solves the issues reported, but it will bring us to a more consistent outcome after reload which can reveal another underlying issue better.

Cheers, Franco

I have applied your patch, but the issue persists in the same way :( Previously the code always reloaded the interface, didn't it? Now it doesn't when the interface already exists.

fichtner commented 7 months ago

@boomer41 from your output that seemed to be the obvious issue, and I could reproduce it. Can you send me new data gathered the same way but with the patch applied? The rules looked good in both cases (no diff between them), so I suspect a more fundamental issue relating to VIPs, DNS resolution, or a required overlap in connectivity (boiling down to route complexity). I can't see that from the files gathered, but we will get there step by step.

So, just to be clear: after reboot it doesn't work, and then you do "x" and it works? What is "x" in precise terms again, so I can focus on this?

Thanks, Franco

boomer41 commented 7 months ago

Logs sent via mail with the patch applied.

As stated in the email, I just log on to the management UI and edit and save some arbitrary rule to get the "Apply changes" button. After hitting that button, things start working.

fichtner commented 7 months ago

Can you confirm that running

# configctl filter reload

does the same thing?

boomer41 commented 7 months ago

Can you confirm that running

# configctl filter reload

Does the same thing?

Can confirm. Running that command fixes the issue the same way as the UI way does.

fichtner commented 7 months ago

Here is a backport of our recent efforts, which applies cleanly to 24.1.2: https://github.com/opnsense/core/commit/7d35204f2a. To apply:

# opnsense-patch 7d35204f2a

Cheers, Franco

SeimusS commented 7 months ago

I've tried the patch on 24.1.2; sadly, it didn't fix the problem for me. After the reboot the issue is still present.

sudo opnsense-patch 7d35204f2a
Password:
Fetched 7d35204f2a via https://github.com/opnsense/core
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|From 7d35204f2aab13311aee3a266f07972370922ccd Mon Sep 17 00:00:00 2001
|From: Franco Fichtner <franco@opnsense.org>
|Date: Thu, 8 Feb 2024 17:13:32 +0100
|Subject: [PATCH] wireguard: address assorted interface configuration
| inconsistencies #7148
|
|(cherry picked from commit b8665c9da0780a7744da5e84e6cafa4183f37f57)
|(cherry picked from commit 7413ca696dbb5e8c1f4786207054e43c05b9f8c4)
|(cherry picked from commit 30862f87113865898630127c2f1f790d34678be1)
|(cherry picked from commit dbe52eeaa9c17ec56a22ff6cefcf6b94615bd8b4)
|(cherry picked from commit e0cee10ad13e33e603a50c33c62a31b5dd8def6e)
|---
| .../scripts/Wireguard/wg-service-control.php  | 48 ++++++++++++++-----
| 1 file changed, 36 insertions(+), 12 deletions(-)
|
|diff --git a/src/opnsense/scripts/Wireguard/wg-service-control.php b/src/opnsense/scripts/Wireguard/wg-service-control.php
|index 09dcd57b19..7f801b9b44 100755
|--- a/src/opnsense/scripts/Wireguard/wg-service-control.php
|+++ b/src/opnsense/scripts/Wireguard/wg-service-control.php
--------------------------
Patching file opnsense/scripts/Wireguard/wg-service-control.php using Plan A...
Hunk #1 succeeded at 62.
Hunk #2 succeeded at 80.
Hunk #3 succeeded at 134.
Hunk #4 succeeded at 273.
Hunk #5 succeeded at 287.
Hunk #6 succeeded at 332.
done
All patches have been applied successfully.  Have a nice day.

*** FINAL System shutdown message from ***** ***

System going down IMMEDIATELY

Regards, S.

boomer41 commented 7 months ago

# opnsense-patch 7d35204f2a

Reversed all debug patches and applied this one. In contrast to @SeimusS, the patch works for me. Running 24.1_1.

fichtner commented 7 months ago

@SeimusS what was "the problem" again? There have been overlapping issues but some are beyond our control, especially with DNS-based endpoints and dynamic connections.

@boomer41 ok, that's good. We'll be adding this to 24.1.3 to bring us closer to being able to identify other issues then.

Cheers, Franco

SeimusS commented 7 months ago

@boomer41 Happy to hear it worked for you.

@fichtner Basically, the problem for me is that after a reboot of OPNsense, traffic from WG to external (public) destinations does not work. I can still reach the LAN without problems, but anything that goes to the internet fails if the source is a WG host. Looking closer, it seems that NAT is not being applied correctly for WG, even though I use automatic rules and can see the WG interface as part of the automatically created rule.

When hitting the apply button in FW > NAT > Outbound or using "configctl filter reload" it starts to work normally again.
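
One way to check whether the loaded NAT rules are really what changes across the reload is to dump them before and after and compare. The sketch below assumes the dumps come from `pfctl -s nat`, which prints the currently loaded NAT rules; the helper name `nat_rules_changed` and the `/tmp/nat.*` file names are hypothetical.

```shell
#!/bin/sh
# Sketch: report whether two saved "pfctl -s nat" dumps differ.
# nat_rules_changed is a hypothetical helper name.
#
# Intended use on the firewall (as root), in the broken state after boot:
#   pfctl -s nat > /tmp/nat.before
#   configctl filter reload
#   pfctl -s nat > /tmp/nat.after
#   nat_rules_changed /tmp/nat.before /tmp/nat.after

# nat_rules_changed BEFORE_FILE AFTER_FILE
# Prints a unified diff and returns 0 if the dumps differ, 1 if identical.
nat_rules_changed() {
    if diff -u "$1" "$2"; then
        echo "NAT rules identical"
        return 1
    fi
    echo "NAT rules differ"
    return 0
}
```

If the dumps differ only in rules for the WireGuard network, that would support the suspicion that outbound NAT for WG is missing until the filter is reloaded.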

Edit: I use an RA setup; DNS is not on OPNsense but on an RPi.

Regards, S.

fichtner commented 7 months ago

The wireguard logs with the patch applied might help pin this down further. Basically it tries to retain wireguard interfaces so that NAT et al are applied correctly all the time. Not sure where the issue now lies. Are your endpoint addresses using hostnames instead of IPs?

SeimusS commented 7 months ago

No, the clients use IPs, not hostnames. Each WG client has an IP/32 configured on its end.

image

Similar as well IP on the dedicated host.

Regards, S.

fichtner commented 7 months ago

@SeimusS Ok, it would be better to start with the basics as described in https://github.com/opnsense/core/issues/7148#issuecomment-1925674334 sent to franco AT opnsense DOT org

SeimusS commented 7 months ago

Done, logs generated (& sent to you) as per the comment mentioned after reboot, pre and post reapplying the NAT rules/"configctl filter reload"

Let me know if you need anything else from my OPNsense box.

Regards, S.

fichtner commented 6 months ago

In agreement with @SeimusS, we're closing this issue and will wait for other user reports or code changes to make the problem more visible. The main problem reported here, however, has been fixed since 24.1.3.

Cheers, Franco

illum1n4ti commented 4 months ago

Hello Franco,

I still have issues with WireGuard from Surfshark. After every update (currently OPNsense 24.1.7) I apply opnsense-patch 7d35204f2a and it works, but is this really the way I have to make my WireGuard work?

cheers

fichtner commented 4 months ago

@illum1n4ti it depends how old your initial setup is and what "have issues" means? There have been a lot of individual cases that were cleared up and re-applying the patch just removes it so you are going backwards in time...

illum1n4ti commented 4 months ago

@fichtner Thank you for your reply. Before, I was using version 23.x, and when I upgraded to 24.1 it was still working fine. But when the WireGuard plugin was removed and WireGuard was integrated into OPNsense, the problems started: I am not getting a handshake with Surfshark.

Thanks to the patch I can still use Surfshark VPN with WireGuard, but I am afraid that with the patch I am running old code, which may also have security bugs.

I hope you can give me some advice.

fichtner commented 4 months ago

A common mistake was that the assigned WireGuard interface had its IPv4 mode (and IPv6 mode) set to something other than "none". That is now prohibited, and the IP address needs to be set in the instance as the "tunnel address". Fixing this brought it back to life for most people.

hsand commented 2 months ago

I had the same issue with 24.1.10; it seems to be related to NAT.

I was able to get things working by following Step 4(b) in the documentation (even though it says this step is not necessary) - https://docs.opnsense.org/manual/how-tos/wireguard-client.html#step-4-b-create-an-outbound-nat-rule

itsMaxio commented 1 month ago

The problem is still not solved. Automatic NAT rules are still not added for the WireGuard interface: they appear in the GUI, but they are not in the debug files. They are only added after reloading the rules via the GUI or the console. [Screenshot]

Does anyone have any solution? Maybe you @fichtner would have a solution? I have manual rules set but I wonder why the automatic ones don't work.
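
To check the debug file for this symptom without reading it by hand, one could count the generated `nat` lines that mention the WireGuard device. This is a rough sketch under assumptions: `count_nat_rules` is a hypothetical helper, `wg0` is a placeholder device name, and it assumes the generated rules in /tmp/rules.debug mention that name (for example in a `(wg0:network)` source). If your release writes the tunnel subnet instead, pass that string as the pattern.

```shell
#!/bin/sh
# Sketch: count "nat" lines in the generated ruleset that contain a
# fixed string such as a WireGuard device name or a tunnel subnet.
# count_nat_rules is a hypothetical helper name.

# count_nat_rules PATTERN [FILE]
#   PATTERN  fixed string to look for, e.g. "wg0" (placeholder)
#   FILE     ruleset to inspect (default: /tmp/rules.debug)
count_nat_rules() {
    # The first grep keeps only lines starting with "nat "; the second
    # counts those containing PATTERN. "|| true" keeps the exit status
    # at zero when the count is 0.
    grep '^nat ' "${2:-/tmp/rules.debug}" 2>/dev/null | grep -c -F -- "$1" || true
}
```

Running `count_nat_rules wg0` once after boot and once after `configctl filter reload` shows whether the automatic rules only appear after the reload.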

SeimusS commented 3 weeks ago

@itsMaxio This could be a slightly different issue from the one fixed in that PR.

If I remember correctly, I have the same or a similar problem to the one you reported. Basically, WG is working but can only access LAN resources, not the Internet, because the automatic NAT rules are missing when checking the debug files.

I tried to troubleshoot it with Franco's help, but I couldn't find out why it's happening. So I currently have a permanent workaround in place that reloads the filter 2 seconds after boot.

1. Create a file in rc.syshook.d/start/ called 92-wireguard-firewall-workaround:

# vi /usr/local/etc/rc.syshook.d/start/92-wireguard-firewall-workaround

2. Put in

#!/bin/sh
# Workaround: wait 2 seconds, then reload the firewall rules.
sleep 2
configctl filter reload

This basically makes sure that WireGuard works as intended whenever the device is rebooted.
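
A possible variant of the script above polls for the WireGuard device instead of sleeping for a fixed time. This is only a sketch and has not been confirmed on any of the setups in this thread: the helper name `wait_for_iface`, the device name `wg0`, and the 30-second timeout are assumptions. It relies on `ifconfig IFACE` exiting with a non-zero status while the device does not exist.

```shell
#!/bin/sh
# Sketch of a variant of the workaround: poll for the WireGuard device
# instead of sleeping a fixed time. wait_for_iface, wg0 and the timeout
# are assumptions for illustration.

# wait_for_iface IFACE TIMEOUT_SECONDS
# Returns 0 as soon as "ifconfig IFACE" succeeds, 1 if the timeout expires.
wait_for_iface() {
    wf_iface=$1
    wf_left=$2
    while [ "$wf_left" -gt 0 ]; do
        if ifconfig "$wf_iface" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        wf_left=$((wf_left - 1))
    done
    return 1
}

# In the syshook script one would then call, for example:
#   wait_for_iface wg0 30; configctl filter reload
```

With this call the filter is reloaded whether or not the device appeared in time, so the behaviour is never worse than the fixed `sleep 2`. The hook file most likely also needs to be executable (for example via `chmod +x`) before it is run at boot.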

You also may want to open a new Ticket for this WG issue as this thread was closed with a PR fix.

Regards, S.

b4zyl commented 3 days ago

@SeimusS

Even after adding your workaround script to OPNsense, I still need to manually reload the WireGuard service after each OPNsense restart :(