opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License
3.31k stars 738 forks source link

OpenVPN: HA behaves different when going in Maintenenace mode compared to if-down #6506

Closed mimugmail closed 1 year ago

mimugmail commented 1 year ago

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

I'm running an OPNsense pair in HA, also using multiwan with 2 WANs and gateway failover (no gateway-groups). There is a site-2-site VPN via OpenVPN and since I use multiple WANs I cannot set a WAN VIP IP interfaces (in openvpn client config). So what I'm doing is to selec the LAN VIP and cause of usual NAT everything works, no matter which line is default.

Also when I enter mnt-mode in Interface : Virtual IPs : Status, services are moved to FW2 and everything goes on. But when only the LAN interface on FW1 goes down and nothing else, CARP detects the event, moves IP to FW2 but it doesn't kill the OpenVPN process of FW1, causing both Firewalls kicking out each other.

To Reproduce

This happens on usual mnt mode hopping:

Enter MNT Mode FW1

<27>1 2023-04-16T08:59:28+02:00 fw-01.domain.de openvpn_client2 23867 - [meta sequenceId="1"] event_wait : Interrupted system call (code=4)
<29>1 2023-04-16T08:59:28+02:00 fw-01.domain.de openvpn_client2 23867 - [meta sequenceId="2"] /sbin/route delete -net 10.0.0.0 172.30.6.1 255.0.0.0
<29>1 2023-04-16T08:59:28+02:00 fw-01.domain.de openvpn_client2 23867 - [meta sequenceId="3"] Closing TUN/TAP interface
<29>1 2023-04-16T08:59:28+02:00 fw-01.domain.de openvpn_client2 23867 - [meta sequenceId="4"] /usr/local/etc/inc/plugins.inc.d/openvpn/ovpn-linkdown ovpnc2 1500 1554 172.30.6.2 255.255.255.0 init
<29>1 2023-04-16T08:59:28+02:00 fw-01.domain.de openvpn_client2 23867 - [meta sequenceId="5"] SIGTERM[hard,] received, process exiting

Exiting MNT Mode, Log FW2:

<27>1 2023-04-16T09:00:22+02:00 fw-02.domain.de openvpn_client2 79424 - [meta sequenceId="44"] event_wait : Interrupted system call (code=4)
<29>1 2023-04-16T09:00:22+02:00 fw-02.domain.de openvpn_client2 79424 - [meta sequenceId="45"] /sbin/route delete -net 10.0.0.0 172.30.6.1 255.0.0.0
<29>1 2023-04-16T09:00:22+02:00 fw-02.domain.de openvpn_client2 79424 - [meta sequenceId="46"] Closing TUN/TAP interface
<29>1 2023-04-16T09:00:22+02:00 fw-02.domain.de openvpn_client2 79424 - [meta sequenceId="47"] /usr/local/etc/inc/plugins.inc.d/openvpn/ovpn-linkdown ovpnc2 1500 1554 172.30.6.2 255.255.255.0 init
<29>1 2023-04-16T09:00:22+02:00 fw-02.domain.de openvpn_client2 79424 - [meta sequenceId="48"] SIGTERM[hard,] received, process exiting

When putting down only the lan interface the process ID is not killed:

system.log
<13>1 2023-04-16T08:41:56+02:00 fw-01.domain.de opnsense 73371 - [meta sequenceId="3142"] /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member " (10.9.6.1) (4@igb3_vlan200)" has resumed the state "BACKUP" for vhid 4
<13>1 2023-04-16T08:41:56+02:00 fw-01.domain.de opnsense 73371 - [meta sequenceId="3143"] /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface  (10.9.6.1).
<13>1 2023-04-16T08:41:56+02:00 fw-01.domain.de opnsense 74869 - [meta sequenceId="3144"] /usr/local/etc/rc.syshook.d/carp/20-openvpn: Carp cluster member " (192.168.6.1) (2@igb3_vlan300)" has resumed the state "BACKUP" for vhid 2
<13>1 2023-04-16T08:41:56+02:00 fw-01.domain.de opnsense 74869 - [meta sequenceId="3145"] /usr/local/etc/rc.syshook.d/carp/20-openvpn: Resyncing OpenVPN instances for interface  (192.168.6.1).

dmesg -a
carp: 1@igb3_vlan100: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)
carp: 4@igb3_vlan200: MASTER -> BACKUP (more frequent advertisement received)
carp: 2@igb3_vlan300: MASTER -> BACKUP (more frequent advertisement received)

openvpn.log
<27>1 2023-04-16T08:41:59+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="1"] Connection reset, restarting [0]
<29>1 2023-04-16T08:41:59+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="2"] SIGUSR1[soft,connection-reset] received, process restarting
<29>1 2023-04-16T08:41:59+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="3"] Restart pause, 5 second(s)
<28>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="4"] WARNING: No server certificate verification method has been enabled.  See http://openvpn.net/howto
.html#mitm for more info.
<28>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="5"] NOTE: the current --script-security setting may allow this configuration to call user-defined scri
pts
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="6"] Outgoing Control Channel Authentication: Using 256 bit message hash 'SHA256' for HMAC authenticati
on
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="7"] Incoming Control Channel Authentication: Using 256 bit message hash 'SHA256' for HMAC authenticati
on
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="8"] TCP/UDP: Preserving recently used remote address: [AF_INET]101.19.66.38:3006
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="9"] Socket Buffers: R=[65228->65228] S=[65228->65228]
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="10"] Attempting to establish TCP connection with [AF_INET]101.19.66.38:3006 [nonblock]
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="11"] TCP connection established with [AF_INET]101.19.66.38:3006
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="12"] TCP_CLIENT link local (bound): [AF_INET]10.0.6.11:0
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="13"] TCP_CLIENT link remote: [AF_INET]101.19.66.38:3006
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="14"] TLS: Initial packet from [AF_INET]101.19.66.38:3006, sid=36294c35 390a23a0
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="15"] VERIFY OK: depth=1, C=DE, ST=Berlin, L=Berlin, O=Company GmbH, 
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="16"] VERIFY OK: depth=0, C=DE, ST=Berlin, L=Berlin, O=Company GmbH, 
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="17"] Control Channel: TLSv1.3, cipher TLSv1.3 TLS_AES_256_GCM_SHA384, peer certificate: 2048 bit RSA,
signature: RSA-SHA256
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="18"] [central-opn.domain.de] Peer Connection Initiated with [AF_INET]101.19.66.38:3006
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="19"] PUSH: Received control message: 'PUSH_REPLY,route 10.0.0.0 255.0.0.0,route-gateway 172.30.6.1,top
ology subnet,ping 10,ping-restart 60,route 10.0.0.0 255.0.0.0,ifconfig 172.30.6.2 255.255.255.0,peer-id 0,cipher AES-256-GCM'
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="20"] OPTIONS IMPORT: timers and/or timeouts modified
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="21"] OPTIONS IMPORT: --ifconfig/up options modified
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="22"] OPTIONS IMPORT: route options modified
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="23"] OPTIONS IMPORT: route-related options modified
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="24"] OPTIONS IMPORT: peer-id set
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="25"] OPTIONS IMPORT: adjusting link_mtu to 1626
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="26"] OPTIONS IMPORT: data channel crypto options modified
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="27"] Data Channel: using negotiated cipher 'AES-256-GCM'
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="28"] Outgoing Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="29"] Incoming Data Channel: Cipher 'AES-256-GCM' initialized with 256 bit key
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="30"] Preserving previous TUN/TAP instance: ovpnc2
<29>1 2023-04-16T08:42:04+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="31"] Initialization Sequence Completed
<27>1 2023-04-16T08:42:09+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="32"] Connection reset, restarting [0]
<29>1 2023-04-16T08:42:09+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="33"] SIGUSR1[soft,connection-reset] received, process restarting
<29>1 2023-04-16T08:42:09+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="34"] Restart pause, 5 second(s)
<28>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="35"] WARNING: No server certificate verification method has been enabled.  See http://openvpn.net/howto.html#mitm for more info.
<28>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="36"] NOTE: the current --script-security setting may allow this configuration to call user-defined scripts
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="37"] Outgoing Control Channel Authentication: Using 256 bit message hash 'SHA256' for HMAC authentication
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="38"] Incoming Control Channel Authentication: Using 256 bit message hash 'SHA256' for HMAC authentication
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="39"] TCP/UDP: Preserving recently used remote address: [AF_INET]101.19.66.38:3006
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="40"] Socket Buffers: R=[65228->65228] S=[65228->65228]
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="41"] Attempting to establish TCP connection with [AF_INET]101.19.66.38:3006 [nonblock]
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="42"] TCP connection established with [AF_INET]101.19.66.38:3006
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="43"] TCP_CLIENT link local (bound): [AF_INET]10.0.6.11:0
<29>1 2023-04-16T08:42:14+02:00 fw-01.domain.de openvpn_client2 69368 - [meta sequenceId="44"] TCP_CLIENT link remote: [AF_INET]101.19.66.38:3006

Expected behavior

Same behavior

Describe alternatives you considered

No alternatives

Screenshots

We can start a troubleshooting session if needed

Relevant log files

Above

Additional context

Add any other context about the problem here.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 23.1.5_4-amd64FreeBSD 13.1-RELEASE-p7OpenSSL 1.1.1t 7 Feb 2023

AdSchellevis commented 1 year ago

@mimugmail can you try to remove these lines and retry?

https://github.com/opnsense/core/blob/835ee6e6107912d876cc7fea5a5654333e426f5d/src/etc/rc.syshook.d/carp/20-openvpn#L38-L41

on INIT it bails, but I'm not sure that's the only issue you have here.

mimugmail commented 1 year ago

Hi @AdSchellevis thx for taking the time. I removed the lines on FW1 but when ifdown the interface it's still the same situation. I'll ask them to just unplug the cable, maybe it won't go into INIT, rather in BACKUP mode.

AdSchellevis commented 1 year ago

@mimugmail can you try https://github.com/opnsense/core/commit/e2a1e8c3128b88353a1743612372877f32c4cd76 ?

mimugmail commented 1 year ago

I tried the patch on FW1 but it still doesn't kill the process:

igb3: link state changed to DOWN
carp: 1@igb3_vlan100: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 0 (interface down)
igb3_vlan100: link state changed to DOWN
igb3_vlan310: link state changed to DOWN
carp: 4@igb3_vlan200: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 240 (interface down)
igb3_vlan200: link state changed to DOWN
carp: 2@igb3_vlan300: MASTER -> INIT (hardware interface down)
carp: demoted by 240 to 480 (interface down)
igb3_vlan300: link state changed to DOWN
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address
arprequest_internal: cannot find matching address

system.log:

<13>1 2023-04-20T16:03:11+02:00 fw-01.domain.de opnsense 87362 - [meta sequenceId="1"] /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt1(igb3_vlan100)
<13>1 2023-04-20T16:03:11+02:00 fw-01.domain.de opnsense 90549 - [meta sequenceId="2"] /usr/local/etc/rc.filter_configure: ROUTING: keeping current inet default gateway '1.2.3.4'
<13>1 2023-04-20T16:03:12+02:00 fw-01.domain.de opnsense 15364 - [meta sequenceId="3"] /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt4(igb3_vlan310)
<13>1 2023-04-20T16:03:12+02:00 fw-01.domain.de opnsense 17762 - [meta sequenceId="4"] /usr/local/etc/rc.filter_configure: ROUTING: keeping current inet default gateway '1.2.3.4'
<13>1 2023-04-20T16:03:12+02:00 fw-01.domain.de opnsense 43109 - [meta sequenceId="5"] /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt2(igb3_vlan200)
<13>1 2023-04-20T16:03:13+02:00 fw-01.domain.de opnsense 46876 - [meta sequenceId="6"] /usr/local/etc/rc.filter_configure: ROUTING: keeping current inet default gateway '1.2.3.4'
<13>1 2023-04-20T16:03:13+02:00 fw-01.domain.de opnsense 72507 - [meta sequenceId="7"] /usr/local/etc/rc.linkup: DEVD: Ethernet detached event for opt3(igb3_vlan300)
<13>1 2023-04-20T16:03:14+02:00 fw-01.domain.de opnsense 75649 - [meta sequenceId="8"] /usr/local/etc/rc.filter_configure: ROUTING: keeping current inet default gateway '1.2.3.4'
AdSchellevis commented 1 year ago

@mimugmail and then you probably need this https://github.com/opnsense/core/commit/c9e36ef230b359a41d85395b6050803447c3c9a1 on top, can you try that locally? I haven't tried it yet on my end (but sounds logical that it doesn't catch INIT now).

mimugmail commented 1 year ago

Yes, both patches stacked, great! :) Thank you @AdSchellevis, I'll run it on production there.

AdSchellevis commented 1 year ago

@mimugmail nice! I missed the devd event the first time :)

fichtner commented 1 year ago

Nice... close this then?

mimugmail commented 1 year ago

Housekeeping ...