Closed Daggolin closed 1 year ago
Which option was used to build the tunnels? The legacy tunnel one or the new connections?
The new connections. Currently in the process of migrating from legacy to the new connections. The legacy tunnels had a field to specify the SPD entries and those still work fine. The OPNsense instance still has several legacy tunnels running, but only the one running as new connection is using the VPN > IPsec > Security Policy Database
manual entries (cause the new connections don't have an easy to use field like the legacy connections do). The issue only affects the new connections, because the legacy instances still use the field.
The new one is more flexible as it can also hook multiple clients, but this issue can likely be triggered when using both in combination. If you remove the ipsec_configure_spd()
in the following line, does it work correctly then?
I'm not sure I can offer a 100% solid fix, but preventing this from happening when manual spd's aren't used in legacy shouldn't be an issue.
Yes, I had tested this in the meantime to confirm the observations and conclusions I initially posted.
I'm thinking maybe we can just remove the items attached to any of the legacy tunnels, when a tunnel is dropped and the reqid isn't used, the garbage left shouldn't hurt anyone.
@Daggolin looking at the ticket and the code again, I think you're reusing the same reqid
as being used in the legacy tunnel settings. According to the following code section it should only drop entries which belong to the legacy tunnel already:
If you use a higher reqid (or link the entry to the correct child) it should work as expected as far as I can see.
@Daggolin looking at the ticket and the code again, I think you're reusing the same
reqid
as being used in the legacy tunnel settings.
As I said in the inital description of the issue, the shown reqid
for the installed manual SPD entries is 1
, which is the ID of another tunnel. So yes, a reuse is very likely happening under the hood, but it's not explicitly set by me to collide. I may have worded that unclear in my initial description of the issue. I have now manually checked swanctl.conf
and swanctl -l
to confirm this.
The new connection does not have an explicit reqid
set for its child on the WebUI, because the help text suggests that OPNsense or swanctl is going to automatically pick incremental ones (This might be helpful in some scenarios, like route based tunnels (VTI), but works only if each CHILD_SA configuration is instantiated not more than once. The default uses dynamic reqids, allocated incrementally
). Looking at the swanctl.conf
file generated by OPNsense it seems that the new connection doesn't get an explicit reqid
assigned by OPNsense, but it's rather left unset which implicates the default-value of 0
for reqid
, causing swantctl to allocate a reqid
incrementally on start. In our case - according to swanctl -l
- this means that the new connection's child got the reqid
1
at swanctl restart/reconnect, colliding with the legacy tunnel that OPNsense explicity assigned the same id to. It seems the existence of legacy tunnels is enough to make the automatic value unreliable.
Manually assigning a high reqid
to the child of the new tunnel solves the issue of SPD entries getting deleted. However, this still means that the use of automatically incrementally allocated reqid
in new connections does not work reliable if you have even a single legacy connection configured (even if it’s disabled the WebUI might draw the wrong conclusions, haven’t tested that). Maybe OPNsense could handle those incrementally allocated reqid
internally, taking legacy tunnels into consideration and explicity writing a reqid
to the swantctl.conf file instead of leaving it up to swanctl by not setting it. I honestly think that mixed setups with legacy connections and new connections existing side-by-side are going to be quite common for a while when users migrate their existing legacy connections to new connections one by one.
I'm afraid we can't fix this, legacy pins the reqids
in which case you will have to pin them in the connections as well. We can make a note somewhere in the documentation that there are constraints when using both options, but there's not much else we can do.
The reason for not pinning reqid's in connections in quite simple, when multiple clients connect, they all will use a unique reqid at strongswan's end unless you enforce anything. One other option you could try is to replace the lower numbers in the tunnels manually (they're stored in the config.xml) with higher ones so their less likely to collide.
The reason for not pinning reqid's in connections in quite simple, when multiple clients connect, they all will use a unique reqid at strongswan's end unless you enforce anything.
I was afraid something like that was the reason why you decided to pass no value (0) into the swanctl config in the first place.
One other option you could try is to replace the lower numbers in the tunnels manually (they're stored in the config.xml) with higher ones so their less likely to collide.
I had already considered this option, but I don’t think it offers any advantages over assigning high reqid
to the new connections within the setup I am working with, but thanks for the suggestion.
We can make a note somewhere in the documentation that there are constraints when using both options, but there's not much else we can do.
It would be very helpful if the help text of the reqid
field of the new connections would explicitly mention that using both, legacy and new connections, can cause issues and users should always assign a unique value themselves if they still have legacy tunnels set up.
Alternatively, as OPNsense explicitly assigns the values for legacy tunnels, OPNsense could automatically assign high numbers to them, but that would just shift the issue elsewhere and if you would automatically reassign existing config values during an update it could mess with users who already set manual values, potentially breaking their setup. So that‘s not a good option really.
As you said this is not something you can fix currently and as possible workarounds exist, feel free to close this issue if you don‘t need any infos or input from me unless you want to keep it around until documentation and/or possible button adjustments are done.
Let me think a bit about wording and close the issue later. I do agree explaining the constraints would be helpful, although our documentation is likely a better place to do that.
Important notices
Before you add a new report, we ask you kindly to acknowledge the following:
Describe the bug
When triggering an action on the web interface that leads to a new
swanctl.conf
file being written and the config being reloaded any IPsec tunnel relying on manual SPD entries (VPN > IPsec > Security Policy Database
) stops working.Additional context
I am not sure where the core issue originates from, but I have noticed the following:
Reqid
1
, which is not the ID of the tunnel they belong toswanctl.conf
file is written (for instance by clicking the apply button onVPN > IPsec > Connections
) the functionipsec_configure_spd
generates thespddelete
instructions for the manual entries and stores them forregid
1
, thus triggering their deletion when the commands are written to file and applied using/sbin/setkey -f
; this seems to be what leads to the loss of connection/usr/local/etc/swanctl/reqid_events.conf
and get applied by theipsec/updown_event.py
script on such an event, however the config reload doesn't trigger such an event so wheneveripsec_configure_spd
is called the manual entries get removed and not reapplied until the tunnel is reconnected (triggering a new event)The manual SPD entries used in our setup don't have an explicit
Reqid
set, they are associated to the phase 2 using theConnection child
dropdown menu.Expected behavior
Either
ipsec_configure_spd
should skip the removal of manual SPD entries for active phase 2 connections or it should invoke the same logic asipsec/updown_event.py
to restore them after their deletion.The active IPsec tunnels that rely on manual SPD entries should keep functioning when the IPsec config gets updated. Adding a new PSK (
VPN > IPsec > Pre-Shared Keys
) or enabling/disabling an IPsec tunnel (VPN > IPsec > Connections
;VPN > IPsec > Tunnel Settings [legacy]
) should not remove the installed manual SPD entries and break tunnels.Environment
Versions:
CPU type: