strongswan / strongswan

strongSwan - IPsec-based VPN
https://www.strongswan.org

Failing to Mount Encrypted NFS Volumes After System Reboot #2295

Open saicharansunkara-ibm opened 2 weeks ago

saicharansunkara-ibm commented 2 weeks ago

System (please complete the following information):

Describe the bug

After a system reboot, attempts to mount encrypted NFS volumes fail with a connection timeout. The journal contains the following entries:

Jun 12 11:07:30 ubuntu charon-systemd[658]: unable to create IPv4 routing table rule
Jun 12 11:07:30 ubuntu charon-systemd[658]: unable to create IPv6 routing table rule

It appears that StrongSwan is attempting to create routing table rules before the network is fully online. This issue is intermittent but recurring. Adding the following directives to the StrongSwan systemd unit file seems to mitigate the issue:

Wants=network-online.target
After=network-online.target

This adjustment ensures that StrongSwan waits until the network is online before starting, preventing the observed errors.
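For reference, one way to apply these two directives without editing the packaged unit file is a systemd drop-in. The following is only a minimal sketch and is not taken from the report itself: the unit name strongswan.service, the drop-in file name, the DROPIN_ROOT variable and the write_dropin helper are all assumptions, so adjust them to whichever unit actually starts charon-systemd on the system.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that adds the two directives quoted above.
# UNIT and DROPIN_ROOT are assumptions; change them to match the local setup.
UNIT="${UNIT:-strongswan.service}"
DROPIN_ROOT="${DROPIN_ROOT:-/etc/systemd/system}"

# Create <DROPIN_ROOT>/<UNIT>.d/10-network-online.conf (hypothetical file name).
write_dropin() {
    dir="$DROPIN_ROOT/$UNIT.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/10-network-online.conf" <<'EOF'
[Unit]
Wants=network-online.target
After=network-online.target
EOF
}
```

Run as root, `write_dropin && systemctl daemon-reload` writes the file and makes systemd re-read its units; `systemctl cat "$UNIT"` should then list the drop-in below the packaged unit file.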

To Reproduce

Steps to reproduce the behavior:

  1. Configure an encrypted NFS volume.
  2. Reboot the system.
  3. Attempt to mount the NFS volume.

I would like to understand the root cause of this issue and determine if there are any additional steps or checks that can be implemented to ensure reliable mounting of encrypted NFS volumes after a system reboot.

tobiasbrunner commented 2 weeks ago

> It appears that StrongSwan is attempting to create routing table rules before the network is fully online.

Hm, that's a basic kernel feature that's configured via Netlink. No idea why that would be related to systemd's network-online.target, which waits for IPs to be configured on all interfaces etc. Could there be some other issue? Are there any other errors logged? (There should be a more specific Netlink error logged even in older versions. Increasing the log level for knl to 3 might show more about the communication.)
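The knl log level mentioned above could be raised for the journal logger roughly as follows. This is a sketch under assumptions: that the daemon is charon-systemd (as the log lines in the report suggest), that /etc/strongswan.d is included from strongswan.conf, and that settings in this file are merged with the packaged ones. CONF_DIR, the file name zz-knl-debug.conf and the write_knl_logging helper are hypothetical.

```shell
#!/bin/sh
# Sketch: raise the log level of the knl (kernel interface) subsystem to 3
# for charon-systemd's journal logger. CONF_DIR is an assumed include path.
CONF_DIR="${CONF_DIR:-/etc/strongswan.d}"

# Write a small strongswan.conf snippet; the file name is hypothetical.
write_knl_logging() {
    mkdir -p "$CONF_DIR" || return 1
    cat > "$CONF_DIR/zz-knl-debug.conf" <<'EOF'
charon-systemd {
    journal {
        default = 1
        knl = 3
    }
}
EOF
}
```

The setting is read when the daemon starts, so the failing case is only captured after the next reboot. The boot's messages can then be read with `journalctl -b -u strongswan.service`, again assuming that unit name.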

Maybe adding After=network.target has an effect in case network-online.target is not pulled in (see below)?

> This adjustment ensures that StrongSwan waits until the network is online before starting, preventing the observed errors.

There is already an After=network-online.target that should do this, I guess. However, the target might not be pulled in if nothing requests it (so as not to delay the boot). This is what the systemd docs say about network-online.target:

> Note that normally, if no service requires it and if no remote mount point is configured, this target is not pulled into the boot, thus avoiding any delays during boot should the network not be available.

And they go on:

> It is strongly recommended not to make use of this target too liberally: for example network server software should generally not pull this in (since server software generally is happy to accept local connections even before any routable network interface is up). Its primary purpose is network client software that cannot operate without network.

So whether to explicitly add that Wants= might depend on the use case. I'm not sure if it's a good idea to add it by default.
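Whether network-online.target is actually part of the boot, and whether the unit is ordered after it, can be checked with systemctl. A rough sketch, assuming the unit is named strongswan.service (an assumption, as are the helper names has_unit and report_ordering):

```shell
#!/bin/sh
# Sketch: report how a unit relates to network-online.target.
UNIT="${UNIT:-strongswan.service}"   # assumed unit name

# Return 0 if the space-separated unit list $1 contains the unit name $2.
has_unit() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print whether $UNIT has After=/Wants= on network-online.target and
# whether the target is active in the current boot.
report_ordering() {
    after="$(systemctl show -p After --value "$UNIT")"
    wants="$(systemctl show -p Wants --value "$UNIT")"
    if has_unit "$after" network-online.target; then
        echo "$UNIT: After=network-online.target is set"
    else
        echo "$UNIT: After=network-online.target is NOT set"
    fi
    if has_unit "$wants" network-online.target; then
        echo "$UNIT: Wants=network-online.target is set"
    else
        echo "$UNIT: Wants=network-online.target is NOT set"
    fi
    echo "network-online.target: $(systemctl is-active network-online.target)"
}
```

If the target is reported as inactive, nothing pulled it into the boot, which matches the documentation quoted above. `systemctl list-dependencies --reverse network-online.target` lists the units that do request it.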

> I would like to understand the root cause of this issue and determine if there are any additional steps or checks that can be implemented to ensure reliable mounting of encrypted NFS volumes after a system reboot.

Please go ahead and do that then.

saicharansunkara-ibm commented 1 week ago

You are correct that adding Wants=network-online.target and After=network-online.target to the systemd unit file does not fully resolve the issue: we have observed the problem even with these changes.

This issue is present in Ubuntu 22.04 with StrongSwan version 5.9.5 and Ubuntu 24.04 with StrongSwan version 5.9.13 as well.

One possible reason could be that we are installing StrongSwan and its dependencies from downloaded .deb files instead of using a package manager like apt. This method might not correctly configure all dependencies and system integration aspects, leading to such intermittent issues.

Would there be any specific configurations or additional steps recommended when installing from .deb files to ensure proper service initialization and network dependency handling?

tobiasbrunner commented 1 week ago

> One possible reason could be that we are installing StrongSwan and its dependencies from downloaded .deb files instead of using a package manager like apt. This method might not correctly configure all dependencies and system integration aspects, leading to such intermittent issues.

Hm, I don't see why that should make a difference. What exactly is your rationale here?

> Would there be any specific configurations or additional steps recommended when installing from .deb files to ensure proper service initialization and network dependency handling?

I'm not aware of any. It's a systemd thing how these services are ordered etc. But I still don't get why the daemon would be started so early that it can't create routing rules in the kernel via Netlink. That seems very strange. Did you get any more specific errors in the log?
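As a complement to the logs, it may help to check after a failing boot whether the policy routing rules exist at all. The sketch below assumes strongSwan's default routing table ID of 220 is in use (an assumption, since the thread does not show the configuration); TABLE, has_table_rule and report_rules are hypothetical names.

```shell
#!/bin/sh
# Sketch: check whether a policy routing rule for strongSwan's table exists.
# Assumption: the default routing table ID (220) has not been changed.
TABLE="${TABLE:-220}"

# Return 0 if the `ip rule` output passed in $1 has a rule for $TABLE.
has_table_rule() {
    printf '%s\n' "$1" | grep -Eq "lookup $TABLE( |\$)"
}

# Report the state of the rule for both address families.
report_rules() {
    for fam in -4 -6; do
        if has_table_rule "$(ip "$fam" rule show)"; then
            echo "ip $fam: rule for table $TABLE present"
        else
            echo "ip $fam: rule for table $TABLE missing"
        fi
    done
}
```

A missing rule right after boot, together with the "unable to create ... routing table rule" messages, would confirm that the rule was never installed rather than removed later.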