opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

interfaces: interface_configure($reload = true) and rc.linkup(start) are counter-productive #6852

Closed fichtner closed 9 months ago

fichtner commented 1 year ago

Describe the bug

Judging from #6671 and a couple of logs I got, when Netmap is involved, bringing devices down at startup, restarting connectivity, restarting all services (despite nothing being available yet), and then doing this for each interface wastes a lot of computing time and doesn't help to bring the system back into a workable state. Most of the time IPv6 is the one that fails to reload properly because of this (especially radvd on the LAN side breaking tracking).

To Reproduce

Suricata in IPS mode on WAN brings down dhcp6c on startup and restarts it, but blocks the main IPv6 renew event from going through correctly. On LAN it's slightly better, but tracking can break (radvd starting in the wrong spot in rc.linkup). PPPoE brings even more fun into the mix, but I only have logs and nothing concrete.

Expected behavior

Ideally rc.linkup should not disrupt operation, but I can see this being an unfixable situation.

Describe alternatives you considered

Reducing the reload behaviour in rc.linkup and being able to batch the reload of attached services (see the $reload = true switch).
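
To illustrate what batching could mean here, a minimal language-agnostic sketch in Python follows. It is not OPNsense code: `ReloadBatch`, `configure_interface`, and the service list are hypothetical names invented for this example. The idea is that per-interface configuration only records which services need a reload, and a single flush at the end restarts each service once instead of once per interface.

```python
from collections import OrderedDict


class ReloadBatch:
    """Hypothetical helper: collects reload requests so each service restarts once."""

    def __init__(self):
        # Maps service name -> set of interfaces that asked for a reload,
        # keeping the order in which services were first requested.
        self._pending = OrderedDict()

    def request(self, service, interface):
        # Record the request instead of restarting the service immediately.
        self._pending.setdefault(service, set()).add(interface)

    def flush(self, restart):
        # Call restart(service, interfaces) exactly once per pending service.
        done = []
        for service, interfaces in self._pending.items():
            restart(service, sorted(interfaces))
            done.append(service)
        self._pending.clear()
        return done


def configure_interface(interface, batch, services=("dhcp6c", "radvd")):
    # Hypothetical stand-in for per-interface setup: it defers the service
    # reloads to the batch rather than performing them inline.
    for service in services:
        batch.request(service, interface)


batch = ReloadBatch()
for ifname in ("wan", "lan", "opt1"):
    configure_interface(ifname, batch)

# One restart per service, covering all interfaces that requested it.
restarts = []
batch.flush(lambda service, interfaces: restarts.append((service, interfaces)))
```

With three interfaces and two services, the immediate approach would trigger six restarts; the batched one triggers two, each seeing the full list of affected interfaces.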

Screenshots

N/A

Relevant log files

Private submissions by multiple users.

Additional context

Seems to have gotten worse since 23.7, but the code has been problematic ever since the fork and not much has changed in it.

Environment

23.7.x

AdSchellevis commented 1 year ago

I think the root cause of most of this madness is services binding to non-static addresses. The boot issue might be fixable, but pulling the plug on an interface with many VLANs will likely still be problematic (and caused by the same thing).

Maybe the best first step is collecting the services which still depend on this behavior and assessing whether a common workaround can be identified (at least for the services we offer).

fichtner commented 9 months ago

The pressing issue with static addresses is hereby fixed. Another follow-up ticket will be published soon.