interfaces: interface_configure($reload = true) and rc.linkup(start) are counter-productive

fichtner commented 1 year ago

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

[x] I have read the contributing guide lines at https://github.com/opnsense/core/blob/master/CONTRIBUTING.md
[x] I am convinced that my issue is new after having checked both open and closed issues at https://github.com/opnsense/core/issues?q=is%3Aissue

Describe the bug

Judging from #6671 and a couple of logs I received, when Netmap is involved the sequence of bringing devices down on startup, restarting connectivity, restarting all services (even though nothing is available yet) and then repeating all of this for each interface wastes a lot of computing time and doesn't help bring the system back into a workable state. Most of the time IPv6 is what fails to reload properly because of this (especially radvd on the LAN side, which breaks tracking).

To Reproduce

Suricata in IPS mode on WAN brings down dhcp6c on startup and restarts it, but this blocks the main IPv6 renew event from going through correctly. On LAN it's slightly better, but tracking can break (radvd starts at the wrong point in rc.linkup). PPPoE adds even more fun to the mix, but I only have logs and nothing concrete.
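The ordering problem above can be pictured as a dependency graph: radvd on a tracking LAN interface should only start once dhcp6c on WAN has obtained a prefix and the tracked LAN address exists. The sketch below is an assumption for illustration only; the step names are hypothetical and do not correspond to real rc.linkup stages or OPNsense functions. It uses Python's standard `graphlib` to derive a start order that respects every prerequisite.

```python
from graphlib import TopologicalSorter

# Hypothetical start-up steps and their prerequisites; the names are
# illustrative and are not real OPNsense functions or rc.linkup stages.
DEPENDENCIES: dict[str, set[str]] = {
    "wan_link_up": set(),
    # Assumption: netmap (Suricata IPS) attaches before dhcp6c starts,
    # so attaching it does not interrupt a running DHCPv6 client.
    "suricata_ips_wan": {"wan_link_up"},
    "dhcp6c_wan": {"suricata_ips_wan"},
    # The tracked LAN address is derived from the delegated prefix.
    "lan_track6_address": {"dhcp6c_wan"},
    # radvd should advertise only once the tracked address exists.
    "radvd_lan": {"lan_track6_address"},
}


def start_order(deps: dict[str, set[str]]) -> list[str]:
    """Return a start order in which every step follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(start_order(DEPENDENCIES)))
```

Because this example graph is a single chain, the order is unique; with more interfaces, `static_order()` returns one of several valid orders.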

Expected behavior

Ideally rc.linkup should not disrupt operation, but I can see this being an unfixable situation.

Describe alternatives you considered

Reduce the reload behaviour in rc.linkup and make it possible to batch the reload of attached services (see the $reload = true switch).
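A minimal sketch of the batching idea, assuming a hypothetical `configure_interface()` stub that reports which services are attached to an interface instead of restarting them itself. The caller either restarts those services immediately per interface (roughly what `$reload = true` implies) or collects them and restarts each affected service once at the end. The interface-to-service mapping is made up, and the real `interface_configure()` is PHP with its own signature.

```python
# Hypothetical mapping of interfaces to the services attached to them.
ATTACHED_SERVICES: dict[str, set[str]] = {
    "wan": {"dhcp6c", "unbound", "ntpd"},
    "lan": {"radvd", "dhcpd", "unbound", "ntpd"},
    "opt1": {"dhcpd", "unbound"},
}


def configure_interface(name: str) -> set[str]:
    """Stub for applying interface settings; reports the attached services."""
    return ATTACHED_SERVICES.get(name, set())


def configure_all(interfaces: list[str], batch: bool) -> list[str]:
    """Configure all interfaces and return the service restarts issued, in order."""
    restarts: list[str] = []
    pending: set[str] = set()
    for name in interfaces:
        services = configure_interface(name)
        if batch:
            pending |= services  # defer: only remember what needs a restart
        else:
            restarts.extend(sorted(services))  # restart immediately, per interface
    # With batching, each affected service is restarted exactly once at the end.
    restarts.extend(sorted(pending))
    return restarts


if __name__ == "__main__":
    links = ["wan", "lan", "opt1"]
    print(len(configure_all(links, batch=False)), "restarts without batching")
    print(len(configure_all(links, batch=True)), "restarts with batching")
```

With this example mapping, the per-interface approach issues 9 restarts (unbound three times, ntpd and dhcpd twice each), while the batched approach issues 5.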

Screenshots

N/A

Relevant log files

Private submissions by multiple users.

Additional context

It seems to have gotten worse since 23.7, but the code has been problematic ever since the fork and not much has changed in it.

Environment

23.7.x

AdSchellevis commented 1 year ago

I think the root cause of most of this madness is services binding to non-static addresses. The boot issue might be fixable, but pulling the plug on an interface with many VLANs will likely still be problematic (and caused by the same thing).
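To illustrate the root cause named here: a service listening on a wildcard address (`0.0.0.0` or `::`) keeps working when an interface address changes, whereas one bound to a specific, dynamically assigned address must be restarted to pick up the new address. The sketch below models that argument with made-up service names and documentation-range addresses; it is not based on OPNsense's actual service registry.

```python
from dataclasses import dataclass

# Wildcard listen addresses that stay valid when interface addresses change.
WILDCARDS = {"0.0.0.0", "::"}


@dataclass(frozen=True)
class Listener:
    service: str
    interface: str  # interface owning the address (not consulted for wildcards)
    address: str


def needs_restart(listeners: list[Listener], changed_interface: str) -> list[str]:
    """Services bound to a concrete address on the interface whose address changed."""
    return sorted({
        entry.service
        for entry in listeners
        if entry.address not in WILDCARDS and entry.interface == changed_interface
    })


# Made-up services using documentation-range addresses (RFC 5737 / RFC 3849).
LISTENERS = [
    Listener("resolver", "lan", "::"),
    Listener("time_server", "lan", "2001:db8:1::1"),
    Listener("vpn_endpoint", "wan", "192.0.2.10"),
]

if __name__ == "__main__":
    print(needs_restart(LISTENERS, "lan"))
```

In this model, every additional VLAN carrying address-bound services adds restarts to a single link event, which matches the concern about pulling the plug on a busy interface.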

Maybe the best first step is to collect the services that still depend on this behavior and assess whether a common workaround can be identified (at least for the services we offer).

fichtner commented 9 months ago

The pressing issue with static addresses is fixed by this. Another follow-up ticket will be published soon.

interfaces: interface_configure($reload = true) and rc.linkup(start) are counter-productive #6852