opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License

Reliably clear SIP UDP states on PPPOE WAN IP change #4652

Closed wkochFPV closed 1 year ago

wkochFPV commented 3 years ago


Is your feature request related to a problem? Please describe.

These UDP states of SIP clients (phones, gateways and asterisk servers as SIP clients) are not flushed when the WAN IP of PPPOE Dial-Up-Connections (possibly also for DHCP connections, but not tested) changes:

all | udp | 95.118.XXX.XXX:50745 (192.168.3.150:5160) -> XXX.120.186.XXX:5060 | MULTIPLE:MULTIPLE
all | udp | 95.118.XXX.XXX:56393 (192.168.3.150:5160) -> XXX.185.37.XX:5060 | MULTIPLE:MULTIPLE
all | udp | 95.118.XXX.XXX:43546 (192.168.3.150:5160) -> XXX.10.79.XXX:5060 | MULTIPLE:MULTIPLE
all | udp | 95.118.XXX.XXX:22721 (192.168.50.40:5070) -> XXX.185.37.XX:5060 | MULTIPLE:MULTIPLE
all | udp | 95.118.XXX.XXX:51082 (192.168.50.10:5060) -> XXX.10.79.XXX:5060 | MULTIPLE:MULTIPLE
all | udp | 95.118.XXX.XXX:60499 (192.168.50.20:5060) -> XXX.185.37.XX:5060 | MULTIPLE:MULTIPLE

These stale states cause all REGISTER (and other) SIP messages to remain unanswered, so all devices lose SIP connectivity and cannot reconnect. The entries are not cleared even after a long time, which leads to permanent VOIP connectivity loss.

Manually clearing the entries (via Firewall: Diagnostics: States Dump) resolves the problem immediately.

Enabling "Dynamic state reset" (Firewall: Settings: Advanced) helps to clear these states automatically and allows all SIP clients to reconnect on WAN IP change. Unfortunately, this option clears the entire state table, which would be no problem when using a single WAN interface. Using Multi-WAN, however, this option also interrupts all open connections on all other WAN interfaces. In our setup this kills all RDP remote sessions of home workers and interferes with automated remote backup systems.

Similar problem descriptions:
for OPNSENSE: https://forum.opnsense.org/index.php?topic=10385.0
for PFSENSE (with potential workaround): https://forum.netgate.com/topic/16968/sip-registration-timeout-due-to-stale-entry-in-pfsense-state-table
for PFSENSE: https://redmine.pfsense.org/issues/8

The problem is very common here in Germany: many VDSL PPPOE connections are force closed by the providers once a day to enforce dynamic IP change. Fixed IP is often not available and providers are not willing to disable forced disconnects.

Describe the solution you like

Reliably kill these states on WAN IP change in the first place, without relying on the "Dynamic state reset" option. Please make sure this is also done on WAN failover in a multi-WAN environment (not tested).

Describe alternatives you considered

None.

Additional context

THANK YOU VERY MUCH FOR YOUR EFFORTS!

Best regards, Walter

wkochFPV commented 3 years ago

Suggestion for a workaround: after the normal clearing of states, check for remaining states (MULTIPLE:MULTIPLE, NAT) that contain the old WAN IP and kill them individually with pfctl -k label -k (or a similar command).
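
The check described above could be sketched roughly as follows. This is only an illustration, not what OPNsense does: it assumes the state rows are available in the pipe-separated layout pasted in the report (IPv4 only), and the function names `stale_states` and `kill_commands` are made up for this example. The addresses in the usage note are placeholder documentation addresses. The kill commands use the `pfctl -k source -k destination` form and are only built, never executed; whether that form actually matches the NAT'd states in question is an assumption that would need testing on a real box.

```python
import re

# One row per state, in the layout pasted in the report:
#   iface | proto | nat_ip:port (lan_ip:port) -> dst_ip:port | state
# Assumption: IPv4 addresses only (the address is split from the port at ':').
STATE_RE = re.compile(
    r"^\s*(?P<iface>\S+)\s*\|\s*(?P<proto>\S+)\s*\|\s*"
    r"(?P<nat_ip>[^:\s]+):(?P<nat_port>\d+)\s+"
    r"\((?P<lan_ip>[^:\s]+):(?P<lan_port>\d+)\)\s+->\s+"
    r"(?P<dst_ip>[^:\s]+):(?P<dst_port>\d+)\s*\|\s*(?P<state>\S+)"
)


def stale_states(rows, old_wan_ip):
    """Return parsed rows whose NAT (translated) address is the old WAN IP."""
    stale = []
    for row in rows:
        match = STATE_RE.match(row)
        if match and match.group("nat_ip") == old_wan_ip:
            stale.append(match.groupdict())
    return stale


def kill_commands(states):
    """Build one 'pfctl -k <lan_ip> -k <dst_ip>' argv per unique host pair.

    The commands are returned as argument lists and are NOT executed here.
    """
    commands = []
    seen = set()
    for state in states:
        pair = (state["lan_ip"], state["dst_ip"])
        if pair not in seen:
            seen.add(pair)
            commands.append(["pfctl", "-k", pair[0], "-k", pair[1]])
    return commands
```

Called with the rows of the state table and the previous WAN address, for example `kill_commands(stale_states(rows, "203.0.113.7"))`, this yields one command per affected LAN host and SIP server pair, so states on the other WAN interfaces are left alone.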

OPNsense-bot commented 3 years ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.

StephanTexxpro commented 1 year ago

The problem still seems to exist in 23.1, please see my report at https://forum.opnsense.org/index.php?topic=32116.0.

fichtner commented 1 year ago

I’ll take it. Some other people require a tweak now that dynamic state reset is gone.

fichtner commented 1 year ago

This might be fixed via https://github.com/opnsense/core/commit/f57f07997509c in 23.1.2

fichtner commented 1 year ago

Closing this due to lack of feedback/likely fix a while ago.