opnsense / core

OPNsense GUI, API and systems backend
https://opnsense.org/
BSD 2-Clause "Simplified" License
3.28k stars 727 forks source link

Interface not brought down / Routes not removed when disabling interface in Web UI #6134

Closed Keno closed 1 year ago

Keno commented 1 year ago

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

After turning off an interface in the Web UI, it is still up and routes are assigned to it. This causes users behind the firewall to experience issues if their packets are routed over the supposedly disabled interface.

To Reproduce

My firewall had two WAN gateways:

WANFIBER is (currently) ipv4 only, WANCABLE is dual stack. I had set up a gateway group for the two IPv4 gateways, but never did much with it (not sure this is relevant, but including just in case).

The WANCABLE isp then cut off service and is dropping any packets on this interface. I went into the settings, removed the gateway group and disabled the interface:

Screen Shot 2022-11-11 at 5 14 21 PM

Expected behavior

I expected this to bring down the interface and remove the routes. However, the interface is still up and routes still present:

root@OPNsense:~ # ifconfig
igc0: flags=8863<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
    description: WANCABLE
    options=4802028<VLAN_MTU,JUMBO_MTU,WOL_MAGIC,NOMAP>
    ether 64:62:66:21:55:50
    inet6 fe80::6662:66ff:fe21:5550%igc0 prefixlen 64 scopeid 0x1
    inet6 2601:189:8080:6dd:6662:66ff:fe21:5550 prefixlen 64 autoconf
    inet6 2601:189:8080:6dd::7b9 prefixlen 128
    inet 10.1.10.154 netmask 0xffffff00 broadcast 10.1.10.255
    media: Ethernet autoselect (2500Base-T <full-duplex>)
    status: active
    nd6 options=23<PERFORMNUD,ACCEPT_RTADV,AUTO_LINKLOCAL>
# netstat -r
Routing tables

Internet:
Destination        Gateway            Flags     Netif Expire
default            c-50-170-44-209.un UGS        igc3
one.one.one.one    c-50-170-44-209.un UGHS       igc3
one.one.one.one    c-50-170-44-209.un UGHS       igc3
10.0.0.0/16        link#10            U        vlan02
10.0.0.1           link#10            UHS         lo0
10.1.10.0/24       link#1             U          igc0
10.1.10.154        link#1             UHS         lo0
50.170.44.208/30   link#4             U          igc3
c-50-170-44-209.un link#4             UHS        igc3
c-50-170-44-210.un link#4             UHS         lo0
localhost          link#6             UH          lo0
SUSIlanding-chic-v 10.1.10.1          UGHS       igc0
SUSIlanding-denv-v 10.1.10.1          UGHS       igc0
172.16.0.0/12      link#11            U        vlan03
172.16.0.1         link#11            UHS         lo0
192.168.1.0/24     link#2             U          igc1
OPNsense           link#2             UHS         lo0
192.168.2.0/24     link#3             U          igc2
192.168.2.1        link#3             UHS         lo0
192.168.3.0/24     link#9             U        vlan01
192.168.3.1        link#9             UHS         lo0

Internet6:
Destination        Gateway            Flags     Netif Expire
default            fe80::2cb:7aff:fe5 UG         igc0
localhost          link#6             UHS         lo0
cdns01.comcast.net fe80::2cb:7aff:fe5 UGHS       igc0
cdns02.comcast.net fe80::2cb:7aff:fe5 UGHS       igc0
2601:189:8080:6dd: link#1             U          igc0
2601:189:8080:6dd: link#1             UHS         lo0
2601:189:8080:6dd: link#1             UHS         lo0
fe80::%igc0/64     link#1             U          igc0
fe80::6662:66ff:fe link#1             UHS         lo0
fe80::%igc1/64     link#2             U          igc1
fe80::6662:66ff:fe link#2             UHS         lo0
fe80::%lo0/64      link#6             U           lo0
fe80::1%lo0        link#6             UHS         lo0
fe80::%vlan01/64   link#9             U        vlan01
fe80::6662:66ff:fe link#9             UHS         lo0
fe80::%vlan02/64   link#10            U        vlan02
fe80::6662:66ff:fe link#10            UHS         lo0
fe80::%vlan03/64   link#11            U        vlan03
fe80::6662:66ff:fe link#11            UHS         lo0

Describe alternatives you considered

A clear and concise description of any alternative solutions or workaround you considered.

Screenshots

Relevant log files

2022-11-11T22:25:31 | Error | opnsense | /interfaces.php: Warning! dhcpd_radvd_configure(auto) found no suitable IPv6 address on igc1 |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: The APMgmt_GWv4 monitor address is empty, skipping. |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: The WANCABLE_DHCP monitor address is empty, skipping. |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: The WANCABLE_DHCP6 monitor address is empty, skipping. |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: ROUTING: skipping IPv6 default route |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: ROUTING: IPv6 default gateway set to wan |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: ROUTING: skipping IPv4 default route |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: ROUTING: IPv4 default gateway set to opt2 |  
2022-11-11T22:25:28 | Error | opnsense | /interfaces.php: ROUTING: entering configure using 'opt5'

Additional context

I should also note that hitting "Apply changes" after disabling the interface hangs in reload forever, so it's possible that it was actually trying to bring the interface down but hitting some sort of other issue that's causing it to hang.

Environment

OPNsense 22.7.6-amd64FreeBSD 13.1-RELEASE-p2OpenSSL 1.1.1q 5 Jul 2022
fichtner commented 1 year ago

What IPv4/IPv6 configuration does WANCABLE have? Because in truth it cuts all addresses and services from the running interface configuration so that routing should not be able to apply anyway. E.g. if 10.1.10.154 is the static address set from the GUI it would have been stripped by the apply so that hasn't gone through indeed.

I also think the system hasn't done "down" flag for present interfaces in decades while also not causing the apparent issues described.

Cheers, Franco

Keno commented 1 year ago

What IPv4/IPv6 configuration does WANCABLE have?

DHCP on both IPv4 and IPv6. Before the isp cut service, it got an actual publicly routable IP on this interface. I'm assuming switching the dhcpv4 to 10.1.10.154 is the ISP's way of attempting to prevent clients from routing over the interface if it's disabled - I'm not as familiar with IPv6, so I don't know to what extent the IPv6 configuration is sensible. It's perfectly plausible to me that the isp here is doing bad things. That said, I of course expected turning off the interface in the GUI to have it just ignore whatever is going on on that interface and in any case, the "Apply Changes" hanging suggests that OPNSense is confused about something.

fichtner commented 1 year ago

I've tested this on a WAN with DHCP and it works as expected stripping the addresses, which would make the interface not interfere with anything else. There is probably an issue that blocked the execution, but I don't know what and why so I don't know how to help narrow this down. It's probably only tangentially related to the issue reported, perhaps a service hanging while trying to use the defunct WAN before bringing it down. I don't know.

Cheers, Franco

Keno commented 1 year ago

At this point the web UI stopped working entirely and gives 503s.

Here's the lighthttp logs:

<27>1 2022-11-15T01:41:18+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="13"] (gw_backend.c.360) gw-server re-enabled: unix:/tmp/php-fastcgi.socket-1  0 /tmp/php-fastcgi.socket
<27>1 2022-11-15T01:41:18+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="14"] (gw_backend.c.360) gw-server re-enabled: unix:/tmp/php-fastcgi.socket-0  0 /tmp/php-fastcgi.socket
<27>1 2022-11-15T01:41:37+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="15"] (gw_backend.c.283) establishing connection failed: socket: unix:/tmp/php-fastcgi.socket-1: Connection refused
<27>1 2022-11-15T01:41:37+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="16"] (gw_backend.c.299) backend error; we'll disable for 1secs and send the request to another backend instead:load: 8
<27>1 2022-11-15T01:41:37+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="17"] (gw_backend.c.283) establishing connection failed: socket: unix:/tmp/php-fastcgi.socket-0: Connection refused
<27>1 2022-11-15T01:41:37+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="18"] (gw_backend.c.299) backend error; we'll disable for 1secs and send the request to another backend instead:load: 8
<27>1 2022-11-15T01:41:37+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="19"] (gw_backend.c.993) all handlers for /api/api.php? on .php are down.
<27>1 2022-11-15T01:41:40+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="20"] (gw_backend.c.360) gw-server re-enabled: unix:/tmp/php-fastcgi.socket-1  0 /tmp/php-fastcgi.socket
<27>1 2022-11-15T01:41:40+00:00 OPNsense.localdomain lighttpd 37060 - [meta sequenceId="21"] (gw_backend.c.360) gw-server re-enabled: unix:/tmp/php-fastcgi.socket-0  0 /tmp/php-fastcgi.socket

I guess php stopped processing connections? I see a bunch of things running and the igc0 reference (WANCABLE) is suspicious of course:

root      853   0.0  0.5   57084   37392  -  S    Fri18       9:35.57 /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root     1896   0.0  0.5   57084   37300  -  S    Sat07       8:35.82 /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    11823   0.0  0.5   57088   37844  -  S    Fri18       9:35.98 /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    13464   0.0  0.4   62460   34548  -  S    Fri22       9:12.46 /usr/local/bin/php-cgi
root    16456   0.0  0.5   57088   37900  -  S    Sat07       8:30.91 /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    25924   0.0  0.4   62528   34396  -  S    Fri18       9:22.33 /usr/local/bin/php-cgi
root    26992   0.0  0.5   57088   37916  -  S    Sat07       8:33.88 /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    34303   0.0  0.4   62460   34740  -  S    Fri01      10:44.04 /usr/local/bin/php-cgi
root    35567   0.0  0.4   62524   35040  -  S    Fri21       9:14.72 /usr/local/bin/php-cgi
root    37134   0.0  0.2   48756   20476  -  Is    1Nov22     0:00.22 /usr/local/bin/php-cgi
root    37287   0.0  0.3   48756   20704  -  Is    1Nov22     0:00.39 /usr/local/bin/php-cgi
root    38484   0.0  0.4   57084   36276  -  S    Sat07       8:34.10 /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    38581   0.0  0.5   57088   37904  -  S    Sat07       8:35.44 /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    40307   0.0  0.5   57088   37856  -  S    Fri18       9:36.87 /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    43997   0.0  0.5   57088   37840  -  S    Fri18       9:32.67 /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    58220   0.0  0.5   57088   37904  -  S    Sat07       8:35.05 /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    60803   0.0  0.4   57084   36056  -  S    Fri21       9:21.92 /usr/local/bin/php /usr/local/sbin/pluginctl -s unbound restart
root    61011   0.0  0.5   57084   37316  -  S    Sat07       8:34.45 /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    64300   0.0  0.5   57088   37908  -  S    Sat07       8:33.42 /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    64590   0.0  0.5   57088   37840  -  S    Fri18       9:33.15 /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    74346   0.0  0.5   57088   37848  -  S    Fri18       9:33.64 /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    88044   0.0  0.4   57084   36240  -  S    Fri18       9:34.06 /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    90035   0.0  0.5   57088   37900  -  S    Sat07       8:34.62 /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    91654   0.0  0.4   52984   31368  -  S    Fri21       9:18.06 /usr/local/bin/php /usr/local/sbin/pluginctl -s unbound stop
root    91853   0.0  0.5   57088   37952  -  S    Fri18       9:34.38 /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    98040   0.0  0.4   62460   34476  -  S    Fri20       9:18.00 /usr/local/bin/php-cgi
root    98610   0.0  0.5   57084   37252  -  S    Fri18       9:31.01 /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    99652   0.0  0.4   62464   33908  -  S    Fri21       9:19.33 /usr/local/bin/php-cgi

Are there debug logs for any of these processes somewhere that might help track down what the issue is?

AdSchellevis commented 1 year ago

at a first glance I would expect your intel (igc) card is in a constant state of reconnecting/restarting, hence the amount of newwan events. dmesg might provide more info, but hardware/drivers are highly likely playing a role here.

fichtner commented 1 year ago

multiple rc.linkup start igc0 sounds like a bigger issue indeed. IPv4 and IPv6 scripts hammering away are not very helpful either. But I'm curious as to where they all seem to stop... doing ifconfig on igc0 perhaps? Which would mean the card is jammed and the system is not tailored to handle an endless block of ifconfig calls.

Keno commented 1 year ago

Let me see if I can get backtraces from any of these processes.

fichtner commented 1 year ago

If the system is still running a ps with tree view might already give proper hints:

# ps auxd
Keno commented 1 year ago
root    16827   0.0  0.3   36256   22444  -  Is   Fri00       0:03.26 |-- /usr/local/bin/python3 /usr/local/opnsense/service/configd.py (python3.9)
root    17737   0.0  0.8  140676   63384  -  I    Fri00       5:51.27 | `-- /usr/local/bin/python3 /usr/local/opnsense/service/configd.py console (python3.9)
root      853   0.0  0.5   57104   37404  -  TX   Fri18       9:36.88 |   |-- /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root      884   0.0  0.0   13504    2436  -  I    Fri21       0:00.01 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root     1591   0.0  0.1   23144   10652  -  I    Fri21       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root     1938   0.0  0.0   13388    2452  -  I    Fri21       0:00.00 |   | `-- gzip
root     1896   0.0  0.5   57084   37300  -  S    Sat07       8:37.72 |   |-- /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root     9333   0.0  0.2   24000   12544  -  I    Fri21       0:00.20 |   |-- /usr/local/bin/python3 /usr/local/opnsense/scripts/unbound/wrapper.py -s (python3.9)
root    12304   0.0  0.1   23144   10652  -  I    Fri21       0:00.02 |   | `-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf stats_noreset
root    11823   0.0  0.5   57088   37844  -  S    Fri18       9:37.86 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    13207   0.0  0.0   13504    2172  -  I    Fri01       0:00.01 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    14401   0.0  0.1   23144   10440  -  I    Fri01       0:00.02 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    14410   0.0  0.0   13388    2088  -  I    Fri01       0:00.00 |   | `-- gzip
root    14552   0.0  0.0   13504    2440  -  I    Sat07       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    15682   0.0  0.1   23144   10656  -  I    Sat07       0:00.02 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    15699   0.0  0.0   13388    2488  -  I    Sat07       0:00.00 |   | `-- gzip
root    16456   0.0  0.5   57088   37900  -  S    Sat07       8:32.78 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    23106   0.0  0.0   13504    2440  -  I    Sat07       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    24260   0.0  0.1   23144   10656  -  I    Sat07       0:00.02 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    24290   0.0  0.0   13388    2488  -  I    Sat07       0:00.00 |   | `-- gzip
root    26992   0.0  0.5   57088   37916  -  S    Sat07       8:35.78 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    27002   0.0  0.2   24000   12508  -  I    Fri20       0:00.19 |   |-- /usr/local/bin/python3 /usr/local/opnsense/scripts/unbound/wrapper.py -s (python3.9)
root    29348   0.0  0.1   23144   10616  -  I    Fri20       0:00.02 |   | `-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf stats_noreset
root    29632   0.0  0.0   13504    2432  -  I    Sat07       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    32196   0.0  0.1   23144   10656  -  I    Sat07       0:00.02 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    32351   0.0  0.0   13388    2488  -  I    Sat07       0:00.00 |   | `-- gzip
root    38484   0.0  0.4   57084   36276  -  S    Sat07       8:35.99 |   |-- /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    38581   0.0  0.5   57088   37904  -  S    Sat07       8:37.31 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    40307   0.0  0.5   57088   37856  -  S    Fri18       9:38.76 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    43997   0.0  0.5   57088   37840  -  S    Fri18       9:34.55 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    45014   0.0  0.2   24000   12544  -  I    Fri21       0:00.19 |   |-- /usr/local/bin/python3 /usr/local/opnsense/scripts/unbound/wrapper.py -s (python3.9)
root    47258   0.0  0.1   23144   10652  -  I    Fri21       0:00.01 |   | `-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf stats_noreset
root    46805   0.0  0.0   13504    2356  -  I    Fri20       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    47698   0.0  0.1   23144   10616  -  I    Fri20       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    47814   0.0  0.0   13388    2452  -  I    Fri20       0:00.00 |   | `-- gzip
root    47741   0.0  0.0   13504    2356  -  I    Fri18       0:00.01 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    48484   0.0  0.1   23144   10616  -  I    Fri18       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    48552   0.0  0.0   13388    2452  -  I    Fri18       0:00.00 |   | `-- gzip
root    48681   0.0  0.0   13504    2428  -  I    Fri22       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    50056   0.0  0.1   23144   10652  -  I    Fri22       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    50385   0.0  0.0   13388    2476  -  I    Fri22       0:00.00 |   | `-- gzip
root    58220   0.0  0.5   57088   37904  -  S    Sat07       8:36.94 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    60803   0.0  0.4   57084   36056  -  S    Fri21       9:23.80 |   |-- /usr/local/bin/php /usr/local/sbin/pluginctl -s unbound restart
root    61011   0.0  0.5   57084   37316  -  S    Sat07       8:36.33 |   |-- /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    64300   0.0  0.5   57088   37908  -  S    Sat07       8:35.31 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    64590   0.0  0.5   57088   37840  -  S    Fri18       9:35.03 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    68840   0.0  0.2   24000   12544  -  I    Fri21       0:00.20 |   |-- /usr/local/bin/python3 /usr/local/opnsense/scripts/unbound/wrapper.py -s (python3.9)
root    70770   0.0  0.1   23144   10652  -  I    Fri21       0:00.02 |   | `-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf stats_noreset
root    74346   0.0  0.5   57088   37848  -  S    Fri18       9:35.52 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    75291   0.0  0.0   13504    2440  -  I    Sat07       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    76426   0.0  0.1   23144   10656  -  I    Sat07       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    76564   0.0  0.0   13388    2488  -  I    Sat07       0:00.00 |   | `-- gzip
root    83310   0.0  0.0   13504    2348  -  I    Fri18       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    85467   0.0  0.1   23144   10616  -  I    Fri18       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    85502   0.0  0.0   13388    2452  -  I    Fri18       0:00.00 |   | `-- gzip
root    85840   0.0  0.0   13504    2356  -  I    Fri18       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    87318   0.0  0.1   23144   10616  -  I    Fri18       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    87591   0.0  0.0   13388    2452  -  I    Fri18       0:00.00 |   | `-- gzip
root    86429   0.0  0.0   13504    2356  -  I    Fri18       0:00.01 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    87393   0.0  0.1   23144   10616  -  I    Fri18       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    87640   0.0  0.0   13388    2452  -  I    Fri18       0:00.00 |   | `-- gzip
root    88044   0.0  0.4   57084   36240  -  S    Fri18       9:35.95 |   |-- /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
root    89184   0.0  0.2   24000   12544  -  I    Fri21       0:00.23 |   |-- /usr/local/bin/python3 /usr/local/opnsense/scripts/unbound/wrapper.py -s (python3.9)
root    91679   0.0  0.1   23144   10652  -  I    Fri21       0:00.01 |   | `-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf stats_noreset
root    90035   0.0  0.5   57088   37900  -  S    Sat07       8:36.49 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
root    91654   0.0  0.4   52984   31368  -  S    Fri21       9:19.94 |   |-- /usr/local/bin/php /usr/local/sbin/pluginctl -s unbound stop
root    91853   0.0  0.5   57088   37952  -  S    Fri18       9:36.26 |   |-- /usr/local/bin/php /usr/local/etc/rc.newwanip igc0
root    93798   0.0  0.0   13504    2440  -  I    Sat07       0:00.02 |   |-- /bin/sh /usr/local/opnsense/scripts/unbound/cache.sh dump
root    94689   0.0  0.1   23144   10656  -  I    Sat07       0:00.01 |   | |-- /usr/local/sbin/unbound-control -c /var/unbound/unbound.conf dump_cache
root    95018   0.0  0.0   13388    2488  -  I    Sat07       0:00.00 |   | `-- gzip
root    98610   0.0  0.5   57084   37252  -  S    Fri18       9:32.88 |   `-- /usr/local/bin/php /usr/local/etc/rc.linkup start igc0
Keno commented 1 year ago

Ok, here's the backtrace for process 853 (/usr/local/bin/php /usr/local/etc/rc.linkup start igc0 - let me know if you'd like to see a different one). For the record, I got the backtrace by doing:

$ touch /tmp/emerg_debug_log
$ lldb -p 853
(lldb) expr (int)open("/tmp/emerg_debug_log", 0x2)
4
(lldb) expr (void)dup2(4, 1)
(lldb) expr void *$zp = (void*)malloc(1000)
(lldb) expr (void)zend_fetch_debug_backtrace($zp, 0, 0, 0)
(lldb) expr (void)php_var_dump($zp, 0)

The backtrace is:

array(6) {
 [0]=>
 array(4) {
   ["file"]=>
   string(27) "/usr/local/etc/inc/util.inc"
   ["line"]=>
   int(67)
   ["function"]=>
   string(6) "usleep"
   ["args"]=>
   array(1) {
     [0]=>
     int(200000)
   }
 }
 [1]=>
 array(4) {
   ["file"]=>
   string(44) "/usr/local/etc/inc/plugins.inc.d/unbound.inc"
   ["line"]=>
   int(364)
   ["function"]=>
   string(9) "killbypid"
   ["args"]=>
   array(3) {
     [0]=>
     string(20) "/var/run/unbound.pid"
     [1]=>
     string(4) "TERM"
     [2]=>
     bool(true)
   }
 }
 [2]=>
 array(4) {
   ["file"]=>
   string(30) "/usr/local/etc/inc/plugins.inc"
   ["line"]=>
   int(288)
   ["function"]=>
   string(20) "unbound_configure_do"
   ["args"]=>
   array(1) {
     [0]=>
     bool(false)
   }
 }
 [3]=>
 array(4) {
   ["file"]=>
   string(33) "/usr/local/etc/inc/interfaces.inc"
   ["line"]=>
   int(2421)
   ["function"]=>
   string(17) "plugins_configure"
   ["args"]=>
   array(2) {
     [0]=>
     string(3) "dns"
     [1]=>
     bool(false)
   }
 }
 [4]=>
 array(4) {
   ["file"]=>
   string(24) "/usr/local/etc/rc.linkup"
   ["line"]=>
   int(100)
   ["function"]=>
   string(19) "interface_configure"
   ["args"]=>
   array(4) {
     [0]=>
     bool(false)
     [1]=>
     string(3) "wan"
     [2]=>
     bool(true)
     [3]=>
     bool(true)
   }
 }
 [5]=>
 array(4) {
   ["file"]=>
   string(24) "/usr/local/etc/rc.linkup"
   ["line"]=>
   int(110)
   ["function"]=>
   string(21) "handle_argument_group"
   ["args"]=>
   array(2) {
     [0]=>
     string(5) "start"
     [1]=>
     string(4) "igc0"
   }
 }
}
fichtner commented 1 year ago

aha, unbound-control being the bad actor again here

Services: Unbound DNS: General enable Flush DNS cache during reload

but you will have to reboot as I'm not sure unbound-control can be tamed fully. It has already clobbered the cache file.

Keno commented 1 year ago

So is unbound just ignoring SIGTERM then? I do see the process happily running:

ps aux | grep 5558
unbound  5558   0.0  0.7   90188   53724  -  Is   Fri01       3:31.88 /usr/local/sbin/unbound -c /var/unbound/unbound.c
root    32056   0.0  0.0   12748    2536  5  S+   20:54       0:00.00 grep 5558

perhaps it should be SIGKILL'ed after a timeout?

Keno commented 1 year ago

Looks like the unbound process is indeed just ignoring SIGTERM. I don't know if this is transient, but at the very least the single attempted pkill in opnsense will not be reliable:

$ procstat -i 5558 | grep TERM
 5558 unbound          TERM     -I-
fichtner commented 1 year ago

IMO it's a bug in unbound(-control). The indefinite waiting for SIGTERM to go through we do for all services nowadays to ensure that services exited correctly.. in order to be able to surface these types of bugs in services:

% git grep killbypid | grep -v true | grep -v HUP | grep -v :function 
src/etc/inc/interfaces.inc:            killbypid("/var/run/choparp_{$vipif}.pid");
src/etc/inc/interfaces.inc:            killbypid($pid_filename);
src/etc/inc/interfaces.inc:    killbypid("/var/run/hostapd_{$interface}.pid");

Those remaining three probably should be switched to wait for the service to exit as well. but that's besides the point I guess.

SIGKILL may be too destructive, causing other side effects so what remains as an option?

Keno commented 1 year ago

IMO it's a bug in unbound(-control)

Ok, that's reasonable. I don't really know under what circumstances unbound turns off SIGTERM handling. I do see the codepath in their codebase that's supposed to handle it. I'll take a closer look once I can reboot the firewall without kicking all the users off the network :)

Keno commented 1 year ago

Ok, so I rebooted the firewall (and went back into the UI to disable the interface, which succeeded this time), but the ipv6 routes are still there despite the interface being disabled:

Internet6:
Destination        Gateway            Flags     Netif Expire
default            fe80::2cb:7aff:fe5 UG         igc0
::1                link#6             UHS         lo0
2601:189:8080:6dd: link#1             U          igc0
2601:189:8080:6dd: link#1             UHS         lo0
fe80::%igc0/64     link#1             U          igc0
fe80::6662:66ff:fe link#1             UHS         lo0
fe80::%lo0/64      link#6             U           lo0
fe80::1%lo0        link#6             UHS         lo0

I do also see a php process that may be related:

root    70091   0.0  0.4   52988  29312  -  S    00:11    0:00.92 /usr/local/bin/php /usr/local/etc/rc.newwanipv6 igc0
fichtner commented 1 year ago

How is IPv6 configured? Is it set to no-release perhaps? Ifconfig shows no IPv6 for igc0?

FWIW the unbound issue has a patch on master I forgot that serialises start/stop of unbound through the same lock. It’ll possibly only pinpoint the issue better but we shall see.

OPNsense-bot commented 1 year ago

This issue has been automatically timed-out (after 180 days of inactivity).

For more information about the policies for this repository, please read https://github.com/opnsense/core/blob/master/CONTRIBUTING.md for further details.

If someone wants to step up and work on this issue, just let us know, so we can reopen the issue and assign an owner to it.