openwrt / packages

Community maintained packages for OpenWrt. Documentation for submitting pull requests is in CONTRIBUTING.md
GNU General Public License v2.0
3.88k stars 3.4k forks source link

sqm-scripts: Stopping/Starting 100+ times a day #22996

Open ervee opened 6 months ago

ervee commented 6 months ago

Maintainer: @tohojo Environment: aarch64_generic, FriendlyElec NanoPi R2S (rockchip/armv8), OpenWrt 23.05.2

Description: SQM is stopping/starting 100+ times a day with these messages in the log:

Wed Dec 27 19:41:10 2023 user.notice SQM: Stopping SQM on eth0.34
Wed Dec 27 19:41:10 2023 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Wed Dec 27 19:41:10 2023 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Wed Dec 27 19:41:10 2023 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Wed Dec 27 19:41:10 2023 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Wed Dec 27 19:41:10 2023 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps

I can't find anything useful around these entries in the log. My WAN DHCP renews every 30 minutes but does not match up with these SQM restarts.

My question/report is actually 2-fold:

  1. Why does this happen about every 16-17 minutes, but also sometimes after only 2 minutes or 5...
  2. What about those ERROR messages?

This is my config:

root@nanort:~# cat /etc/config/sqm
config queue 'eth0'
        option enabled '1'
        option interface 'eth0.34'
        option qdisc 'cake'
        option script 'piece_of_cake.qos'
        option debug_logging '0'
        option verbosity '5'
#       option download '68408' # 85% of 80.48 Mbps
#       option upload '73840'   # 85% of 86.87 Mbps
#       option download '72432' # 90% of 80.48 Mbps
#       option upload '78183'   # 90% of 86.87 Mbps
#       option download '76456' # 95% of 80.48 Mbps
#       option upload '82527'   # 95% of 86.87 Mbps
        option download '85000'
        option upload '85000'
        option linklayer 'ethernet'
        option overhead '44'

Please let me know what else I can supply or troubleshoot.

brada4 commented 6 months ago

@tohojo may know better. What happens stop removes qdisc then start removes(nothing and errs) and adds it back. In principle should suffice for ip(route2) for last 10 years to just replace root qdisc destroying past hierarchy. You can try replac(ng remove/add to replace in qos script in use.

brada4 commented 6 months ago

why it restarts - check ethtool -S - usually cable or sub-par phy config, normally hotplug scripts get triggered on actual cable unplug or real ip c+ange from dhcp.

ervee commented 6 months ago

lac(ng remove/add to replace in qos script

@tohojo may know better. What happens stop removes qdisc then start removes(nothing and errs) and adds it back. In principle should suffice for ip(route2) for last 10 years to just replace root qdisc destroying past hierarchy. You can try replac(ng remove/add to replace in qos script in use.

There are no remove statements in piece_of_cake.qos but there are 4 add statements in there. Can I just replace them all by replace? or specific ones? Is there a specific command output I could check before and after this change to make sure I don't break functionality?

ervee commented 6 months ago

why it restarts - check ethtool -S - usually cable or sub-par phy config, normally hotplug scripts get triggered on actual cable unplug or real ip c+ange from dhcp.

The DHCP renews do not line up with the SQM restarts. The DHCP renew also does not change the actual IP. I don't see any disconnects/connects in the logs.

What am I looking for with ethtool -S? eth0 is the physical WAN port of the NanoPi R2S.

root@router:~# ethtool -S eth0
NIC statistics:
     mmc_tx_octetcount_gb: 4225059748
     mmc_tx_framecount_gb: 109989600
     mmc_tx_broadcastframe_g: 0
     mmc_tx_multicastframe_g: 0
     mmc_tx_64_octets_gb: 0
     mmc_tx_65_to_127_octets_gb: 0
     mmc_tx_128_to_255_octets_gb: 0
     mmc_tx_256_to_511_octets_gb: 0
     mmc_tx_512_to_1023_octets_gb: 0
     mmc_tx_1024_to_max_octets_gb: 0
     mmc_tx_unicast_gb: 0
     mmc_tx_multicast_gb: 0
     mmc_tx_broadcast_gb: 0
     mmc_tx_underflow_error: 0
     mmc_tx_singlecol_g: 0
     mmc_tx_multicol_g: 0
     mmc_tx_deferred: 0
     mmc_tx_latecol: 0
     mmc_tx_exesscol: 0
     mmc_tx_carrier_error: 0
     mmc_tx_octetcount_g: 4225059748
     mmc_tx_framecount_g: 109989600
     mmc_tx_excessdef: 0
     mmc_tx_pause_frame: 0
     mmc_tx_vlan_frame_g: 0
     mmc_rx_framecount_gb: 246930023
     mmc_rx_octetcount_gb: 3652580616
     mmc_rx_octetcount_g: 3652580616
     mmc_rx_broadcastframe_g: 0
     mmc_rx_multicastframe_g: 4246
     mmc_rx_crc_error: 0
     mmc_rx_align_error: 0
     mmc_rx_run_error: 0
     mmc_rx_jabber_error: 0
     mmc_rx_undersize_g: 0
     mmc_rx_oversize_g: 0
     mmc_rx_64_octets_gb: 0
     mmc_rx_65_to_127_octets_gb: 0
     mmc_rx_128_to_255_octets_gb: 0
     mmc_rx_256_to_511_octets_gb: 0
     mmc_rx_512_to_1023_octets_gb: 0
     mmc_rx_1024_to_max_octets_gb: 0
     mmc_rx_unicast_g: 0
     mmc_rx_length_error: 0
     mmc_rx_autofrangetype: 0
     mmc_rx_pause_frames: 0
     mmc_rx_fifo_overflow: 0
     mmc_rx_vlan_frames_gb: 0
     mmc_rx_watchdog_error: 0
     mmc_rx_ipc_intr_mask: 3221078013
     mmc_rx_ipc_intr: 0
     mmc_rx_ipv4_gd: 190163515
     mmc_rx_ipv4_hderr: 0
     mmc_rx_ipv4_nopay: 0
     mmc_rx_ipv4_frag: 0
     mmc_rx_ipv4_udsbl: 0
     mmc_rx_ipv4_gd_octets: 0
     mmc_rx_ipv4_hderr_octets: 0
     mmc_rx_ipv4_nopay_octets: 0
     mmc_rx_ipv4_frag_octets: 0
     mmc_rx_ipv4_udsbl_octets: 0
     mmc_rx_ipv6_gd_octets: 0
     mmc_rx_ipv6_hderr_octets: 0
     mmc_rx_ipv6_nopay_octets: 0
     mmc_rx_ipv6_gd: 56608481
     mmc_rx_ipv6_hderr: 0
     mmc_rx_ipv6_nopay: 0
     mmc_rx_udp_gd: 0
     mmc_rx_udp_err: 70
     mmc_rx_tcp_gd: 0
     mmc_rx_tcp_err: 502
     mmc_rx_icmp_gd: 0
     mmc_rx_icmp_err: 0
     mmc_rx_udp_gd_octets: 0
     mmc_rx_udp_err_octets: 3870
     mmc_rx_tcp_gd_octets: 0
     mmc_rx_tcp_err_octets: 22871
     mmc_rx_icmp_gd_octets: 0
     mmc_rx_icmp_err_octets: 0
     mmc_tx_fpe_fragment_cntr: 0
     mmc_tx_hold_req_cntr: 0
     mmc_rx_packet_assembly_err_cntr: 0
     mmc_rx_packet_smd_err_cntr: 0
     mmc_rx_packet_assembly_ok_cntr: 0
     mmc_rx_fpe_fragment_cntr: 0
     tx_underflow: 0
     tx_carrier: 0
     tx_losscarrier: 0
     vlan_tag: 246929451
     tx_deferred: 0
     tx_vlan: 109989600
     tx_jabber: 0
     tx_frame_flushed: 0
     tx_payload_error: 0
     tx_ip_header_error: 0
     rx_desc: 0
     sa_filter_fail: 0
     overflow_error: 0
     ipc_csum_error: 0
     rx_collision: 0
     rx_crc_errors: 0
     dribbling_bit: 0
     rx_length: 0
     rx_mii: 0
     rx_multicast: 0
     rx_gmac_overflow: 0
     rx_watchdog: 0
     da_rx_filter_fail: 0
     sa_rx_filter_fail: 0
     rx_missed_cntr: 0
     rx_overflow_cntr: 0
     rx_vlan: 0
     rx_split_hdr_pkt_n: 0
     tx_undeflow_irq: 0
     tx_process_stopped_irq: 0
     tx_jabber_irq: 0
     rx_overflow_irq: 0
     rx_buf_unav_irq: 0
     rx_process_stopped_irq: 0
     rx_watchdog_irq: 0
     tx_early_irq: 0
     fatal_bus_error_irq: 0
     rx_early_irq: 1499790
     threshold: 1
     tx_pkt_n: 109989600
     rx_pkt_n: 246929451
     normal_irq_n: 97968230
     rx_normal_irq_n: 93993008
     napi_poll: 118616109
     tx_normal_irq_n: 4381376
     tx_clean: 24635354
     tx_set_ic_bit: 4399584
     irq_receive_pmt_irq_n: 0
     mmc_tx_irq_n: 0
     mmc_rx_irq_n: 0
     mmc_rx_csum_offload_irq_n: 0
     irq_tx_path_in_lpi_mode_n: 0
     irq_tx_path_exit_lpi_mode_n: 0
     irq_rx_path_in_lpi_mode_n: 0
     irq_rx_path_exit_lpi_mode_n: 0
     phy_eee_wakeup_error_n: 0
     ip_hdr_err: 0
     ip_payload_err: 0
     ip_csum_bypassed: 0
     ipv4_pkt_rcvd: 0
     ipv6_pkt_rcvd: 0
     no_ptp_rx_msg_type_ext: 0
     ptp_rx_msg_type_sync: 0
     ptp_rx_msg_type_follow_up: 0
     ptp_rx_msg_type_delay_req: 0
     ptp_rx_msg_type_delay_resp: 0
     ptp_rx_msg_type_pdelay_req: 0
     ptp_rx_msg_type_pdelay_resp: 0
     ptp_rx_msg_type_pdelay_follow_up: 0
     ptp_rx_msg_type_announce: 0
     ptp_rx_msg_type_management: 0
     ptp_rx_msg_pkt_reserved_type: 0
     ptp_frame_type: 0
     ptp_ver: 0
     timestamp_dropped: 0
     av_pkt_rcvd: 0
     av_tagged_pkt_rcvd: 0
     vlan_tag_priority_val: 0
     l3_filter_match: 0
     l4_filter_match: 0
     l3_l4_filter_no_match: 0
     irq_pcs_ane_n: 0
     irq_pcs_link_n: 0
     irq_rgmii_n: 0
     mtl_tx_status_fifo_full: 0
     mtl_tx_fifo_not_empty: 0
     mmtl_fifo_ctrl: 0
     mtl_tx_fifo_read_ctrl_write: 0
     mtl_tx_fifo_read_ctrl_wait: 0
     mtl_tx_fifo_read_ctrl_read: 0
     mtl_tx_fifo_read_ctrl_idle: 0
     mac_tx_in_pause: 0
     mac_tx_frame_ctrl_xfer: 0
     mac_tx_frame_ctrl_idle: 0
     mac_tx_frame_ctrl_wait: 0
     mac_tx_frame_ctrl_pause: 0
     mac_gmii_tx_proto_engine: 0
     mtl_rx_fifo_fill_level_full: 0
     mtl_rx_fifo_fill_above_thresh: 0
     mtl_rx_fifo_fill_below_thresh: 0
     mtl_rx_fifo_fill_level_empty: 0
     mtl_rx_fifo_read_ctrl_flush: 0
     mtl_rx_fifo_read_ctrl_read_data: 0
     mtl_rx_fifo_read_ctrl_status: 0
     mtl_rx_fifo_read_ctrl_idle: 0
     mtl_rx_fifo_ctrl_active: 0
     mac_rx_frame_ctrl_fifo: 0
     mac_gmii_rx_proto_engine: 0
     tx_tso_frames: 0
     tx_tso_nfrags: 0
     mtl_est_cgce: 0
     mtl_est_hlbs: 0
     mtl_est_hlbf: 0
     mtl_est_btre: 0
     mtl_est_btrlm: 0
     q0_tx_pkt_n: 109989600
     q0_tx_irq_n: 0
     q0_rx_pkt_n: 246929451
     q0_rx_irq_n: 0
brada4 commented 6 months ago

some err counters counted something , but not up/down. yes, you can use r#placefor add as it replaces noqueue after deletion too.

tohojo commented 6 months ago

Using replace instead of add shouldn't hurt, but I don't expect it'll help either.

My guess would be that there is some system event that causes the network device to come and go periodically, causing an SQM restart due to hotplug events firing. It looks like the device has already disappeared by the time the sqm stop hook is running, which is what causes the errors.

Overall, this is most likely relatively harmless, so if you're not seeing any impact on network traffic it should be fine to ignore the errors. I don't have a good guess as to why this is happening in the first place, the only suggestion I have is to look for other log messages around the SQM errors, to see if there is anything providing any hints...

ervee commented 5 months ago

Hi @tohojo,

TL;RD: There is an IPv6 event restarting SQM. But why... Check the last part of this post if you are in a hurry.

Looking in dmesg the last ethx event was 25 days ago the script reloads at least every 5 minutes. I really have no clue what could be causing this.

I changed the 4 "add" statements in /usr/lib/sqm/piece_of_cake.qos to "replace":

# diff /usr/lib/sqm/piece_of_cake.qos /usr/lib/sqm/piece_of_cake.qos.org
24c24
<     $TC qdisc replace dev $IFACE root $( get_stab_string ) cake \
---
>     $TC qdisc add dev $IFACE root $( get_stab_string ) cake \
33c33
<     $TC qdisc replace dev $IFACE handle ffff: ingress
---
>     $TC qdisc add dev $IFACE handle ffff: ingress
38c38
<     $TC qdisc replace dev $DEV root $( get_stab_string ) cake \
---
>     $TC qdisc add dev $DEV root $( get_stab_string ) cake \
45c45
<     $TC filter replace dev $IFACE parent ffff: protocol all prio 10 u32 \
---
>     $TC filter add dev $IFACE parent ffff: protocol all prio 10 u32 \

After starting sqm this is the logread output:

Thu Jan 11 19:31:44 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Thu Jan 11 19:31:45 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

And tc qdisc output:

# tc qdisc
qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eth0 root
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
qdisc cake a25d: dev eth0.34 root refcnt 2 bandwidth 85Mbit besteffort triple-isolate nonat nowash no-ack-filter split-gso rtt 100ms noatm overhead 44
qdisc ingress ffff: dev eth0.34 parent ffff:fff1 ----------------
qdisc noqueue 0: dev eth1.2 root refcnt 2
qdisc noqueue 0: dev eth1.3 root refcnt 2
qdisc noqueue 0: dev eth1.5 root refcnt 2
qdisc noqueue 0: dev wg0 root refcnt 2
qdisc cake a25e: dev ifb4eth0.34 root refcnt 2 bandwidth 85Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100ms noatm overhead 44

After a while I get a stop/start event:

Thu Jan 11 19:47:52 2024 user.notice SQM: Stopping SQM on eth0.34
Thu Jan 11 19:47:52 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Thu Jan 11 19:47:53 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

At 19:35 and 20:05 there are WAN DHCP renews, but again, this is a renew of the IP my router already had and it is 12 minutes before and 18 minutes after the SQM restart.

The tc qdisc output changes after the stop/start:

# tc qdisc
qdisc noqueue 0: dev lo root refcnt 2
qdisc mq 0: dev eth0 root
qdisc fq_codel 0: dev eth0 parent :1 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
qdisc fq_codel 0: dev eth1 root refcnt 2 limit 10240p flows 1024 quantum 1514 target 5ms interval 100ms memory_limit 4Mb ecn drop_batch 64
qdisc noqueue 0: dev eth0.34 root refcnt 2
qdisc noqueue 0: dev eth1.2 root refcnt 2
qdisc noqueue 0: dev eth1.3 root refcnt 2
qdisc noqueue 0: dev eth1.5 root refcnt 2
qdisc noqueue 0: dev wg0 root refcnt 2
qdisc cake a262: dev ifb4eth0.34 root refcnt 2 bandwidth 85Mbit besteffort triple-isolate nonat wash no-ack-filter split-gso rtt 100ms noatm overhead 44

The qdisc cake a25d: dev eth0.34... is missing and qdisc ingress ffff: dev eth0.34... is missing. And in the last line a25e changed to a262 but that is probably just some increasing identifier.

I did some searching on troubleshooting hotplug and found that you can create /etc/hotplug.d/iface/00-logger and /etc/hotplug.d/net/00-logger with logger -t hotplug $(env) in it and then it will log stuff! :)

After a wait I saw this:

Thu Jan 11 20:04:27 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Thu Jan 11 20:04:32 2024 user.notice SQM: Stopping SQM on eth0.34
Thu Jan 11 20:04:32 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Thu Jan 11 20:04:32 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Thu Jan 11 20:04:32 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Thu Jan 11 20:04:32 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Thu Jan 11 20:04:32 2024 user.notice hotplug: USER=root ACTION=remove SHLVL=1 HOME=/ SEQNUM=54056 IFINDEX=8812 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/ifb4eth0.34 LOGNAME=root DEVICENAME=ifb4eth0.34 TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=ifb4eth0.34 PWD=/
Thu Jan 11 20:04:32 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Thu Jan 11 20:04:32 2024 user.notice hotplug: USER=root ACTION=add SHLVL=1 HOME=/ SEQNUM=54057 IFINDEX=8814 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/SQM_IFB_a6e82 LOGNAME=root DEVICENAME=SQM_IFB_a6e82 TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=SQM_IFB_a6e82 PWD=/
Thu Jan 11 20:04:32 2024 user.notice hotplug: USER=root ACTION=remove SHLVL=1 HOME=/ SEQNUM=54062 IFINDEX=8814 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/SQM_IFB_a6e82 LOGNAME=root DEVICENAME=SQM_IFB_a6e82 TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=SQM_IFB_a6e82 PWD=/
Thu Jan 11 20:04:33 2024 user.notice hotplug: USER=root ACTION=add SHLVL=1 HOME=/ SEQNUM=54063 IFINDEX=8815 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/SQM_IFB_888bb LOGNAME=root DEVICENAME=SQM_IFB_888bb TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=SQM_IFB_888bb PWD=/
Thu Jan 11 20:04:33 2024 user.notice hotplug: USER=root ACTION=remove SHLVL=1 HOME=/ SEQNUM=54068 IFINDEX=8815 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/SQM_IFB_888bb LOGNAME=root DEVICENAME=SQM_IFB_888bb TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=SQM_IFB_888bb PWD=/
Thu Jan 11 20:04:33 2024 user.notice hotplug: USER=root ACTION=add SHLVL=1 HOME=/ SEQNUM=54069 IFINDEX=8816 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/ifb4eth0.34 LOGNAME=root DEVICENAME=ifb4eth0.34 TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=ifb4eth0.34 PWD=/
Thu Jan 11 20:04:33 2024 user.notice hotplug: USER=root ACTION=add SHLVL=1 HOME=/ SEQNUM=54072 IFINDEX=8817 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/SQM_IFB_2c414 LOGNAME=root DEVICENAME=SQM_IFB_2c414 TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=SQM_IFB_2c414 PWD=/
Thu Jan 11 20:04:33 2024 user.notice hotplug: USER=root ACTION=remove SHLVL=1 HOME=/ SEQNUM=54077 IFINDEX=8817 HOTPLUG_TYPE=net DEVPATH=/devices/virtual/net/SQM_IFB_2c414 LOGNAME=root DEVICENAME=SQM_IFB_2c414 TERM=linux SUBSYSTEM=net PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=SQM_IFB_2c414 PWD=/
Thu Jan 11 20:04:33 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

So there is an ifupdate event on eth0.34 caused by wan6 perhaps? As far as I can tell my IPv6 is rock solid and the prefix never changes (if my IPv4 does not change). And the ERROR messages when SQM restarts are still there after changing to "replace".

Do you have IPv6 and not have this event once in a while?

tohojo commented 5 months ago

Ralf @.***> writes:

I did some searching on troubleshooting hotplug and found that you can create /etc/hotplug.d/iface/00-logger and /etc/hotplug.d/net/00-logger with logger -t hotplug $(env) in it and then it will log stuff! :)

After a wait I saw this:


Thu Jan 11 20:04:27 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Thu Jan 11 20:04:32 2024 user.notice SQM: Stopping SQM on eth0.34

Hmm, there's a five second gap between that hotplug event and the sqm restart, so that doesn't really seem to be the cause. You could try adding a 'logger' line to the sqm hotplug script to see if the restart is coming from there?

Otherwise I suppose there must be some other thing on the system calling SQM restart? A cron job, or some other thing?

brada4 commented 5 months ago

Thats ifb device (ex dummy) being recreated (rolling id is interface id, like you see in ip link the frontmost number like 1 for loopback) In principle you can drop events with ^ifb.* as it is qos piece, not genuine interface you police traffic on. Still what makes it recreating is a question.

tohojo commented 5 months ago

Andrew @.***> writes:

Thats ifb device (ex dummy) being recreated (rolling id is interface id, like you see in ip link the frontmost number like 1 for loopback) In principle you can drop events with ^ifb.* as it is qos piece, not genuine interface you police traffic on. Still what makes it recreating is a question.

The ifb devices are created by sqm-scripts for feature probing, so that part is normal. It's the reason that an sqm restart is triggered in the first place that is elusive here...

ervee commented 5 months ago

I kept the /etc/hotplug.d/iface/00-logger, removed /etc/hotplug.d/net/00-logger and added a logger -t hotplug $(env) line to /etc/hotplug.d/iface/11-sqm (just after the first line).

This is the output of logread now:

Fri Jan 12 20:34:23 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 20:34:23 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 20:34:27 2024 user.notice SQM: Stopping SQM on eth0.34
Fri Jan 12 20:34:27 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Fri Jan 12 20:34:27 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Fri Jan 12 20:34:27 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Fri Jan 12 20:34:27 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Fri Jan 12 20:34:28 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Fri Jan 12 20:34:28 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully
Fri Jan 12 20:35:32 2024 daemon.notice netifd: wan (3449): udhcpc: sending renew to server 217.19.17.85
Fri Jan 12 20:35:32 2024 daemon.notice netifd: wan (3449): udhcpc: lease of 82.197.218.30 obtained from 217.19.17.85, lease time 3600
Fri Jan 12 20:51:03 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 20:51:03 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 20:51:08 2024 user.notice SQM: Stopping SQM on eth0.34
Fri Jan 12 20:51:08 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Fri Jan 12 20:51:08 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Fri Jan 12 20:51:08 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Fri Jan 12 20:51:08 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Fri Jan 12 20:51:08 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Fri Jan 12 20:51:09 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully
Fri Jan 12 20:58:00 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 20:58:00 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 20:58:04 2024 user.notice SQM: Stopping SQM on eth0.34
Fri Jan 12 20:58:04 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Fri Jan 12 20:58:04 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Fri Jan 12 20:58:04 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Fri Jan 12 20:58:04 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Fri Jan 12 20:58:05 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Fri Jan 12 20:58:05 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully
Fri Jan 12 21:05:32 2024 daemon.notice netifd: wan (3449): udhcpc: sending renew to server 217.19.17.85
Fri Jan 12 21:05:32 2024 daemon.notice netifd: wan (3449): udhcpc: lease of 82.197.218.30 obtained from 217.19.17.85, lease time 3600
Fri Jan 12 21:07:43 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 21:07:43 2024 user.notice hotplug: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Fri Jan 12 21:07:48 2024 user.notice SQM: Stopping SQM on eth0.34
Fri Jan 12 21:07:48 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Fri Jan 12 21:07:48 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Fri Jan 12 21:07:48 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Fri Jan 12 21:07:48 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Fri Jan 12 21:07:48 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Fri Jan 12 21:07:49 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

So the SQM restart does seem to be triggered by that hotplug event. The first user.notice hotplug is from /etc/hotplug.d/iface/00-logger and the second is from the /etc/hotplug.d/iface/11-sqm script. Why it takes 4 to 5 seconds before there is an actual "Stopping SQM" event I don't know.

My IPv6 prefix does not change and things like the firewall do not restart. So can this event be filtered out somehow?

brada4 commented 5 months ago

It is wan6 rtadv which is usually notlogged.

ervee commented 5 months ago

Amy idea how to exclude this event from the SQM restart?

brada4 commented 5 months ago

Filter in front of script or disable wan6 interface.

tohojo commented 5 months ago

Ralf @.***> writes:

I kept the hotplug iface logging and added a logger -t hotplug $(env) line to /etc/hotplug.d/iface/11-sqm (just after the first line).

The hotplug script will definitely be executed, but it has logic in it to only react to ifup and ifdown events. So I don't think this is the source of the restart (also thinking about the five-second delay).

ervee commented 5 months ago

(also thinking about the five-second delay).

I know, but it is darn consistent. I changed the logger commands in iface/00-logger and iface/11-sqm to log hotplug-iface and logger-sqm. Now you can see that the iface/00-logger fires and iface/11-sqm fires at the same time. Then it just takes 4 to 5 seconds for the rest of the iface/11-sqm script to actually stop/start SQM.

Sun Jan 14 15:08:18 2024 user.notice hotplug-iface: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:08:18 2024 user.notice logger-sqm: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:08:23 2024 user.notice SQM: Stopping SQM on eth0.34
Sun Jan 14 15:08:23 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Sun Jan 14 15:08:23 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Sun Jan 14 15:08:23 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Sun Jan 14 15:08:23 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Sun Jan 14 15:08:23 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Sun Jan 14 15:08:24 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

Sun Jan 14 15:08:40 2024 user.notice hotplug-iface: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:08:40 2024 user.notice logger-sqm: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:08:45 2024 user.notice SQM: Stopping SQM on eth0.34
Sun Jan 14 15:08:45 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Sun Jan 14 15:08:45 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Sun Jan 14 15:08:45 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Sun Jan 14 15:08:45 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Sun Jan 14 15:08:45 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Sun Jan 14 15:08:46 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

Sun Jan 14 15:17:57 2024 user.notice hotplug-iface: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:17:57 2024 user.notice logger-sqm: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:18:02 2024 user.notice SQM: Stopping SQM on eth0.34
Sun Jan 14 15:18:02 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Sun Jan 14 15:18:02 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Sun Jan 14 15:18:02 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Sun Jan 14 15:18:02 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Sun Jan 14 15:18:02 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Sun Jan 14 15:18:03 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

Sun Jan 14 15:22:33 2024 user.notice hotplug-iface: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:22:33 2024 user.notice logger-sqm: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:22:38 2024 user.notice SQM: Stopping SQM on eth0.34
Sun Jan 14 15:22:38 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Sun Jan 14 15:22:38 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Sun Jan 14 15:22:38 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Sun Jan 14 15:22:38 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Sun Jan 14 15:22:38 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Sun Jan 14 15:22:39 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

Sun Jan 14 15:25:21 2024 user.notice hotplug-iface: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:25:21 2024 user.notice logger-sqm: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sun Jan 14 15:25:25 2024 user.notice SQM: Stopping SQM on eth0.34
Sun Jan 14 15:25:25 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Sun Jan 14 15:25:25 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Sun Jan 14 15:25:25 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Sun Jan 14 15:25:25 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Sun Jan 14 15:25:26 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Sun Jan 14 15:25:26 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

Filter in front of script or disable wan6 interface.

I'm not going to disable wan6. I added something in 11-sqm just after restart_sqm() {

    [ "$ACTION" = "ifupdate" -a "$INTERFACE" = "wan6" ] && {
        logger -t logger-sqm ifupdate on wan6
        exit 0
    }

This is pretty ugly but let's see what it does. I could also add this to the beginning of the script. That would be better perhaps.

ervee commented 5 months ago

I added something in 11-sqm just after restart_sqm() {

    [ "$ACTION" = "ifupdate" -a "$INTERFACE" = "wan6" ] && {
        logger -t logger-sqm ifupdate on wan6
        exit 0
    }

Ignore that. It did not work :)

ervee commented 5 months ago

I have added logger configuration now to the start of the hotplug.d/iface/11-sqm script and created a new part in the script:

[ "$ACTION" = "ifupdate" -a "$INTERFACE" = "wan6" ] && {
    logger -t logger-sqm ifupdate on wan6
    exit 0
}

And added logger lines to the restart_sqm part and the "$ACTION" = ifup and the "$ACTION" = ifdown parts. Just to see what gets triggered.

The resulting logs:

Sat Jan 27 16:28:32 2024 user.notice logger-sqm: USER=root ACTION=ifupdate SHLVL=1 HOME=/ HOTPLUG_TYPE=iface IFUPDATE_PREFIXES=1 LOGNAME=root DEVICENAME= TERM=linux PATH=/usr/sbin:/usr/bin:/sbin:/bin INTERFACE=wan6 PWD=/ DEVICE=eth0.34
Sat Jan 27 16:28:32 2024 user.notice logger-sqm: ifupdate on wan6
Sat Jan 27 16:28:37 2024 user.notice SQM: Stopping SQM on eth0.34
Sat Jan 27 16:28:37 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 ingress
Sat Jan 27 16:28:37 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Invalid handle.
Sat Jan 27 16:28:37 2024 user.notice SQM: ERROR: cmd_wrapper: tc: FAILURE (2): /sbin/tc qdisc del dev eth0.34 root
Sat Jan 27 16:28:37 2024 user.notice SQM: ERROR: cmd_wrapper: tc: LAST ERROR: Error: Cannot delete qdisc with handle of zero.
Sat Jan 27 16:28:37 2024 user.notice SQM: Starting SQM script: piece_of_cake.qos on eth0.34, in: 85000 Kbps, out: 85000 Kbps
Sat Jan 27 16:28:38 2024 user.notice SQM: piece_of_cake.qos was started on eth0.34 successfully

The first user.notice logger-sqm: is at the start of the script. So the hotplug.d/iface/11-sqm script script is triggered by something every now and then, but we already knew that.

The second user.notice logger-sqm: ifupdate on wan6 is in the part that tests for ACTION" = "ifupdate" and "$INTERFACE" = "wan6" but we also kind of knew that. I just wanted to see if I could trigger it that way. But as you can see above I have added an exit 0 after the logger line. This should exit the script is it not? And if it does, 4-5 seconds later there is still a stop/start of SQM. Without any of the other loggers logging anything. @tohojo noticed this behavior earlier and it could suggest that the stop/start is not directly triggered by the hotplug event. But it must be triggered indirectly somehow because it is without exception always triggered 5 to 6 seconds after the hotplug event.

brianjmurrell commented 2 months ago

Any further progress here? I am also noticing a lot of SQM reloads and they align with IPv6 ifupdate events. But it is also not lost on me that /etc/hotplug.d/iface/11-sqm is basically doing:

ALL_DEVICES=$(echo $DEVICE $(uci -q get network.$INTERFACE.ifname) $(ports_for_device $(uci -q get network.$INTERFACE.device)) | tr ' ' '\n' | sort -u)
…
[ "$ACTION" = ifup ] && /etc/init.d/sqm enabled && restart_sqm

if [ "$ACTION" = ifdown ]; then
    for dev in $ALL_DEVICES; do
        /usr/lib/sqm/run.sh stop $dev
    done
fi

and enabling xtrace in /etc/hotplug.d/iface/11-sqm confirms that it's not initiating the SQM reload:

+ ALL_DEVICES=pppoe-wan1
+ '[' ifupdate '=' ifup ]
+ '[' ifupdate '=' ifdown ]

When the reload does happen /bin/sh /etc/rc.common /etc/init.d/sqm reload's parent PID is 1 which is our good friend procd so somehow this reload is being triggered by procd, not hotplug/iface.

I suspect it's this:

service_triggers()
{
        procd_add_reload_trigger "sqm"
}

that's at the root of the issue.

Unfortunately /etc/init.d/sqm doesn't seem to get a lot of information about why it's service_triggers() is being called, for example, to head off these reload events for interfaces and/or events that don't seem to really justify a reload.

FWIW, the sqm-scripts isn't really doing anything special here except packaging up https://github.com/tohojo/sqm-scripts so I suspect any improvement on this issue is going to have to happen there.

ervee commented 2 months ago

Nice find Brian, thank you.