tohojo / sqm-scripts

SQM scripts traffic shaper
http://www.bufferbloat.net/projects/cerowrt/wiki/Smart_Queue_Management
234 stars 64 forks source link

SQM does not always detect cake as available qdisc at boot #122

Open hnyman opened 4 years ago

hnyman commented 4 years ago

I have lately noticed that SQM does not always properly recognize cake at OpenWrt startup. The list of available qdiscs in LuCI contains then only fq_codel, but no trace of cake. Manually running /usr/lib/sqm/update-available-qdiscs brings cake to the qdisc list.

Note the time difference below. fq_codel detected at boot, cake later on manual run:

root@router1:~# /usr/lib/sqm/update-available-qdiscs

root@router1:~# ls -l /tmp/run/sqm/available_qdiscs/
-rw-r--r--    1 root     root             0 May 26 20:38 cake
-rw-r--r--    1 root     root             0 May 26 19:10 fq_codel

I have noticed this several times after the reboot after the flashing of new OpenWrt master firmware into my ipq806x/R7800. (I am usually using simple/fq_codel, but I have lately used also cake. So it is hard to say if this is really new, or already something old.)

I suspect that there is some race condition or similar in the SQM startup process, so that it misses detecting cake.

I haven't tried to debug it further, but thought to mention it case somebody else runs into that. Could be that a small delay in the detection script would help, or something similar.

tohojo commented 4 years ago

Hannu Nyman notifications@github.com writes:

I have lately noticed that SQM does not always properly recognize cake at OpenWrt startup. The list of available qdiscs in LuCI contains then only fq_codel, but no trace of cake. Manually running /usr/lib/sqm/update-available-qdiscs brings cake to the qdisc list.

Hmm, that's odd... Do you get anything in the log? Could you try enable debugging and see if anything shows up in the logs?

Try putting this:

SQM_VERBOSITY_MAX=10 SQM_DEBUG=1

in /etc/sqm.conf and look in the syslog or in the logfiles created in /var/run/sqm

hnyman commented 4 years ago

The system log shows nothing interesting by default:

root@router1:~# logread | grep sqm
Tue May 26 19:10:40 2020 daemon.notice procd: /etc/rc.d/S50sqm: SQM: Starting SQM script: simple.qos on eth0.2, in: 190000 Kbps, out: 18000 Kbps
Tue May 26 19:10:40 2020 user.notice SQM: Using generic sqm_start_default function.
Tue May 26 19:10:40 2020 daemon.notice procd: /etc/rc.d/S50sqm: SQM: Using generic sqm_start_default function.
Tue May 26 19:10:41 2020 daemon.notice procd: /etc/rc.d/S50sqm: SQM: simple.qos was started on eth0.2 successfully
Tue May 26 19:10:45 2020 user.notice ucitrack: Setting up /etc/config/firewall reload dependency on /etc/config/sqm
Tue May 26 19:12:26 2020 user.notice SQM: Using generic sqm_start_default function.
Tue May 26 19:12:29 2020 user.notice SQM: Using generic sqm_start_default function.

I added verbosity to the config, so maybe something interesting surfaces on the next times.

tohojo commented 4 years ago

There wouldn't be, by default. I'm interested in the debug output from the qdisc checking script at boot, which is why the debug config should go into /etc/sqm.conf

hnyman commented 4 years ago

I tried normal reboot, and both qdiscs got detected ok.

I suspect that the trouble is somehow related to the first boot after flashing, as the discs & kept settings are initialised, so the process timings may be a bit different from a normal boot.

moeller0 commented 4 years ago

Silly question,

is fq_codel by chance baked into the kernel while cake is a module?

Best Regards Sebastian

On May 26, 2020, at 20:56, Hannu Nyman notifications@github.com wrote:

I tried normal reboot, and both qdiscs got detected ok.

I suspect that the trouble is somehow related to the first boot after flashing, as the discs & kept settings are initialised, so the process timings may be a bit different from a normal boot.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or unsubscribe.

tohojo commented 4 years ago

Ah, right. Are you baking sqm-scripts and kmod-sched-cake into the image?

hnyman commented 4 years ago

Are you baking sqm-scripts and kmod-sched-cake into the image?

Sure. They are included in the image. I include luci-app-sqm and it pulls the dependencies. (ipq806x has kernel 5.4, so it is the native in-tree cake module.)

I tried now re-flashing the router, but this time the detection worked ok at boot. The failure does not happen always. (But there is no trace of cake detection in SQM start log, although the marker file is now present in available_qdiscs folder.)

root@router1:/tmp/run/sqm# ls -l /tmp/run/sqm/available_qdiscs/
-rw-r--r--    1 root     root             0 May 26 21:47 cake
-rw-r--r--    1 root     root             0 May 26 21:47 fq_codel

eth0.2.start-sqm.log

eth0.2.state.txt

Hmmm.
While writing this, I noticed that SQM has actually started several times during boot, as interface ifups trigger restarts. Apparently only the log of the last start (22:03) get preserved in SQM's own log but the earlier restart (21:47) gets lost. I copied the earlier starts from the system log, so it contains also unrelated garbage: bootlog.txt

Ps. I added the verbosity to the uci config file, not /etc/sqm.conf, but that should make no difference, I think.

amteza commented 4 years ago

Ah, right. Are you baking sqm-scripts and kmod-sched-cake into the image?

Interesting enough I do not have them compiled into the image —both are modules— and I am experiencing same behaviour. Difference is that I am on 19.07.3 with a few cherry-picked patches from master.

tohojo commented 4 years ago

Hmm, since the available qdiscs list is only used for populating the GUI, I wonder if we could just run the detection the first time we need it?

hnyman commented 4 years ago

Sounds reasonable. That would be well after the reboot in any case, so the possible system initialisation should be complete.

tapper82 commented 3 years ago

I can confirm I have this bug On my wrt3200acm after a fresh flash.

wulfy23 commented 3 years ago

possibly related...

edit: false alarm... issue was on my end... @ any script error or hang at .qos does not cleanup /tmp/run/sqm/sqm-run.lock/