Closed: hamelg closed this issue 3 years ago.
I was also seeing this. I searched the forum and implemented a local fix.
I think it has something to do with manually restarting collectd or luci_statistics, or something similar.
As @wulfy23 mentioned, this may be related. I think we need to modify the script to install a signal handler that terminates it on SIGTERM.
Have a look at this shell pattern to react to a signal after a sleep: https://github.com/openwrt/packages/blob/e36a65459a55e9bbf78d94a41ea93caa17f49779/net/mwan3/files/usr/sbin/mwan3track#L386-L389
Have a look at this shell pattern to install a signal handler: https://github.com/openwrt/packages/blob/e36a65459a55e9bbf78d94a41ea93caa17f49779/net/mwan3/files/usr/sbin/mwan3track#L209-L211
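To make the suggestion concrete, here is a minimal POSIX sh sketch of the general idea: trap SIGTERM, and sleep in a way the trap can interrupt. It is not copied from mwan3track or sqm_collectd.sh, and `sample_loop`, `on_term` and `INTERVAL` are hypothetical names.

```shell
# Hypothetical interval between samples, in seconds.
INTERVAL="${INTERVAL:-10}"

# Stop for good on SIGTERM, killing any sleep that is still in flight.
on_term() {
    [ -n "$SLEEP_PID" ] && kill "$SLEEP_PID" 2>/dev/null
    exit 0
}

sample_loop() {
    SLEEP_PID=
    trap on_term TERM
    while true; do
        # ... read the qdisc statistics and print PUTVAL lines here ...

        # Sleep in the background and wait for it: `wait` returns as soon
        # as a trapped signal arrives, so on_term runs immediately.
        sleep "$INTERVAL" &
        SLEEP_PID=$!
        wait "$SLEEP_PID"
    done
}
```

The background `sleep` plus `wait` is what makes the handler responsive: with a plain foreground `sleep`, the shell would only run the trap after the sleep had finished.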
FWIW, this is my hack (it reaps other instances whenever the script is called):
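# Kill every other running copy of sqm_collectd.sh. Note that the pattern
# matches the script regardless of the interface argument it was started with.
for mPID in $(pgrep -f '/usr/libexec/collectd/sqm_collectd.sh'); do
	[ "$mPID" = "$$" ] && continue   # skip this instance
	kill -9 "$mPID"
done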
cc @ldir-EDB0
This is quite strange; I haven't seen this behaviour on my system. However, I can replicate it by killing collectd forcefully, e.g. with kill -9. That way collectd gets no chance to signal its children, so the child process keeps running even though it is an orphan.
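The mechanism can be reproduced without collectd. In this hypothetical sketch a parent shell stands in for collectd, a looping child stands in for sqm_collectd.sh, and the parent is killed with SIGKILL. `demo_orphan` is an illustrative name; the sketch assumes Linux `/proc` and creates a temporary file with `mktemp`.

```shell
demo_orphan() {
    pidfile=$(mktemp)

    # Stand-in for collectd: start an endless child loop (stand-in for
    # sqm_collectd.sh), record the child's PID in $pidfile, then wait on it.
    sh -c '
        sh -c "while true; do sleep 1; done" </dev/null >/dev/null 2>&1 &
        echo $! > "$1"
        wait
    ' stand_in "$pidfile" &
    parent=$!
    sleep 1
    child=$(cat "$pidfile")
    rm -f "$pidfile"

    # SIGKILL cannot be caught, so the stand-in gets no chance to signal its child.
    kill -9 "$parent"
    sleep 1

    if kill -0 "$child" 2>/dev/null; then
        new_parent=$(awk '$1 == "PPid:" { print $2 }' "/proc/$child/status")
        echo "orphan $child still running; parent was $parent, now $new_parent"
        kill "$child"   # clean up the demo
    else
        echo "child $child exited"
    fi
}
```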
Breaking the infinite 'while true' loop seems a sensible thing to do, replacing it with something akin to 'while not an orphan; do'. I'll think about that.
Instead of while true, loop only while the parent PID is not 1:
while [ "$(awk '$1 ~ "^PPid:" {print $2}' /proc/$$/status)" -ne 1 ]; do
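A more defensive variant of the same guard, sketched under the assumption that the script may not always be reparented to PID 1: on Linux, if an ancestor has registered itself as a child subreaper (`PR_SET_CHILD_SUBREAPER`), orphans are reparented to it instead. The sketch records the parent seen at startup and stops once it changes. `parent_pid`, `still_attached` and `run_until_orphaned` are hypothetical helpers, not part of sqm_collectd.sh.

```shell
# Print the parent PID of process $1, read from /proc/<pid>/status.
parent_pid() {
    awk '$1 == "PPid:" { print $2 }' "/proc/$1/status"
}

# Succeed while process $1 is still a child of process $2.
still_attached() {
    [ "$(parent_pid "$1")" = "$2" ]
}

# Hypothetical loop guard: compare against the parent recorded at startup
# rather than against the literal PID 1.
run_until_orphaned() {
    start_parent=$(parent_pid $$)
    while still_attached $$ "$start_parent"; do
        # ... read the qdisc statistics and print PUTVAL lines here ...
        sleep "${INTERVAL:-10}"
    done
}
```

`$$` always refers to the main script process, even inside subshells, so the check follows the script itself rather than any helper subshell.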
https://github.com/openwrt/packages/pull/16770
Tested and functional.
Cheers.
[ /usbstick 53°] ps w | grep collectd
17107 root 6076 SN /usr/sbin/collectd -C /tmp/collectd.conf -f
17226 nobody 1416 SN /bin/sh /usr/libexec/collectd/sqm_collectd.sh eth1
23370 root 1240 S grep collectd
[ /usbstick 53°] kill -9 17107
[ /usbstick 54°] ps w | grep collectd
17226 nobody 1416 SN /bin/sh /usr/libexec/collectd/sqm_collectd.sh eth1
23380 root 1240 S grep collectd
[ /usbstick 54°] ps w | grep collectd
17226 nobody 1416 SN /bin/sh /usr/libexec/collectd/sqm_collectd.sh eth1
23385 root 5988 SN /usr/sbin/collectd -C /tmp/collectd.conf -f
23399 nobody 1316 SN /bin/sh /usr/libexec/collectd/sqm_collectd.sh eth1
23407 root 1240 S grep collectd
[ /usbstick 53°] ps w | grep collectd
23385 root 5988 SN /usr/sbin/collectd -C /tmp/collectd.conf -f
23399 nobody 1316 SN /bin/sh /usr/libexec/collectd/sqm_collectd.sh eth1
23414 root 1240 S grep collectd
Maintainer: Jo-Philipp Wich <jo@mein.io>, Hannu Nyman <hannu.nyman@iki.fi>
Environment: OpenWrt 19.07
Description: After a long uptime, I noticed multiple instances of the sqm_collectd.sh script running. All of the duplicate instances are orphans.
My exec module configuration
The top output shows the duplicate instances.
A possible workaround would be to break out of the forever loop after xx iterations.
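A sketch of that iteration cap, assuming (not verified here) that collectd's exec plugin starts the script again after it exits, so the managed instance keeps being replaced while an orphaned copy simply runs out. The issue leaves the limit open; `MAX_ITERATIONS`, its default and `run_capped` are hypothetical placeholders.

```shell
# Placeholder cap: 8640 iterations is one day at an assumed 10-second interval.
MAX_ITERATIONS="${MAX_ITERATIONS:-8640}"

run_capped() {
    i=0
    while [ "$i" -lt "$MAX_ITERATIONS" ]; do
        # ... read the qdisc statistics and print PUTVAL lines here ...
        i=$((i + 1))
        sleep "${INTERVAL:-10}"
    done
    # Falling out of the loop ends the script, so an orphaned copy disappears
    # after at most MAX_ITERATIONS samples instead of running forever.
}
```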