treasure-data / omnibus-td-agent

td-agent (Fluentd) Packaging Scripts
https://docs.treasuredata.com/articles/td-agent-changelog
Apache License 2.0

td-agent crash not noticed by systemd #125

Open lobeck opened 7 years ago

lobeck commented 7 years ago

Running 2.3.4 on Xenial.

When fluentd dies for some reason, systemd doesn't notice, so dependent tools such as Puppet can't get the proper state of the service.

root@hubert:~# systemctl status td-agent
● td-agent.service - LSB: data collector for Treasure Data
   Loaded: loaded (/etc/init.d/td-agent; bad; vendor preset: enabled)
   Active: active (exited) since Thu 2017-04-27 09:58:12 UTC; 4h 29min ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0
   Memory: 0B
      CPU: 0

Apr 27 09:58:11 hubert systemd[1]: Stopped LSB: data collector for Treasure Data.
Apr 27 09:58:11 hubert systemd[1]: Starting LSB: data collector for Treasure Data...
Apr 27 09:58:12 hubert td-agent[14199]: Starting td-agent:  * td-agent
Apr 27 09:58:12 hubert systemd[1]: Started LSB: data collector for Treasure Data.

root@hubert:~# /etc/init.d/td-agent status
● td-agent.service - LSB: data collector for Treasure Data
   Loaded: loaded (/etc/init.d/td-agent; bad; vendor preset: enabled)
   Active: active (exited) since Thu 2017-04-27 09:58:12 UTC; 4h 33min ago
     Docs: man:systemd-sysv-generator(8)
    Tasks: 0
   Memory: 0B
      CPU: 0

Apr 27 09:58:11 hubert systemd[1]: Stopped LSB: data collector for Treasure Data.
Apr 27 09:58:11 hubert systemd[1]: Starting LSB: data collector for Treasure Data...
Apr 27 09:58:12 hubert td-agent[14199]: Starting td-agent:  * td-agent
Apr 27 09:58:12 hubert systemd[1]: Started LSB: data collector for Treasure Data.
root@hubert:~# echo $?
0

root@hubert:~# ps auxf | grep td-agent
root     32336  0.0  0.0  14496  1088 pts/0    S+   14:30   0:00                      \_ grep --color=auto td-agent
root@hubert:~#

repeatedly commented 7 years ago

Is this an issue on the td-agent side? How can it be fixed? I'm not familiar with the systemd architecture, so I'd like to know what approach is needed for this case. Is it a matter of systemd configuration or something else?

lobeck commented 7 years ago

Not sure yet; I guess it's an issue caused by the lack of systemd support. It seems the init script gets wrapped by systemd and is never executed. Its internal state stays "active", which is then reported to all other tools as well.
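
Until there is proper systemd support, the mismatch shown above (systemd reporting "active" while ps finds no process) could be detected from the outside. The following is only a rough sketch of that comparison, not part of td-agent or its packaging: the function name classify is made up for illustration, and the way the two inputs are gathered (systemctl is-active and pgrep -c) is an assumption about what is available on the host.

```shell
#!/bin/sh
# Hypothetical helper: compare what systemd reports with what is really running.
#
# classify STATE COUNT
#   STATE: the unit state as printed by `systemctl is-active <unit>`
#   COUNT: the number of matching processes, e.g. from `pgrep -c`
classify() {
    state=$1
    count=$2
    if [ "$state" = "active" ] && [ "$count" -eq 0 ]; then
        echo "stale"      # systemd says active, but no process exists
    elif [ "$state" = "active" ]; then
        echo "running"    # systemd and the process list agree
    else
        echo "stopped"    # systemd itself reports the unit as not active
    fi
}

# Possible use from an interactive shell (the pgrep pattern is an assumption
# about how the process shows up in ps on your system):
#   classify "$(systemctl is-active td-agent)" "$(pgrep -c -f td-agent)"
```

With the state from the output above, this would print "stale", which a monitoring job or Puppet could use as a trigger for a restart. Note that pgrep -f matches against full command lines, so calling it from a script whose own arguments contain the pattern may count that script as well.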