ralphlange / procServ

Wrapper to start arbitrary interactive commands in the background, with telnet or Unix domain socket access to stdin/stdout
GNU General Public License v3.0

missing log files #38

Closed bfrk closed 4 years ago

bfrk commented 4 years ago

We have a machine that runs about 60 instances of procServ. I recently added creation of local log files to our setup. I restarted all instances a week ago and have just noticed that a seemingly random selection of about 10 of them have no log file, even though the instances are running. Here is an example:

iocsc1c:ctladm[13]~> systemctl status softIOC@sioc13c|less
● softIOC@sioc13c.service - Soft IOC sioc13c
   Loaded: loaded (/etc/systemd/system/softIOC@.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2020-08-10 16:16:43 CEST; 1 weeks 1 days ago
 Main PID: 9969 (runuser)
    Tasks: 2 (limit: 6144)
   CGroup: /system.slice/system-softIOC.slice/softIOC@sioc13c.service
           ├─9969 runuser -u ioc -- procServ --foreground --logfile=/var/log/softIOC/sioc13c.log --logstamp --timefmt=[%Y-%m-%d %H:%M:%S] --chdir=/opt/IOC/BII-Controls/links/SIOC13C/bin/linux-x86_64 --quiet --name=sioc13c --ignore=^D^C^] unix:/run/softIOC/sioc13c ./st.cmd.SIOC13C
           └─9980 procServ --foreground --logfile=/var/log/softIOC/sioc13c.log --logstamp --timefmt=[%Y-%m-%d %H:%M:%S] --chdir=/opt/IOC/BII-Controls/links/SIOC13C/bin/linux-x86_64 --quiet --name=sioc13c --ignore=^D^C^] unix:/run/softIOC/sioc13c ./st.cmd.SIOC13C
iocsc1c:ctladm[17]~> ls /var/log/softIOC/sioc13c.log
/bin/ls: cannot access '/var/log/softIOC/sioc13c.log': No such file or directory

Grepping the system logs did not reveal anything that sheds further light on the issue. Could it be some sort of race condition when many instances of procServ are started in parallel?
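One way to narrow this down is to check which files the running procServ process actually holds open. The following is a minimal sketch, assuming a Linux system with /proc; the helper name `open_files_of_pid` is hypothetical, and 9980 is the procServ PID from the status output above.

```shell
#!/bin/sh
# List the targets of all open file descriptors of a process (Linux /proc only).
# A file that was deleted after being opened shows up with a " (deleted)" suffix;
# a renamed (rotated) file shows up under its new name.
open_files_of_pid() {
    pid="$1"
    for fd in /proc/"$pid"/fd/*; do
        # readlink fails for descriptors that vanish in between; ignore those
        readlink "$fd" 2>/dev/null
    done
}

# Example (run as root or as the user owning the process):
#   open_files_of_pid 9980 | grep softIOC
```

If the log path printed here differs from the one given with --logfile, the process is still writing to an older, renamed or removed file.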

bfrk commented 4 years ago

Thomas was right in that the interaction with logrotate is what is failing here. I can definitely see procServ writing to a rotated log file instead of the "current" one. Perhaps our logrotate configuration is wrong. This is how we send the HUP signal to procServ:

  postrotate
    /bin/systemctl -q -s SIGHUP --kill-who=main kill "softIOC@$(basename $1 .log).service" || true
  endscript

bfrk commented 4 years ago

It seems that the postrotate script here is at fault; in particular, passing --kill-who=main has no effect. Removing the option makes procServ properly re-open the log file.

Sigh. I have re-read the documentation of the --kill-who option again and again, but I still fail to understand why specifying main is wrong here while all (the default) works.
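A possible explanation, based on the status output earlier in this thread: systemd reports runuser (PID 9969) as the unit's Main PID, while procServ itself runs as the child process 9980. With --kill-who=main the signal is sent only to the main process, which here would be runuser. The default, all, signals every process of the unit, including procServ. The sketch below shows the postrotate command without the option, wrapped in the hypothetical helper functions `unit_for_log` and `reload_log` for illustration; it is not taken from the actual configuration.

```shell
#!/bin/sh
# Derive the systemd unit name from a log file path, as in the postrotate
# script above: /var/log/softIOC/sioc13c.log -> softIOC@sioc13c.service
unit_for_log() {
    printf 'softIOC@%s.service\n' "$(basename "$1" .log)"
}

# Send SIGHUP to all processes of the unit (the default, --kill-who=all),
# so that procServ receives it even if runuser is the unit's main PID.
reload_log() {
    /bin/systemctl -q -s SIGHUP kill "$(unit_for_log "$1")" || true
}

# To see which process systemd considers the main PID of a unit:
#   systemctl show -p MainPID softIOC@sioc13c.service
```

Inside logrotate, the postrotate body would then reduce to the `systemctl ... kill` line with the --kill-who option dropped.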