papertrail / remote_syslog2

To install, see "Releases" tab. Self-contained daemon for reading local files and emitting remote syslog (without using local syslogd).
http://help.papertrailapp.com/
MIT License

remote_syslog appears to hang on FUTEX_WAIT on AWS AMI #210

Closed jeffmacdonald closed 6 years ago

jeffmacdonald commented 6 years ago

Hi.

I'm running remote_syslog2 on an AWS EB multi-container instance. The instance runs 2 copies of docker. Our apps log to stdout and files are saved to /var/log/containers/ourapp-wsgi/*.log and /var/log/containers/ourapp-celery-workers*.log

We find that after a few days, Papertrail is no longer receiving our logs. Restarting remote_syslog2 on the app instances causes logs to start flowing again.

Here is some debugging I've tried to do.

[ec2-user@ip-10-0-2-251 log]$ ps aux|grep remote_syslog
ec2-user 20428  0.0  0.1 110456  2112 pts/0    S+   13:14   0:00 grep --color=auto remote_syslog
root     22924  0.0  0.6  39528 14000 ?        Sl   Sep21   0:21 /usr/local/bin/remote_syslog -c /etc/log_files.yml --pid-file=/var/run/remote_syslog.pid --poll=true
[ec2-user@ip-10-0-2-251 log]$ sudo strace -p 22924
Process 22924 attached
futex(0xa43cb0, FUTEX_WAIT, 0, NULL^CProcess 22924 detached
 <detached ...>
[ec2-user@ip-10-0-2-251 log]$ uname -a
Linux ip-10-0-2-251 4.9.43-17.38.amzn1.x86_64 #1 SMP Thu Aug 17 00:20:39 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[ec2-user@ip-10-0-2-251 log]$ /usr/local/bin/remote_syslog -V
remote_syslog 0.19
[ec2-user@ip-10-0-2-251 log]$

So I dug a bit more with ps -efL|grep remote_syslog|wc and there were 132 results, most looking something like this:

root 22924 1 2215 0 131 Sep21 ? 00:00:03 /usr/local/bin/remote_syslog -c /etc/log_files.yml --pid-file=/var/run/remote_syslog.pid --poll=true

Performing a strace -f -p 2215 shows that the thread is "doing stuff" just fine.
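For anyone repeating this check, the thread listing can be summarised with a small script. This is only a sketch: the function name `threads_for_pid` is made up for illustration, and it assumes the standard `ps -efL` column order (UID PID PPID LWP C NLWP STIME TTY TIME CMD), which matches the line above, where 2215 is the thread (LWP) id and 131 is the thread count (NLWP).

```python
def threads_for_pid(ps_output, pid):
    """Return (nlwp, lwp_ids) for one process from `ps -efL` text.

    Hypothetical helper, not part of remote_syslog2. `nlwp` is the thread
    count ps reports; `lwp_ids` are the thread ids seen in the listing.
    """
    nlwp = None
    lwp_ids = []
    for line in ps_output.splitlines():
        # Split into at most 10 fields so the command line stays in one piece.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[1].isdigit():
            continue  # header row or a line we cannot parse
        if int(fields[1]) != pid:
            continue  # some other process, e.g. the grep itself
        lwp_ids.append(int(fields[3]))
        nlwp = int(fields[5])
    return nlwp, lwp_ids
```

Feeding it the output of `ps -efL` and the PID from the pid file gives the thread ids that can then be passed to `strace -p` one at a time.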

And so.. now I ask for your help.

markdascher commented 6 years ago

If I'm understanding correctly, this looks like an issue with a rotated log file. Could you try out the latest beta, and see if it's fixed: v0.20-beta2

jeffmacdonald commented 6 years ago

Hi, just acking that I've read this. I'm travelling, but I look forward to giving this a try when I'm back home.

jeffmacdonald commented 6 years ago

I've deployed this and I'm still experiencing the same issues. I did some time correlation.

However: this behaviour is not consistent. Sometimes it doesn't hang.

jeffmacdonald commented 6 years ago

Doing a service remote_syslog stop; service remote_syslog start doesn't appear to help either. Additionally, after the stop/start it still sits on FUTEX_WAIT.

markdascher commented 6 years ago

Actually, I now wonder if it's just getting hung up on some sort of open-file limitation. If you don't mind emailing into support@papertrailapp.com, we can get to the bottom of it a bit quicker. Then you can send over the full config, without it being public. (The output of lsof -p `pgrep remote_syslog`, and/or adding --debug-log-cfg /tmp/remote_syslog.log to the command-line, may tell us more.)
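As a rough way to test the open-file theory without sharing any config, the number of open descriptors can be compared against the process's soft limit. The sketch below assumes a Linux /proc filesystem; `open_file_usage` and its `proc_root` parameter are hypothetical names used only for illustration, and it needs to run as root (or as the process owner) to read another process's fd directory.

```python
from pathlib import Path


def open_file_usage(pid, proc_root="/proc"):
    """Return (open_fds, soft_limit) for a process.

    Hypothetical helper. `soft_limit` is None when the limit is
    'unlimited' or the 'Max open files' row is missing.
    """
    proc = Path(proc_root) / str(pid)
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = sum(1 for _ in (proc / "fd").iterdir())
    soft_limit = None
    for line in (proc / "limits").read_text().splitlines():
        if line.startswith("Max open files"):
            # Row layout: "Max open files  <soft>  <hard>  files"
            value = line.split()[3]
            soft_limit = None if value == "unlimited" else int(value)
            break
    return open_fds, soft_limit
```

If the first number is close to the second, the daemon is likely running out of descriptors rather than hanging for some other reason.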

jeffmacdonald commented 6 years ago

Roger that! I'll follow up tomorrow. Thanks

markdascher commented 6 years ago

Solved issue by configuring stricter log file rotation, to prevent remote_syslog2 from bumping into inotify limits.
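For readers who land here with the same symptom, a quick way to see how close a process is to the inotify limits is sketched below. It assumes a Linux /proc filesystem where an inotify instance appears as a descriptor pointing at `anon_inode:inotify` and each watch is an `inotify wd:` line in the matching fdinfo file. The function names and the `proc_root` parameter are hypothetical, chosen for illustration; this is not something remote_syslog2 ships.

```python
import os
from pathlib import Path


def read_inotify_limits(proc_root="/proc"):
    """Return the per-user inotify limits that exist on this kernel."""
    base = Path(proc_root) / "sys" / "fs" / "inotify"
    limits = {}
    for name in ("max_user_watches", "max_user_instances", "max_queued_events"):
        path = base / name
        if path.exists():
            limits[name] = int(path.read_text().strip())
    return limits


def count_inotify_usage(pid, proc_root="/proc"):
    """Return (instances, watches) currently held by one process."""
    proc = Path(proc_root) / str(pid)
    instances = 0
    watches = 0
    for fd in (proc / "fd").iterdir():
        try:
            target = os.readlink(fd)
        except OSError:
            continue  # descriptor closed while we were looking
        if target != "anon_inode:inotify":
            continue
        instances += 1
        try:
            info = (proc / "fdinfo" / fd.name).read_text()
        except OSError:
            continue
        # One "inotify wd:" line per watch registered on this instance.
        watches += sum(1 for line in info.splitlines()
                       if line.startswith("inotify wd:"))
    return instances, watches
```

Comparing the watch count with `max_user_watches` (and the instance count with `max_user_instances`) shows whether stricter rotation, as described above, is needed. Keep in mind the limits are per user, so other processes running as the same user count against them too.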