Closed arnetheduck closed 3 years ago
Can also disable writing to systemd journal, there's a lot of logs and one copy is enough
We apparently also push same logs to Systemd journal, or at least rsyslog
seems to.
As far as I can tell logs appear in systemd journal because dockerd
prints them:
admin@devel-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo journalctl -u docker -n 1 --no-pager
-- Logs begin at Tue 2020-11-24 09:12:29 UTC, end at Tue 2020-11-24 09:54:03 UTC. --
Nov 24 09:54:03 devel-small-01.aws-eu-central-1a.nimbus.pyrmont docker/beacon-node-pyrmont-devel-small[583]: {"lvl":"DBG","ts":"2020-11-24 09:54:03.157+00:00","msg":"Attestation received","tid":1,"file":"eth2_processor.nim:313","delay":{"value":32157052613},"wallSlot":42569,"attestation":{"aggregation_bits":"0x0000000000200000000000000000000001","data":{"slot":42567,"index":22,"beacon_block_root":"16986f5e","source":{"epoch":1329,"root":"7d33851c"},"target":{"epoch":1330,"root":"25a5dd27"}},"signature":"b82b4eb7"},"committeeIndex":5}
So the issue isn't with configuration of rsyslog
.
I don't really see what can be done to stop it from doing that:
> dockerd -h | grep log
--log-driver string Default driver for container logs (default "json-file")
-l, --log-level string Set the logging level ("debug"|"info"|"warn"|"error"|"fatal") (default
--log-opt map Default log driver options for containers (default map[])
--raw-logs Full timestamps without ANSI coloring
admin@devel-small-01.aws-eu-central-1a.nimbus.pyrmont:~ %
Changing log level doesn't seem to change anything.
Opened and issue, let's see what they say: https://github.com/docker/for-linux/issues/1151
One more thing - around midnight right now, there's a huge processing spike that disturbs NBC - https://metrics.status.im/d/pgeNfj2Wz23/nimbus-fleet-testnets?orgId=1&from=1606330725473&to=1606362552381&var-instance=devel-large-02.aws-eu-central-1a.nimbus.pyrmont - I'm guessing this might be the log compression - if it's not done already, the compaction should probably be nice
:d
I have identified why the logs also go to systemd journal: https://github.com/docker/for-linux/issues/1151#issuecomment-734351866 Source: https://www.rsyslog.com/doc/master/configuration/modules/imuxsock.html?highlight=socket#coexistence-with-systemd
Long story short, systemd journal controls /dev/log
socket and also pushes logs to /run/systemd/journal/syslog
from which Rsyslog reads, which is why logs first appear in the journal.
A solution could be specifying syslog-address
to point to unixgram:///run/systemd/journal/syslog
.
I have implemented sending logs by default to rsyslog
in https://github.com/status-im/infra-role-bootstrap/commit/2356b0e4.
I've deployed the change to all Nimbus nodes. This should severely lower the frequency of log compression by systemd. Important thing to note is that for the container to be affected by this Docker config change it needs to be re-created. I have yet to verify if Watchtower image updates will make this take effect, but they probably will.
I confirmed on devel-large-01.aws-eu-central-1a.nimbus.pyrmont
that Watchtower image update does re-create the container in such a way as is necessary for the syslog-address
configuration to take effect.
So we currently already have log rotation set to hourly
:
https://github.com/status-im/infra-nimbus/blob/176d843a4002155c171fe19815252b5a4300d216/ansible/group_vars/all.yml#L19
But this does not seem to have effect. By default logrotate
is ran daily via /etc/cron.daily/logrotate
:
> grep /usr/sbin/logrotate /etc/cron.daily/logrotate
if [ ! -x /usr/sbin/logrotate ]; then
/usr/sbin/logrotate /etc/logrotate.conf
Which we copy into /etc/cron.hourly
here:
https://github.com/status-im/infra-role-bootstrap/blob/2356b0e4/tasks/logging/main.yml#L50-L57
Which does work:
admin@devel-large-01.aws-eu-central-1a.nimbus.pyrmont:~ % ls -l /etc/cron.hourly/logrotate
-rwxr-xr-x 1 root root 377 Jan 21 2019 /etc/cron.hourly/logrotate
And we can see that it should run when tested with run-parts
:
admin@devel-large-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo run-parts --report --test /etc/cron.hourly
/etc/cron.hourly/logrotate
So for all intents and purposes it should run hourly.
Oooooooooooooooooh:
admin@devel-large-01.aws-eu-central-1a.nimbus.pyrmont:~ % head -n6 /etc/cron.hourly/logrotate
#!/bin/sh
# skip in favour of systemd timer
if [ -d /run/systemd/system ]; then
exit 0
fi
admin@devel-large-01.aws-eu-central-1a.nimbus.pyrmont:~ % ls -l /run/systemd/system
total 4
-rw-r--r-- 1 root root 299 Nov 20 13:22 netplan-ovs-cleanup.service
drwxr-xr-x 2 root root 60 Nov 20 13:22 systemd-networkd.service.wants
So it looks like it gets skipped because this exists:
% s list-timers -a logrotate
NEXT LEFT LAST PASSED UNIT ACTIVATES
Fri 2020-11-27 00:00:00 UTC 7h left Thu 2020-11-26 00:00:03 UTC 16h ago logrotate.timer logrotate.servic
% s status logrotate.timer
● logrotate.timer - Daily rotation of log files
Loaded: loaded (/lib/systemd/system/logrotate.timer; enabled; vendor preset: enabled)
Active: active (waiting) since Fri 2020-11-20 12:55:03 UTC; 6 days ago
Trigger: Fri 2020-11-27 00:00:00 UTC; 7h left
Triggers: ● logrotate.service
Docs: man:logrotate(8)
man:logrotate.conf(5)
And the config is...
[Unit]
Description=Daily rotation of log files
Documentation=man:logrotate(8) man:logrotate.conf(5)
[Timer]
OnCalendar=daily
AccuracySec=12h
Persistent=true
[Install]
WantedBy=timers.target
Daily...
Which means that that copy of script into /etc/cron.hourly
has no effect.
I made it so that /lib/systemd/system/logrotate.timer
is modified if we use a custom log rotation frequency:
https://github.com/status-im/infra-role-bootstrap/commit/b9ab08df
I also bumped limit of files to keep to 24: https://github.com/status-im/infra-nimbus/commit/796c7ce2
I did a test by defining the simplest Rsyslog template possible:
$template rawMsg,"%msg%\n"
But when I used it I'm getting a single space in front of the log:
% tail -n1 /var/log/docker/beacon-node-mainnet-master-small/docker.log
{"lvl":"DBG","ts":"2020-11-26 17:40:25.988+00:00","msg":"Closed stream","topics":"lpstream","tid":1,"file":"lpstream.nim:257","s":"5fbfe889d1359f4f12fa4a31","objName":"LPChannel","dir":"In"}
Oh, that's weird but it needs to be:
$template rawMsg,"%msg:2:2048%\n"
Based on:
it is recommended to use:
$template myFormat,"%msg:2:2048%\n"
https://stackoverflow.com/a/60174111
And that fixes it:
% tail -n1 /var/log/docker/beacon-node-mainnet-master-small/docker.log
{"lvl":"DBG","ts":"2020-11-26 17:43:37.513+00:00","msg":"Stream created","topics":"lpstream","tid":1,"file":"lpstream.nim:128","s":"5fbfe949d1359f4f12fa5374","objName":"ChronosStream","dir":"Out"}
Default rsyslog tempaltes can be found here:
https://www.rsyslog.com/doc/v8-stable/configuration/templates.html?highlight=rsyslog_traditionalfileformat#reserved-template-names
The one v8 uses by default is RSYSLOG_TraditionalFileFormat
.
I made format customizable in https://github.com/status-im/infra-role-bootstrap/commit/1de7aaac and set it to %msg:2:2048%\n
in https://github.com/status-im/infra-nimbus/commit/996e2df2.
Looking good:
admin@master-small-02.aws-eu-central-1a.nimbus.mainnet:~ % tail -n5 /var/log/docker/beacon-node-mainnet-master-small/docker.log
{"lvl":"DBG","ts":"2020-11-26 18:08:31.845+00:00","msg":"Peer disconnected","topics":"networking","tid":1,"file":"eth2_network.nim:925","peer":"16Uiu2HAmQP7HztnkrmRwa28U1T7BNhXRkoHLRitt12y9JuUchjr2","connections":0}
{"lvl":"DBG","ts":"2020-11-26 18:08:31.845+00:00","msg":"unsubscribing pubsub peer","topics":"pubsub","tid":1,"file":"pubsub.nim:80","peerId":"16*Uchjr2"}
{"lvl":"DBG","ts":"2020-11-26 18:08:31.886+00:00","msg":"Peer status","topics":"sync","tid":1,"file":"sync_protocol.nim:208","peer":"16*r3QFJb","statusMsg":{"forkDigest":"b5303f2a","finalizedRoot":"00000000","finalizedEpoch":0,"headRoot":"4d611d5b","headSlot":0}}
{"lvl":"DBG","ts":"2020-11-26 18:08:31.886+00:00","msg":"Closed stream","topics":"lpstream","tid":1,"file":"lpstream.nim:257","s":"5fbfef1fd1359f4f12fa5a0a","objName":"LPChannel","dir":"In"}
{"lvl":"DBG","ts":"2020-11-26 18:08:31.957+00:00","msg":"Stream created","topics":"lpstream","tid":1,"file":"lpstream.nim:128","s":"5fbfef1fd1359f4f12fa5a0d","objName":"SecureConn","dir":"Out"}
nice! look at that storage graph as well, nuts: https://grafana.infra.status.im/d/QCTZ8-Vmk/single-host-dashboard?viewPanel=22&orgId=1&from=now-24h&to=now&refresh=10s&var-host=libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont
Yeah, I had no idea this stuff was going through systemd. You learn something new every day and yet you still die stupid.
I guess it's done.
I also bumped limit of files to keep to 24: 796c7ce
can we get 72 here? ie 3 days? assuming it's really hourly.
Looks hourly so far:
admin@master-small-02.aws-eu-central-1a.nimbus.mainnet:/docker/log/beacon-node-mainnet-master-small % ll
total 114M
-rw-r--r-- 1 syslog adm 16M Nov 26 20:27 docker.log
-rw-r--r-- 1 syslog adm 39M Nov 26 20:00 docker.log.1
-rw-r--r-- 1 syslog adm 2.5M Nov 26 19:00 docker.log.2.gz
-rw-r--r-- 1 syslog adm 1.5M Nov 26 18:00 docker.log.3.gz
-rw-r--r-- 1 syslog adm 56M Nov 26 17:24 docker.log.4.gz
can we get 72 here?
Sure we can: https://github.com/status-im/infra-nimbus/commit/65f0d944
For the sake of posterity:
Currently running on daily rotation, 5 days:
Ideally, also get rid of prefix (
Nov 20 10:11:30 pyrmont-06.aws-eu-central-1a.nimbus.test docker/beacon-node-pyrmont-devel[600]:
), it makes the logs non-json, hard to read and unnecessarily large