sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC
Other
734 stars 1.41k forks source link

rsyslog service restart during eventd service start causes failures #20544

Open stepanblyschak opened 3 days ago

stepanblyschak commented 3 days ago

Description

Rsyslog service restart by eventd service causes service start failures:

Steps to reproduce the issue:

  1. Reboot

Describe the results you received:

2024 Oct 15 00:33:28.682673 sonic NOTICE admin: Starting bgp service...
2024 Oct 15 00:33:28.712769 sonic NOTICE admin: Warm boot flag: bgp true.
2024 Oct 15 00:33:28.718818 sonic NOTICE admin: Fast boot flag: bgp .
2024 Oct 15 00:33:28.781090 sonic INFO systemd[1]: rsyslog.service: Deactivated successfully.
2024 Oct 15 00:33:28.781536 sonic INFO systemd[1]: Stopped rsyslog.service - System Logging Service.
2024 Oct 15 00:33:28.822264 sonic INFO systemd[1]: Starting rsyslog.service - System Logging Service...
2024 Oct 15 00:33:28.825426 sonic NOTICE systemd[1]: rsyslog.service: Main process exited, code=exited, status=1/FAILURE
2024 Oct 15 00:33:28.825539 sonic WARNING systemd[1]: rsyslog.service: Failed with result 'exit-code'.
2024 Oct 15 00:33:28.825905 sonic ERR systemd[1]: Failed to start rsyslog.service - System Logging Service.
2024 Oct 15 00:33:28.827440 sonic ERR systemctl[6191]: Job for rsyslog.service failed because the control process exited with error code.
2024 Oct 15 00:33:28.827512 sonic ERR systemctl[6191]: See "systemctl status rsyslog.service" and "journalctl -xeu rsyslog.service" for details.
2024 Oct 15 00:33:28.828924 sonic NOTICE systemd[1]: eventd.service: Control process exited, code=exited, status=1/FAILURE
2024 Oct 15 00:33:28.829152 sonic WARNING systemd[1]: eventd.service: Failed with result 'exit-code'.
2024 Oct 15 00:33:28.829769 sonic ERR systemd[1]: Failed to start eventd.service - EVENTD container.
2024 Oct 15 00:33:28.880062 sonic INFO bgp.sh[6707]: Starting existing bgp container with HWSKU Mellanox-SN4700-O8V48
2024 Oct 15 00:33:29.090871 sonic INFO systemd[1]: rsyslog.service: Scheduled restart job, restart counter is at 1.
2024 Oct 15 00:33:29.091220 sonic INFO systemd[1]: Stopped rsyslog.service - System Logging Service.

When doing "warm-reboot" after the switch has booted up there is a crash of eventd:

2024 Oct 15 00:46:53.199914 sonic NOTICE root: Stopping database service...
2024 Oct 15 00:46:53.231922 sonic NOTICE root: DEVICE_SUBTYPE: , CONFIG_KNOB:
2024 Oct 15 00:46:53.234147 sonic NOTICE root: Warm boot flag: database true.
2024 Oct 15 00:46:53.236090 sonic NOTICE root: Fast boot flag: database true.
2024 Oct 15 00:46:53.238770 sonic NOTICE root: Killing Docker database...
2024 Oct 15 00:46:53.285571 sonic INFO containerd[1275]: time="2024-10-15T00:46:53.285451683+03:00" level=info msg="shim disconnected" id=bfab5e784ebb87a07890b6761f46297ec7c94b5a8a245f4a7871
e1f61234d982
2024 Oct 15 00:46:53.285678 sonic INFO containerd[1275]: time="2024-10-15T00:46:53.285514300+03:00" level=warning msg="cleaning up after shim disconnected" id=bfab5e784ebb87a07890b6761f46297
ec7c94b5a8a245f4a7871e1f61234d982 namespace=moby
2024 Oct 15 00:46:53.285732 sonic INFO containerd[1275]: time="2024-10-15T00:46:53.285534422+03:00" level=info msg="cleaning up dead shim"
2024 Oct 15 00:46:53.285788 sonic INFO dockerd[1358]: time="2024-10-15T00:46:53.285562129+03:00" level=info msg="ignoring event" container=bfab5e784ebb87a07890b6761f46297ec7c94b5a8a245f4a787
1e1f61234d982 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
2024 Oct 15 00:46:53.294631 sonic INFO containerd[1275]: time="2024-10-15T00:46:53.294547337+03:00" level=warning msg="cleanup warnings time=\"2024-10-15T00:46:53+03:00\" level=info msg=\"st
arting signal loop\" namespace=moby pid=43027 runtime=io.containerd.runc.v2\n"
2024 Oct 15 00:46:53.305913 sonic INFO systemd[1]: var-lib-docker-overlay2-26686d5cfd186c57cfae80bef2ac2042c2c3870822b0022ed85f3141aa7aab1e-merged.mount: Deactivated successfully.
2024 Oct 15 00:46:53.309127 sonic INFO database.sh[43006]: database
2024 Oct 15 00:46:53.309423 sonic INFO database.sh[4774]: 137
2024 Oct 15 00:46:53.311147 sonic INFO systemd[1]: database.service: Deactivated successfully.
2024 Oct 15 00:46:53.311402 sonic INFO systemd[1]: Stopped database.service - Database container.
2024 Oct 15 00:46:53.311537 sonic NOTICE rsyslog_plugin: :- publish: EVENT_PUBLISHED: {"sonic-events-host:event-stopped-ctr":{"ctr_name":"Database","timestamp":"2024-10-15T00:46:53.311471Z"}
}
2024 Oct 15 00:46:53.312052 sonic INFO dockerd[1358]: time="2024-10-15T00:46:53.311974412+03:00" level=info msg="Processing signal 'terminated'"
2024 Oct 15 00:46:53.312112 sonic INFO systemd[1]: Stopping docker.service - Docker Application Container Engine...
2024 Oct 15 00:46:53.314406 sonic INFO eventd#eventd#supervisord: eventd terminate called after throwing an instance of 'swss::RedisError'
2024 Oct 15 00:46:53.314406 sonic INFO eventd#eventd#supervisord: eventd   what():  RedisError: Failed to redisGetReply in RedisPipeline::pop, err=3: errstr=Server closed the connection
2024 Oct 15 00:46:53.867509 sonic INFO eventd#eventd#supervisord 2024-10-14 21:46:53,865 WARN exited: eventd (terminated by SIGABRT (core dumped); not expected)
2024 Oct 15 00:46:53.867509 sonic INFO eventd#eventd#supervisord 2024-10-14 21:46:53,866 WARN received SIGTERM indicating exit request

Describe the results you expected:

No errors during system start.

Output of show version:

SONiC Software Version: SONiC.202405_RC.31-d2ba308bd_Internal
SONiC OS Version: 12
Distribution: Debian 12.7
Kernel: 6.1.0-22-2-amd64
Build commit: d2ba308bd
Build date: Sun Oct 13 14:15:01 UTC 2024
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

zbud-msft commented 12 hours ago

Looking into this issue