sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[chassis][supervisor] [master]database-chassis.service failed to start at reboot on Supervisor #20715

Open mlok-nokia opened 1 week ago

mlok-nokia commented 1 week ago

Description

On the master branch, database-chassis.service fails to start at boot on the Supervisor. The following syslog excerpt shows the failure:

2024-11-01T15:17:25.148322+00:00 sonic systemd[1]: database-chassis.service: Control process exited, code=exited, status=1/FAILURE
2024-11-01T15:17:25.148518+00:00 sonic systemd[1]: database-chassis.service: Failed with result 'exit-code'.
2024-11-01T15:17:25.149312+00:00 sonic systemd[1]: Failed to start database-chassis.service - database-chassis container.
2024-11-01T15:17:25.149678+00:00 sonic systemd[1]: Dependency failed for config-topology.service - Platform topology configuration service.
2024-11-01T15:17:25.149734+00:00 sonic systemd[1]: Dependency failed for config-setup.service - Config initialization and migration service.
2024-11-01T15:17:25.149782+00:00 sonic systemd[1]: Dependency failed for swss@14.service - switch state service.
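
For triage, the excerpt above can be reduced to "which unit failed to start" and "which units were blocked by it". The following is a minimal sketch, not part of any SONiC tooling: the function name `summarize_failures` is hypothetical, and the regular expressions are written against the systemd message wording seen in this excerpt only.

```python
import re

# Patterns follow the systemd wording in the excerpt above (an assumption:
# other systemd versions or locales may phrase these messages differently).
START_FAILED = re.compile(r"systemd\[\d+\]: Failed to start (\S+\.service)")
DEP_FAILED = re.compile(r"systemd\[\d+\]: Dependency failed for (\S+\.service)")


def summarize_failures(lines):
    """Return (failed_units, blocked_units), de-duplicated in first-seen order."""
    failed, blocked = [], []
    for line in lines:
        match = START_FAILED.search(line)
        if match and match.group(1) not in failed:
            failed.append(match.group(1))
        match = DEP_FAILED.search(line)
        if match and match.group(1) not in blocked:
            blocked.append(match.group(1))
    return failed, blocked


# Lines copied from the syslog excerpt in this issue.
SAMPLE = [
    "2024-11-01T15:17:25.148518+00:00 sonic systemd[1]: database-chassis.service: Failed with result 'exit-code'.",
    "2024-11-01T15:17:25.149312+00:00 sonic systemd[1]: Failed to start database-chassis.service - database-chassis container.",
    "2024-11-01T15:17:25.149678+00:00 sonic systemd[1]: Dependency failed for config-topology.service - Platform topology configuration service.",
    "2024-11-01T15:17:25.149734+00:00 sonic systemd[1]: Dependency failed for config-setup.service - Config initialization and migration service.",
    "2024-11-01T15:17:25.149782+00:00 sonic systemd[1]: Dependency failed for swss@14.service - switch state service.",
]

failed_units, blocked_units = summarize_failures(SAMPLE)
```

On this excerpt the only unit that fails to start is database-chassis.service; config-topology.service, config-setup.service, and swss@14.service are reported as dependency failures rather than failing on their own.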

Steps to reproduce the issue:

  1. Reboot the Supervisor with the latest master image and check the syslog. It contains the database-chassis.service failure messages shown in the Description above.

Describe the results you received:

database-chassis.service failed to start at reboot.

Describe the results you expected:

database-chassis.service should start without failure.

Output of show version:

(paste your output here)

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

arlakshm commented 1 week ago

@mlok-nokia to add more error logs.
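
One possible way to gather those logs on the Supervisor is sketched below. It is an assumption-laden sketch, not an official SONiC procedure: the helper names `diagnostic_commands` and `collect` are hypothetical, and it relies only on the standard `systemctl status` and `journalctl -u ... -b` invocations for the unit named in this issue.

```python
import subprocess

UNIT = "database-chassis.service"  # unit name taken from the syslog in this issue


def diagnostic_commands(unit):
    """Build the argv lists for commands that show why a unit failed."""
    return [
        # Current state, exit code, and the last few log lines of the unit.
        ["systemctl", "status", unit, "--no-pager"],
        # Full journal for the unit since the current boot.
        ["journalctl", "-u", unit, "-b", "--no-pager"],
    ]


def collect(unit):
    """Run each command and return {command string: combined output}."""
    results = {}
    for argv in diagnostic_commands(unit):
        key = " ".join(argv)
        try:
            proc = subprocess.run(argv, capture_output=True, text=True, check=False)
            results[key] = proc.stdout + proc.stderr
        except FileNotFoundError:
            # Tool not present on this host (for example, a non-systemd machine).
            results[key] = "command not found"
    return results
```

Calling `collect(UNIT)` on the affected Supervisor and attaching the returned text would show the control process output that precedes the `status=1/FAILURE` line.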

anamehra commented 1 week ago

Hi @rlhui, as discussed in the community meeting, Cisco SIM sanity runs are also failing with the latest master code. While going through the recent commits, I came across https://github.com/sonic-net/sonic-buildimage/pull/19016 and reverting it made our SIM sanity run pass. I will validate this on h/w as well.

Hi @mlok-nokia, could you please try with this PR reverted?

anamehra commented 1 week ago

We also validated the build successfully on Cisco h/w with #19016 reverted.

arlakshm commented 5 days ago

https://github.com/sonic-net/sonic-buildimage/pull/20726 might contain a potential fix. Please test with this change.

mlok-nokia commented 3 days ago

20726 might have the potential fix. Please test with this change

@arlakshm This PR fixes the /etc/supervisor/critical_processes file in the "database" container. It does not fix the database-chassis.service issue.