abhiranjeet opened this issue 3 years ago
Hi. Can you check /var/log/syslog for errors and failures? It should show why the Docker containers are failing.
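A minimal sketch of that check, assuming Python 3 is available on the host. It keeps only the lines that mention "error" or "fail" (case-insensitive), which is roughly what `grep -iE 'error|fail' /var/log/syslog` does. The function names `find_problem_lines` and `scan_log` are hypothetical, not part of SONiC.

```python
import re
from typing import Iterable, List

# Case-insensitive match for "error" or "fail"; this also matches "errors",
# "failed", "failure", etc., so expect some false positives.
PROBLEM_RE = re.compile(r"error|fail", re.IGNORECASE)


def find_problem_lines(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention an error or a failure, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if PROBLEM_RE.search(line)]


def scan_log(path: str = "/var/log/syslog") -> List[str]:
    """Scan a log file. Reading /var/log/syslog usually needs root
    (or membership in the adm group on Debian-based systems)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_problem_lines(fh)
```

For example, `for line in scan_log(): print(line)` prints every matching syslog line.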
Hi, I checked the logs:
/usr/local/lib/python3.7/dist-packages/supervisor/options.py:474: UserWarning: Supervisord is running as root and it is searching for its configuration file in default locations (including its current working directory); you probably want to specify a "-c" argument specifying an absolute path to a configuration file for improved security.
'Supervisord is running as root and it is searching '
2021-06-13 05:59:48,566 INFO Included extra file "/etc/supervisor/conf.d/supervisord.conf" during parsing
2021-06-13 05:59:48,566 INFO Set uid to user 0 succeeded
2021-06-13 05:59:48,572 INFO RPC interface 'supervisor' initialized
2021-06-13 05:59:48,572 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2021-06-13 05:59:48,573 INFO supervisord started with pid 1
2021-06-13 05:59:49,576 INFO spawned: 'dependent-startup' with pid 9
2021-06-13 05:59:49,579 INFO spawned: 'supervisor-proc-exit-listener' with pid 10
2021-06-13 05:59:50,830 INFO success: dependent-startup entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-06-13 05:59:50,831 INFO success: supervisor-proc-exit-listener entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-06-13 05:59:50,841 INFO spawned: 'rsyslogd' with pid 13
2021-06-13 05:59:51,885 INFO success: rsyslogd entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-06-13 05:59:52,904 INFO spawned: 'start' with pid 17
2021-06-13 05:59:52,904 INFO success: start entered RUNNING state, process has stayed up for > than 0 seconds (startsecs)
2021-06-13 05:59:52,919 INFO exited: start (exit status 0; expected)
2021-06-13 05:59:52,934 INFO spawned: 'telemetry' with pid 19
2021-06-13 05:59:53,227 INFO exited: telemetry (exit status 255; not expected)
2021-06-13 05:59:54,231 INFO spawned: 'telemetry' with pid 44
2021-06-13 05:59:54,495 INFO exited: telemetry (exit status 255; not expected)
2021-06-13 05:59:56,516 INFO spawned: 'telemetry' with pid 69
2021-06-13 05:59:56,777 INFO exited: telemetry (exit status 255; not expected)
2021-06-13 05:59:59,798 INFO spawned: 'telemetry' with pid 94
2021-06-13 06:00:00,059 INFO exited: telemetry (exit status 255; not expected)
2021-06-13 06:00:01,061 INFO gave up: telemetry entered FATAL state, too many start retries too quickly
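In the log above, `telemetry` is spawned four times, exits with status 255 each time, and supervisord then puts it into the FATAL state. That count is consistent with supervisord's default `startretries=3` (one initial start plus three retries), although the SONiC container config may set its own value. The log does not say *why* telemetry exits, so the program's own output is the next place to look. Below is a small sketch, assuming the log lines have exactly the format shown here, that summarizes unexpected exits and FATAL programs; `summarize` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Patterns for the two supervisord messages that matter in the log above.
# "exit status 0; expected" lines (like the one-shot 'start' program) do not match.
EXIT_RE = re.compile(r"exited: (\S+) \(exit status (\d+); not expected\)")
FATAL_RE = re.compile(r"gave up: (\S+) entered FATAL state")


def summarize(lines: Iterable[str]) -> Tuple[Dict[Tuple[str, int], int], List[str]]:
    """Count unexpected exits per (program, exit status) and list programs that went FATAL."""
    exits: Counter = Counter()
    fatal: List[str] = []
    for line in lines:
        m = EXIT_RE.search(line)
        if m:
            exits[(m.group(1), int(m.group(2)))] += 1
            continue
        m = FATAL_RE.search(line)
        if m:
            fatal.append(m.group(1))
    return dict(exits), fatal
```

Run against the lines above, this should report four exits of `telemetry` with status 255 and `telemetry` as the only FATAL program.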
Does this help?
Same issue here, except I'm running on a physical switch. The logs look pretty much the same. All of the Docker images are available, but no containers are running except docker-database. Any ideas on how to debug this?
Yeah. Later on I built an image for an Edgecore switch with one change: PLATFORM=broadcom. You might see your database container running, but you have to cd into /usr/bin on that switch and look for a script named database.sh. Run that script with: ./database.sh start
Yarg. That didn't work on the Arista 7170 switch:
sudo ./database.sh start
Starting existing database container
database
Traceback (most recent call last):
File "/usr/local/bin/sonic-cfggen", line 431, in <module>
main()
File "/usr/local/bin/sonic-cfggen", line 326, in main
_process_json(args, data)
File "/usr/local/bin/sonic-cfggen", line 237, in _process_json
deep_update(data, FormatConverter.to_deserialized(json.load(stream)))
File "/usr/lib/python3.7/json/__init__.py", line 296, in load
parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
File "/usr/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
True
I suspect it is trying to parse /etc/sonic/config_db.json, which is empty for some reason on a fresh install. Is that file meant to be populated manually?
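The traceback supports that suspicion: Python's `json` module raises exactly `Expecting value: line 1 column 1 (char 0)` when it is given an empty string. A minimal sketch of a pre-check that turns an empty config_db.json into a clearer error, assuming only the standard library; `parse_config_db` and `load_config_db` are hypothetical helpers, not part of sonic-cfggen.

```python
import json
from typing import Any, Dict


def parse_config_db(text: str, source: str = "config_db.json") -> Dict[str, Any]:
    """Parse config_db.json contents, failing with a clear message if they are empty."""
    if not text.strip():
        # json.loads("") would raise "Expecting value: line 1 column 1 (char 0)",
        # the same error sonic-cfggen printed above.
        raise ValueError(f"{source} is empty; it has not been populated yet")
    return json.loads(text)


def load_config_db(path: str = "/etc/sonic/config_db.json") -> Dict[str, Any]:
    """Read and parse the file at `path`."""
    with open(path, encoding="utf-8") as fh:
        return parse_config_db(fh.read(), source=path)
```

Checking emptiness first separates "the file was never generated" from "the file contains malformed JSON", which the original traceback does not distinguish.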
Is your switch one of these?
Yep - the 7170-32.
Can you try downloading and deploying one of these SONiC images for the Tofino ASIC? https://sonic-build.azurewebsites.net/ui/sonic/pipelines/146/builds?branchName=master
It looks like my issue was unrelated to the OP's.
It turns out I'm on the 7170-32C, not the 7170-32CD. There were issues with the SKU and port-mapping names that prevented things from loading properly. @Staphylo figured out what was going on, and I'm up and running now.
I have built Docker images from the azure/sonic-buildimage repository with PLATFORM=vs on Ubuntu Server 18.04 LTS. The build succeeds and creates images for all components in the /target directory. After loading those .gz Docker images, I use "docker run" commands to start all the SONiC containers one by one. Some containers start and some exit, and the ones that are running have no processes running inside them. Sharing snapshots below.
(Snapshots: /target directory listing; docker images output; docker ps -a output)
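`docker ps -a` lists every container, including exited ones, so it is a quick way to see which containers failed. A sketch, assuming the docker CLI is installed and the user can reach the daemon, that uses Docker's `--format` option with the `{{.Names}}` and `{{.Status}}` placeholders and reports every container whose status does not start with "Up". The helper names are hypothetical.

```python
import subprocess
from typing import List, Tuple


def docker_ps_all() -> str:
    """Return 'name<TAB>status' lines for all containers (requires the docker CLI)."""
    return subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    ).stdout


def parse_ps(output: str) -> List[Tuple[str, str]]:
    """Split each non-empty line into a (name, status) pair."""
    rows = []
    for line in output.splitlines():
        if line.strip():
            name, _, status = line.partition("\t")
            rows.append((name, status))
    return rows


def not_running(rows: List[Tuple[str, str]]) -> List[str]:
    """Names of containers that are exited, created, or otherwise not 'Up'."""
    return [name for name, status in rows if not status.startswith("Up")]
```

For example, `not_running(parse_ps(docker_ps_all()))` gives the containers whose logs (`docker logs <name>`) are worth reading first.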
For example, open a shell in the sonic-telemetry-vs container (e.g. with docker exec) and check which processes are running and what the supervisord logs say.
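One way to see the process state inside a container is `supervisorctl status`, run through `docker exec`. This sketch assumes supervisorctl can reach the container's supervisord (the log above shows a `unix_http_server` configured) and that its output has the usual `name  STATE  description` layout. It flags programs in the FATAL, BACKOFF, or UNKNOWN states; an EXITED one-shot program such as `start` is normal. The helper names are hypothetical.

```python
import subprocess
from typing import Dict, List

# supervisord states that suggest a program failed to stay up.
BAD_STATES = {"FATAL", "BACKOFF", "UNKNOWN"}


def container_status(container: str) -> str:
    """Run `supervisorctl status` inside a container and return its stdout."""
    # Some supervisor releases exit non-zero when a program is not running,
    # so check=False avoids treating that as a command failure.
    return subprocess.run(
        ["docker", "exec", container, "supervisorctl", "status"],
        capture_output=True, text=True, check=False,
    ).stdout


def parse_status(output: str) -> Dict[str, str]:
    """Map program name -> state from `supervisorctl status` output."""
    states = {}
    for line in output.splitlines():
        parts = line.split(None, 2)  # name, state, optional description
        if len(parts) >= 2:
            states[parts[0]] = parts[1]
    return states


def problem_programs(states: Dict[str, str]) -> List[str]:
    """Sorted names of programs in a failing state."""
    return sorted(name for name, state in states.items() if state in BAD_STATES)
```

For example, `problem_programs(parse_status(container_status("sonic-telemetry-vs")))` would list the programs to investigate in that container.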