dgsudharsan closed this issue 1 year ago.
@dgsudharsan, could you please capture the difference in behavior between the two SONiC versions?
In 202211, when installing from ONIE, the telemetry process exits, and the telemetry container exits along with it because telemetry is defined as a critical process. In 202305, however, the telemetry container does not exit.
root@r-anaconda-51:/home/admin# docker exec telemetry bash -c '[ -f /etc/supervisor/critical_processes ] && cat /etc/supervisor/critical_processes'
program:telemetry
Reproduced the issue locally on version 20230531.03.
After ONIE installation, the telemetry process has indeed exited.
admin@sonic:/var/log$ docker exec telemetry supervisorctl status
containercfgd                    RUNNING   pid 16, uptime 0:34:55
dependent-startup                EXITED    Sep 20 07:45 AM
dialout                          RUNNING   pid 22, uptime 0:34:50
rsyslogd                         RUNNING   pid 11, uptime 0:34:58
start                            EXITED    Sep 20 07:45 AM
supervisor-proc-exit-listener    RUNNING   pid 8, uptime 0:35:03
telemetry                        EXITED    Sep 20 07:46 AM
Snippet from telemetry.log:
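The mismatch above (telemetry listed as a critical program, yet EXITED while the container keeps running) can be spotted mechanically. The following is a minimal sketch, assuming only the text formats shown in this issue: critical_processes lines of the form "program:<name>" and one "<name> <STATE> <details>" row per program from supervisorctl status. The function names are hypothetical helpers, not SONiC tools.

```python
"""Flag critical supervisord programs that are not RUNNING (illustrative sketch)."""
from __future__ import annotations


def parse_critical_processes(text: str) -> set[str]:
    """Return program names from a critical_processes file body."""
    names = set()
    for line in text.splitlines():
        kind, sep, name = line.strip().partition(":")
        # Only "program:" entries are handled; any other entry kinds are ignored here.
        if sep and kind == "program" and name:
            names.add(name)
    return names


def parse_supervisorctl_status(text: str) -> dict[str, str]:
    """Map each program name to its state (RUNNING, EXITED, FATAL, ...)."""
    states = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            states[fields[0]] = fields[1]
    return states


def stopped_critical_processes(critical_text: str, status_text: str) -> list[str]:
    """Critical programs that are missing from the status output or not RUNNING."""
    critical = parse_critical_processes(critical_text)
    states = parse_supervisorctl_status(status_text)
    return sorted(n for n in critical if states.get(n) != "RUNNING")


if __name__ == "__main__":
    status = (
        "containercfgd    RUNNING   pid 16, uptime 0:34:55\n"
        "dialout          RUNNING   pid 22, uptime 0:34:50\n"
        "telemetry        EXITED    Sep 20 07:46 AM\n"
    )
    print(stopped_critical_processes("program:telemetry\n", status))
```

On the status output captured above, a non-empty result while the container is still up is exactly the 202305 behavior this issue reports.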
Sep 20 07:45:57.973354 sonic INFO telemetry#supervisord: telemetry Traceback (most recent call last):
Sep 20 07:45:57.974320 sonic INFO telemetry#supervisord: telemetry File "/usr/local/bin/sonic-cfggen", line 452, in
Investigating a proper fix.
This happens because the telemetry service introduced certificate authentication, but there is no TELEMETRY config in CONFIG_DB (database 4):
127.0.0.1:6379[4]> keys TELEMETRY*
(empty array)
127.0.0.1:6379[4]>
Therefore, we need to manually load the TELEMETRY config into CONFIG_DB:
telemetry.json (client certificate authentication disabled):
{
"TELEMETRY": {
"gnmi": {
"client_auth": "false",
"port": "50051",
"log_level": "2"
}
}
}
Alternatively, with client certificate authentication enabled:
{
"TELEMETRY": {
"certs": {
"server_crt": "/etc/sonic/telemetry/streamingtelemetryserver.cer",
"server_key": "/etc/sonic/telemetry/streamingtelemetryserver.key",
"ca_crt": "/etc/sonic/telemetry/dsmsroot.cer"
},
"gnmi": {
"client_auth": "true",
"port": "50051",
"log_level": "2"
}
}
}
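Both JSON variants above differ only in client_auth and the presence of a certs entry, so the file can be generated rather than hand-edited. This is a minimal sketch; build_telemetry_config is a hypothetical helper, and the default port, log level, and certificate paths are copied from the examples above rather than taken from any SONiC default.

```python
"""Generate the TELEMETRY config used with `config load` (illustrative sketch)."""
import json


def build_telemetry_config(client_auth: bool, port: int = 50051, log_level: int = 2) -> dict:
    # Values are strings, matching the JSON examples in this issue.
    gnmi = {
        "client_auth": "true" if client_auth else "false",
        "port": str(port),
        "log_level": str(log_level),
    }
    table = {"gnmi": gnmi}
    if client_auth:
        # Certificate paths copied from the second example above.
        table["certs"] = {
            "server_crt": "/etc/sonic/telemetry/streamingtelemetryserver.cer",
            "server_key": "/etc/sonic/telemetry/streamingtelemetryserver.key",
            "ca_crt": "/etc/sonic/telemetry/dsmsroot.cer",
        }
    return {"TELEMETRY": table}


if __name__ == "__main__":
    # Redirect stdout to telemetry.json before running `config load`.
    print(json.dumps(build_telemetry_config(client_auth=False), indent=4))
```

Printing to stdout instead of writing the file keeps the sketch side-effect free; redirect the output (for example `python3 gen.py > telemetry.json`, where gen.py is whatever you name the script) to produce the file.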
Load the telemetry config into CONFIG_DB:
sudo config load telemetry.json -y
Then start the telemetry process:
docker exec telemetry supervisorctl start telemetry
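If the workaround has to be applied on several switches, the two commands above can be wrapped in a small script. This is a minimal sketch; apply_telemetry_workaround is a hypothetical name, and by default it only returns the command lists (dry run) so nothing runs until dry_run=False is passed on the device.

```python
"""Apply the manual TELEMETRY workaround from this issue (illustrative sketch)."""
from __future__ import annotations

import subprocess


def apply_telemetry_workaround(config_path: str = "telemetry.json", dry_run: bool = True) -> list[list[str]]:
    commands = [
        # Load the TELEMETRY table into CONFIG_DB.
        ["sudo", "config", "load", config_path, "-y"],
        # Start the telemetry program inside the telemetry container.
        ["docker", "exec", "telemetry", "supervisorctl", "start", "telemetry"],
    ]
    if not dry_run:
        for cmd in commands:
            # check=True stops at the first failing command instead of continuing.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in apply_telemetry_workaround():
        print(" ".join(cmd))
```

Running the commands in order matters: starting telemetry before the config is loaded would hit the same exit seen in telemetry.log.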
After that, the telemetry issue above is resolved. A proper fix requires a mechanism that generates a default TELEMETRY config in CONFIG_DB.
Loading a customized TELEMETRY config is still recommended. After the fix, if no TELEMETRY configuration exists in the Redis DB, the default TELEMETRY configuration will be used.
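The fallback described above can be sketched as "start from defaults, then overlay whatever CONFIG_DB holds". This is a minimal illustration, not the fix that was actually merged: DEFAULT_TELEMETRY mirrors the first telemetry.json example, the dict stands in for the CONFIG_DB contents, and the field-by-field merge (customized values override defaults) is an assumption.

```python
"""Fall back to a default TELEMETRY config when CONFIG_DB has none (illustrative sketch)."""
from __future__ import annotations

import copy

# Assumed defaults, copied from the first telemetry.json example.
DEFAULT_TELEMETRY = {
    "gnmi": {"client_auth": "false", "port": "50051", "log_level": "2"},
}


def effective_telemetry_config(config_db: dict) -> dict:
    """Return the TELEMETRY table to use, given a CONFIG_DB-style dict."""
    merged = copy.deepcopy(DEFAULT_TELEMETRY)  # never mutate the shared defaults
    user = config_db.get("TELEMETRY") or {}
    for key, fields in user.items():
        # Customized entries override defaults field by field.
        merged.setdefault(key, {}).update(fields)
    return merged
```

With an empty CONFIG_DB (the `(empty array)` case above) this returns the defaults; with a customized table, only the fields the operator set are changed.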
Description
After installing through ONIE, the telemetry process inside the telemetry container exits, and sometimes it ends up in the FATAL state.
Sep 7 18:27:45.653772 r-anaconda-51 INFO telemetry#supervisord 2023-09-07 15:27:45,652 INFO exited: telemetry (exit status 0; not expected)
Steps to reproduce the issue:
Describe the results you received:
The telemetry process exits. However, the container stays up even though telemetry is a critical process.
Describe the results you expected:
The telemetry main process should not exit. If it does exit, the container should exit as well.
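The expected behavior amounts to a decision made on each supervisord process-exit event. This is a minimal sketch of that decision, not SONiC's supervisor-proc-exit-listener: it parses the space-separated key:value payload of a supervisord PROCESS_STATE_EXITED event and reports whether a critical program exited unexpectedly (expected:0, as in the "exit status 0; not expected" log line above). Whether expected exits of a critical program should also stop the container is a design choice this sketch leaves out.

```python
"""Decide whether a process exit should bring the container down (illustrative sketch)."""
from __future__ import annotations


def parse_event_payload(payload: str) -> dict[str, str]:
    """Split 'key:value key:value ...' tokens into a dict."""
    fields = {}
    for token in payload.split():
        key, _, value = token.partition(":")
        fields[key] = value
    return fields


def should_stop_container(payload: str, critical: set[str]) -> bool:
    """True when a critical program exited and supervisord flagged the exit as unexpected."""
    fields = parse_event_payload(payload)
    return fields.get("processname") in critical and fields.get("expected") == "0"


if __name__ == "__main__":
    event = "processname:telemetry groupname:telemetry from_state:RUNNING expected:0 pid:123"
    print(should_stop_container(event, {"telemetry"}))
```

A real listener would act on a True result by stopping supervisord so the container exits; the pid in the sample payload is made up for illustration.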
Output of show version:
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):
sonic_dump_r-bulldog-03_20230913_023753.tar.gz sonic_dump_r-anaconda-51_20230907_183233.tar.gz