sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[logs][tunnel] Logs show "Could not get tunnel addresses from config DB" #9633

Open alexrallen opened 2 years ago

alexrallen commented 2 years ago

Description

Receiving error log during switch boot.

Steps to reproduce the issue:

  1. Install SONiC latest master via ONIE
  2. Boot switch and check log

Describe the results you received:

tunnel_packet_handler repeatedly crashes and restarts due to this.

Dec 22 04:27:06.962107 r-bulldog-03 NOTICE swss#tunnel_packet_handler.py: Could not get tunnel addresses from config DB, exiting...
Dec 22 04:27:07.042777 r-bulldog-03 INFO swss#supervisord 2021-12-22 04:27:07,041 INFO exited: tunnel_packet_handler (exit status 0; not expected)

Describe the results you expected:

No error log.

Output of show version:

SONiC Software Version: SONiC.master.244-3aec72879_Internal
Distribution: Debian 11.2
Kernel: 5.10.0-8-2-amd64
Build commit: 3aec72879
Build date: Wed Dec 22 08:30:03 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02-241

Platform: x86_64-mlnx_msn3420-r0
HwSKU: ACS-MSN3420
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2019X13878
Model Number: MSN3420-CB2FO
Hardware Revision: A1
Uptime: 22:05:10 up  5:42,  5 users,  load average: 1.11, 1.01, 0.78

Docker images:
REPOSITORY                                         TAG                             IMAGE ID       SIZE
docker-platform-monitor                            latest                          3179b3bfa4a2   809MB
docker-platform-monitor                            master.244-3aec72879_Internal   3179b3bfa4a2   809MB
docker-teamd                                       latest                          2f47d005bb95   436MB
docker-teamd                                       master.244-3aec72879_Internal   2f47d005bb95   436MB
docker-syncd-mlnx                                  latest                          cc4d31e33915   1.01GB
docker-syncd-mlnx                                  master.244-3aec72879_Internal   cc4d31e33915   1.01GB
docker-orchagent                                   latest                          5c891ff8f214   455MB
docker-orchagent                                   master.244-3aec72879_Internal   5c891ff8f214   455MB
docker-dhcp-relay                                  latest                          c646bc83cc8c   436MB
docker-sonic-telemetry                             latest                          606f26ae194f   511MB
docker-sonic-telemetry                             master.244-3aec72879_Internal   606f26ae194f   511MB
docker-sonic-mgmt-framework                        latest                          7c86fd946748   578MB
docker-sonic-mgmt-framework                        master.244-3aec72879_Internal   7c86fd946748   578MB
docker-snmp                                        latest                          546605aaf36c   465MB
docker-snmp                                        master.244-3aec72879_Internal   546605aaf36c   465MB
docker-sflow                                       latest                          68f7233b21a9   436MB
docker-sflow                                       master.244-3aec72879_Internal   68f7233b21a9   436MB
docker-router-advertiser                           latest                          dafc3d1fb775   423MB
docker-router-advertiser                           master.244-3aec72879_Internal   dafc3d1fb775   423MB
docker-nat                                         latest                          47bf92fce979   438MB
docker-nat                                         master.244-3aec72879_Internal   47bf92fce979   438MB
docker-mux                                         latest                          b68969e05d75   475MB
docker-mux                                         master.244-3aec72879_Internal   b68969e05d75   475MB
docker-macsec                                      latest                          acb05bec6969   439MB
docker-macsec                                      master.244-3aec72879_Internal   acb05bec6969   439MB
docker-lldp                                        latest                          85f1ea2eb1a0   463MB
docker-lldp                                        master.244-3aec72879_Internal   85f1ea2eb1a0   463MB
docker-fpm-frr                                     latest                          ebb395f4d389   454MB
docker-fpm-frr                                     master.244-3aec72879_Internal   ebb395f4d389   454MB
docker-database                                    latest                          89d24c1c4f64   423MB
docker-database                                    master.244-3aec72879_Internal   89d24c1c4f64   423MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh   1.0.0-master-internal-18        2290464c0e59   468MB
harbor.mellanox.com/sonic/cpu-report               10.0.0                          5314b41a2a5e   413MB

zhangyanzhao commented 2 years ago

@TACappleman would you please help to take a look? Thanks.

TACappleman commented 2 years ago

That script appears to be a new one (see https://github.com/Azure/sonic-buildimage/commit/7bd0a2ad11c96325cb69f5a463fa0d29cdaf134c#diff-0ce422261770900e0d7df3c6bbcf7fc7a625472773b2a505c66e2adedce86fc2). It looks like it tries to run even when no IPinIP tunnels are in use, and will exit in that case.

How often does it then restart?

alexrallen commented 2 years ago

We see it a few times (~4-5) during boot @TACappleman

zhangyanzhao commented 2 years ago

Lawrence, please help to take a look. Thanks.

dgsudharsan commented 2 years ago

@theasianpianist @prsunny @TACappleman any updates on this issue?

theasianpianist commented 2 years ago

@dgsudharsan @alexrallen are you seeing any impact to SONiC functionality? This is the expected behavior for this service running on non-dual-ToR devices, but it should not affect anything else. It might be possible to have the service exit gracefully in this scenario.
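
For illustration only, a graceful exit could look roughly like the sketch below. This is not the actual tunnel_packet_handler.py code: the function names, the plain dict standing in for config DB, and the `dst_ip` field are all assumptions. The idea is to treat "no tunnels configured" as a normal condition with a quiet message, and reserve a non-zero exit for a real lookup failure.

```python
import sys


def get_tunnel_addresses(config):
    """Return tunnel destination IPs from a config-DB-like dict.

    Assumption: tunnels live under a "TUNNEL" key and each entry may carry
    a "dst_ip" field. A real handler would query config DB instead.
    """
    tunnels = config.get("TUNNEL", {})
    return [entry["dst_ip"] for entry in tunnels.values() if "dst_ip" in entry]


def run_handler(config, log=print):
    """Hypothetical entry point; returns the process exit code."""
    if config is None:
        # The lookup itself failed: this is a genuine error.
        log("Could not read tunnel configuration")
        return 1
    addresses = get_tunnel_addresses(config)
    if not addresses:
        # Not a dual-ToR device: nothing to do, so exit cleanly and quietly.
        log("No IPinIP tunnels configured, exiting")
        return 0
    log("Handling tunnel packets for: " + ", ".join(addresses))
    return 0


if __name__ == "__main__":
    # Example run with an empty tunnel table, as on a non-dual-ToR switch.
    sys.exit(run_handler({"TUNNEL": {}}))
```

Whether supervisord then reports the exit as expected still depends on how the program is configured there, which this sketch does not cover.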

dgsudharsan commented 1 year ago

@theasianpianist No functional impact. But an error syslog in production might trigger alerts in syslog monitoring systems, so it's better to have the error removed. Do you have any updates on this issue?

theasianpianist commented 1 year ago

I've checked both the latest master and the latest 202012 images; supervisord logs that the process exit is expected:

INFO swss#supervisord 2023-05-04 18:37:29,238 INFO exited: tunnel_packet_handler (exit status 0; expected)

As far as I'm aware this log won't cause any issues since it's INFO level and since it explicitly states that the exit is expected.
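
For anyone whose syslog monitoring needs to tell the two cases apart, a minimal sketch of a filter follows. It assumes the supervisord message keeps the "exited: NAME (exit status N; expected|not expected)" shape seen in the two log lines in this thread; the regex and the function name are hypothetical, and other message forms (for example, termination by a signal) are simply ignored.

```python
import re

# Matches the "exited:" messages quoted in this thread.
EXIT_RE = re.compile(
    r"exited: (?P<name>\S+) "
    r"\(exit status (?P<status>\d+); (?P<verdict>expected|not expected)\)"
)


def parse_supervisord_exit(line):
    """Return (process_name, exit_status, expected) or None if no match."""
    match = EXIT_RE.search(line)
    if match is None:
        return None
    return (
        match["name"],
        int(match["status"]),
        match["verdict"] == "expected",
    )


if __name__ == "__main__":
    old = ("Dec 22 04:27:07.042777 r-bulldog-03 INFO swss#supervisord "
           "2021-12-22 04:27:07,041 INFO exited: tunnel_packet_handler "
           "(exit status 0; not expected)")
    new = ("INFO swss#supervisord 2023-05-04 18:37:29,238 INFO exited: "
           "tunnel_packet_handler (exit status 0; expected)")
    for line in (old, new):
        print(parse_supervisord_exit(line))
```

A monitoring rule could then alert only when the third element is False.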