sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[warm-boot] vxlanmgrd errors seen during warmboot with control plane assistant #7669

Closed vaibhavhd closed 2 years ago

vaibhavhd commented 3 years ago

Description

Warm reboot with the control plane assistant logs errors such as: vxlanmgrd Error: argument "neighbor_advertiser-1000" is wrong: "dev" not a valid ifname

Steps to reproduce the issue:

  1. Execute a warm reboot with a control plane assistant address (or run the test_wr_arp test).
  2. The test may or may not fail with the error "Timed out waiting for warm reboot".
  3. Check syslog; it contains several vxlan-related errors.
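
Step 3 can be partly automated. The following is a minimal sketch, not part of SONiC: the function name `vxlan_problems` and the regular expression are assumptions made for illustration, and the pattern is written only against the syslog layout shown in this report (timestamp, host, severity, tag, message). It keeps lines that mention vxlan and are either WARNING/ERR severity or carry an "Error:" message relayed by supervisord at INFO severity.

```python
import re

# Assumed layout, taken from the excerpts in this report:
# "<Mon> <day> <time> <host> <LEVEL> <tag>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+(?P<host>\S+)\s+"
    r"(?P<level>[A-Z]+)\s+(?P<tag>[^:]+):\s(?P<msg>.*)$"
)


def vxlan_problems(lines):
    """Yield (timestamp, level, tag, message) for vxlan-related problem lines.

    Hypothetical helper: `lines` is any iterable of syslog lines.
    """
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and anything in a different layout
        is_vxlan = "vxlan" in line.lower()
        # supervisord relays the failing command's stderr at INFO severity,
        # so the "Error:" text is checked in addition to the severity field.
        is_problem = m["level"] in {"WARNING", "ERR"} or "Error:" in m["msg"]
        if is_vxlan and is_problem:
            yield m["ts"], m["level"], m["tag"], m["msg"]
```

Passing an open file handle for the syslog file (for example `/var/log/syslog`) to `vxlan_problems` yields one tuple per matching line, which makes it easier to compare the errors logged before and after the reboot.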

Describe the results you received:

Errors before warmboot:

May 13 13:01:06.164969 str-dx010-acs-4 NOTICE admin: Saving counters folder before warmboot...
May 13 13:01:10.640075 str-dx010-acs-4 NOTICE admin: Setting up control plane assistant: 10.64.246.59 ...

May 13 13:01:11.031240 str-dx010-acs-4 WARNING neighbor_advertiser: :- operator(): Key 'SWITCH_TABLE:switch' field 'vxlan_port' unavailable in database 'APPL_DB'
May 13 13:01:11.073729 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- doVxlanTunnelCreateTask: Create vxlan tunnel neighbor_advertiser
May 13 13:01:11.074252 str-dx010-acs-4 NOTICE swss#orchagent: :- addOperation: Vxlan tunnel 'neighbor_advertiser' was added
May 13 13:01:11.087516 str-dx010-acs-4 INFO swss#/supervisord: vxlanmgrd Error: argument "neighbor_advertiser-1000" is wrong: "dev" not a valid ifname
May 13 13:01:11.087652 str-dx010-acs-4 WARNING swss#vxlanmgrd: :- doVxlanTunnelMapCreateTask: Vxlan Net Dev creation failure for neighbor_advertiser VNI(1000) VLAN(1000)

May 13 13:01:11.341990 str-dx010-acs-4 NOTICE swss#orchagent: :- addOperation: Vxlan tunnel map entry 'map_1' for tunnel 'neighbor_advertiser' was created
May 13 13:01:11.342678 str-dx010-acs-4 NOTICE swss#orchagent: :- createEntry: Created mirror session neighbor_advertiser
May 13 13:01:11.367156 str-dx010-acs-4 NOTICE swss#orchagent: :- activateSession: Activated mirror session neighbor_advertiser

May 13 13:02:03.994091 str-dx010-acs-4 NOTICE admin: Rebooting with /sbin/kexec -e to SONiC-OS-HEAD.741-129f803e ...
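
A likely reading of the "not a valid ifname" message above: the device name in the log, "neighbor_advertiser-1000", is longer than Linux allows for a network interface. Interface names are limited by `IFNAMSIZ` (16 bytes including the terminating NUL, so 15 usable characters). The sketch below approximates that kind of name check in Python. The function `is_valid_ifname` is hypothetical and written for illustration only; it is not the code that vxlanmgrd or the `ip` tool runs.

```python
IFNAMSIZ = 16  # Linux constant; the size includes the trailing NUL byte


def is_valid_ifname(name: str) -> bool:
    """Approximate the interface-name rules: non-empty, shorter than
    IFNAMSIZ bytes, and free of '/' and whitespace (illustrative only)."""
    if not name or len(name.encode()) >= IFNAMSIZ:
        return False
    return not any(c == "/" or c.isspace() for c in name)


# Device name exactly as it appears in the log lines of this report.
name = "neighbor_advertiser-1000"
print(len(name), is_valid_ifname(name))  # 24 False
```

At 24 characters the name exceeds the 15-character limit, which would explain why creating the net device fails both before and after the warm reboot, assuming the length check is what rejects it.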

Errors after warmboot:

May 13 13:02:45.996341 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- main: --- Starting vxlanmgrd ---
May 13 13:02:46.003752 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- checkWarmStart: vxlanmgrd doing warm start, restore count 1
May 13 13:02:46.005514 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- setWarmStartState: vxlanmgrd warm start state changed to initialized

May 13 13:02:57.019138 str-dx010-acs-4 NOTICE swss#orchagent: :- bake: Found mirror session neighbor_advertiser active before warm reboot
May 13 13:02:57.030507 str-dx010-acs-4 NOTICE swss#orchagent: :- bake: Add warm input: VXLAN_TUNNEL_TABLE, 1
May 13 13:02:57.030759 str-dx010-acs-4 NOTICE swss#orchagent: :- bake: Add warm input: VXLAN_EVPN_NVO_TABLE, 0
May 13 13:02:57.030899 str-dx010-acs-4 NOTICE swss#orchagent: :- bake: Add warm input: VXLAN_TUNNEL_MAP_TABLE, 1
May 13 13:02:57.034521 str-dx010-acs-4 NOTICE swss#orchagent: :- bake: Add warm input: VXLAN_REMOTE_VNI_TABLE, 0
May 13 13:02:57.035505 str-dx010-acs-4 NOTICE swss#orchagent: :- bake: Add warm input: VXLAN_VRF_TABLE, 0

May 13 13:02:57.154026 str-dx010-acs-4 INFO swss#/supervisord: vxlanmgrd Error: argument "neighbor_advertiser-1000" is wrong: "dev" not a valid ifname
May 13 13:02:57.157118 str-dx010-acs-4 WARNING swss#vxlanmgrd: :- restoreVxlanNetDevices: Vxlan Net Dev creation failure for neighbor_advertiser VNI(1000) VLAN(1000)
May 13 13:02:57.171415 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- setWarmStartState: vxlanmgrd warm start state changed to replayed

May 13 13:02:57.171667 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- main: starting main loop
May 13 13:02:57.171842 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- doVxlanTunnelCreateTask: Create vxlan tunnel neighbor_advertiser
May 13 13:02:57.191582 str-dx010-acs-4 INFO swss#/supervisord: vxlanmgrd Error: argument "neighbor_advertiser-1000" is wrong: "dev" not a valid ifname
May 13 13:02:57.192517 str-dx010-acs-4 WARNING swss#vxlanmgrd: :- doVxlanTunnelMapCreateTask: Vxlan Net Dev creation failure for neighbor_advertiser VNI(1000) VLAN(1000)
May 13 13:02:58.193516 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- setWarmStartState: vxlanmgrd warm start state changed to reconciled

May 13 13:02:59.030708 str-dx010-acs-4 NOTICE swss#orchagent: :- addOperation: Vxlan tunnel 'neighbor_advertiser' was added
May 13 13:02:59.031054 str-dx010-acs-4 WARNING swss#orchagent: :- addOperation: Vxlan tunnel map vlan id doesn't exist: 1000
May 13 13:03:02.374828 str-dx010-acs-4 NOTICE swss#orchagent: :- addOperation: Vxlan tunnel map entry 'map_1' for tunnel 'neighbor_advertiser' was created
May 13 13:03:10.421459 str-dx010-acs-4 ERR swss#orchagent: :- create: Mirror rule references mirror session "neighbor_advertiser" that does not exist yet
May 13 13:03:10.421658 str-dx010-acs-4 ERR swss#orchagent: :- create: Mirror rule references mirror session "neighbor_advertiser" that does not exist yet
May 13 13:03:10.421717 str-dx010-acs-4 NOTICE swss#orchagent: :- createEntry: Created mirror session neighbor_advertiser

May 13 13:03:19.488186 str-dx010-acs-4 ERR swss#orchagent: :- addOperation: Vxlan tunnel 'neighbor_advertiser' is already exists

May 13 13:04:42.326072 str-dx010-acs-4 NOTICE root: WARMBOOT_FINALIZER : Tearing down control plane assistant ...
May 13 13:04:45.717415 str-dx010-acs-4 NOTICE swss#orchagent: :- deactivateSession: Deactivated mirror session neighbor_advertiser
May 13 13:04:45.717752 str-dx010-acs-4 NOTICE swss#orchagent: :- deleteEntry: Removed mirror session neighbor_advertiser
May 13 13:04:45.726589 str-dx010-acs-4 INFO swss#/supervisord: vxlanmgrd Error: argument "neighbor_advertiser-1000" is wrong: "dev" not a valid ifname
May 13 13:04:45.728662 str-dx010-acs-4 NOTICE swss#vxlanmgrd: :- doVxlanTunnelDeleteTask: Delete vxlan tunnel neighbor_advertiser
May 13 13:04:45.750240 str-dx010-acs-4 NOTICE swss#orchagent: :- delOperation: vni count = 0
May 13 13:04:45.761940 str-dx010-acs-4 NOTICE swss#orchagent: :- delOperation: Vxlan tunnel map entry 'map_1' for tunnel 'neighbor_advertiser' was removed
May 13 13:04:45.763162 str-dx010-acs-4 NOTICE swss#orchagent: :- delOperation: Vxlan tunnel 'neighbor_advertiser' was removed

May 13 13:04:48.765423 str-dx010-acs-4 INFO systemd[1]: warmboot-finalizer.service: Succeeded.
May 13 13:05:53.799740 str-dx010-acs-4 NOTICE swss#fdbsyncd: :- main: VXLAN FDB VNI Reconcillation Complete

Describe the results you expected:

Output of show version:

    SONiC Software Version: SONiC.HEAD.741-129f803e
    Distribution: Debian 10.9
    Kernel: 4.19.0-12-2-amd64
    Build commit: 129f803e
    Build date: Wed May 12 20:34:08 UTC 2021
    Built by: johnar@jenkins-worker-22

    Platform: x86_64-cel_seastone-r0
    HwSKU: Celestica-DX010-C32
    ASIC: broadcom
    ASIC Count: 1

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

syslog.328.gz

dgsudharsan commented 2 years ago

This should be fixed by https://github.com/sonic-net/sonic-utilities/pull/2398. @vaibhavhd, can you please verify?