sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[mlnx][spc1] MP2MP IPinIP decap term creation failed with SAI src ip attribute not support #20361

Closed lolyu closed 10 hours ago

lolyu commented 2 months ago

Description

Steps to reproduce the issue:

  1. Install SONiC from the master image
  2. Enable subnet decap, then run config save:
    admin@lab-2700-2:~$ redis-cli -n 4
    127.0.0.1:6379[4]> hgetall "SUBNET_DECAP|subnet_type"
    1) "src_ip"
    2) "20.20.20.0/24"
    3) "src_ip_v6"
    4) "fc01::0/120"
    5) "status"
    6) "enable"
  3. config reload
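
For readers who want to see the entry from step 2 as field/value pairs, here is a minimal sketch that turns the `hgetall` output into a nested `TABLE -> key -> fields` dictionary, the layout config_db.json uses. The helper name `hgetall_to_config` is hypothetical, and the key `SUBNET_DECAP|subnet_type` is copied from the dump above rather than verified against the schema.

```python
import json
import re


def hgetall_to_config(table_key, cli_lines):
    """Turn redis-cli HGETALL output lines into a nested TABLE/key/fields dict.

    Hypothetical helper for illustration; not part of SONiC.
    """
    # Each line looks like: 1) "src_ip" ; keep only the quoted payload.
    matches = (re.search(r'\d+\)\s+"(.*)"', line) for line in cli_lines)
    values = [m.group(1) for m in matches if m]
    # HGETALL alternates field, value, field, value, ...
    fields = dict(zip(values[0::2], values[1::2]))
    table, key = table_key.split("|", 1)
    return {table: {key: fields}}


# Output copied from step 2 of the report.
dump = [
    '1) "src_ip"',
    '2) "20.20.20.0/24"',
    '3) "src_ip_v6"',
    '4) "fc01::0/120"',
    '5) "status"',
    '6) "enable"',
]

fragment = hgetall_to_config("SUBNET_DECAP|subnet_type", dump)
print(json.dumps(fragment, indent=4))
```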

Describe the results you received:

swss exits because creation of the SAI MP2MP decap term fails: the Mellanox SAI reports SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP_MASK as not implemented.

2024 Sep 27 01:07:53.297534 bjw-can-2700-2 INFO swss#orchagent: :- processUnhandledDecapTunnelTerms: Processing unhandled decap tunnel terms for tunnel IPINIP_V6_TUNNEL
2024 Sep 27 01:07:53.297840 bjw-can-2700-2 WARNING swss#orchagent: :- meta_generic_validation_create: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP:SAI_ATTR_VALUE_TYPE_IP_ADDRESS conditional, but condition was not met, this attribute is not required, but passed (relaxed condition)
2024 Sep 27 01:07:53.298092 bjw-can-2700-2 WARNING swss#orchagent: :- meta_generic_validation_create: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP:SAI_ATTR_VALUE_TYPE_IP_ADDRESS conditional, but condition was not met, this attribute is not required, but passed (relaxed condition)
2024 Sep 27 01:07:53.299832 bjw-can-2700-2 WARNING syncd#SDK: [SAI_UTILS.WARNING] ./src/mlnx_sai_utils.c[1781]- check_attribs_metadata: Not implemented attribute SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP_MASK (vendor data not found)
2024 Sep 27 01:07:53.300032 bjw-can-2700-2 ERR syncd#SDK: [SAI_UTILS.ERR] ./src/mlnx_sai_utils.c[1680]- check_attribs_on_create_without_oid: Failed attributes check
2024 Sep 27 01:07:53.300787 bjw-can-2700-2 WARNING swss#orchagent: :- sai_deserialize_enum: enum -196602 not found in enum sai_status_t
2024 Sep 27 01:07:53.301108 bjw-can-2700-2 WARNING swss#orchagent: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Sep 27 01:07:53.301515 bjw-can-2700-2 WARNING swss#orchagent: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Sep 27 01:07:53.301572 bjw-can-2700-2 ERR swss#orchagent: :- create: create status: -196602
2024 Sep 27 01:07:53.301829 bjw-can-2700-2 ERR swss#orchagent: :- addDecapTunnelTermEntry: Failed to create tunnel decap term entry 192.168.0.0/21.
2024 Sep 27 01:07:53.302046 bjw-can-2700-2 WARNING swss#orchagent: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Sep 27 01:07:53.302257 bjw-can-2700-2 ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_TUNNEL, status: -196602
2024 Sep 27 01:07:53.302467 bjw-can-2700-2 NOTICE swss#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP
2024 Sep 27 01:07:53.302885 bjw-can-2700-2 ERR syncd#SDK: [SAI_TUNNEL.ERR] ./src/mlnx_sai_tunnel.c[8585]- mlnx_create_tunnel_term_table_entry: Failed to get sai tunnel term table entry attribute on create
2024 Sep 27 01:07:53.303026 bjw-can-2700-2 WARNING syncd#SDK: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Sep 27 01:07:53.303336 bjw-can-2700-2 ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: -196602
2024 Sep 27 01:07:53.303464 bjw-can-2700-2 WARNING syncd#SDK: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Sep 27 01:07:53.303635 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_VR_ID: oid:0x3000000000002
2024 Sep 27 01:07:53.303813 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_TYPE: SAI_TUNNEL_TERM_TABLE_ENTRY_TYPE_MP2MP
2024 Sep 27 01:07:53.303955 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_TUNNEL_TYPE: SAI_TUNNEL_TYPE_IPINIP
2024 Sep 27 01:07:53.304100 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_ACTION_TUNNEL_ID: oid:0x2a0000000005b6
2024 Sep 27 01:07:53.304453 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP: 20.20.20.0
2024 Sep 27 01:07:53.304638 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP: 192.168.0.0
2024 Sep 27 01:07:53.304762 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP_MASK: 255.255.255.0
2024 Sep 27 01:07:53.304877 bjw-can-2700-2 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP_MASK: 255.255.248.0
2024 Sep 27 01:07:53.305039 bjw-can-2700-2 NOTICE syncd#SDK: :- processNotifySyncd: Invoking SAI failure dump
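
One possible reading of the status -196602 in the log above, sketched under an assumption: the vendor SAI forms the status as SAI_STATUS_ATTR_NOT_IMPLEMENTED_0 plus the index of the offending attribute in the create call. That assumption is not confirmed anywhere in this thread, but the arithmetic matches the "Not implemented attribute" warning, and it would also fit the "enum value -196602 not found" messages.

```python
# Base of the "attribute not implemented" status range from SAI's saistatus.h:
# SAI_STATUS_CODE(0x00030000), which is negated on non-Windows builds.
SAI_STATUS_ATTR_NOT_IMPLEMENTED_0 = -0x00030000

# Attribute order as dumped by processQuadEvent in the log above.
ATTRS = [
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_VR_ID",
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_TYPE",
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_TUNNEL_TYPE",
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_ACTION_TUNNEL_ID",
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP",
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP",
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP_MASK",
    "SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP_MASK",
]


def decode_not_implemented(status, attrs):
    """Return the attribute a status points at, ASSUMING status = base + index."""
    index = status - SAI_STATUS_ATTR_NOT_IMPLEMENTED_0
    if 0 <= index < len(attrs):
        return attrs[index]
    return None


culprit = decode_not_implemented(-196602, ATTRS)
print(culprit)  # SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP_MASK
```

Here -196602 - (-196608) = 6, and index 6 in the dumped attribute list is the source IP mask.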

Describe the results you expected:

The MP2MP decap term should be created successfully on an SPC1 device.

Output of show version:

admin@bjw-can-2700-2:~$ show version

SONiC Software Version: SONiC.internal.104189370-982200074d
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: 982200074d
Build date: Thu Sep 26 08:04:48 UTC 2024
Built by: azureuser@9fe29d63c000000

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
ASIC Count: 1
Serial Number: Undefined.
Model Number: Undefined.
Hardware Revision: B4
Uptime: 01:28:51 up 33 min,  2 users,  load average: 5.05, 6.84, 6.50
Date: Fri 27 Sep 2024 01:28:51

Sai version:

root@bjw-can-2700-2:/# dpkg -l | grep sai
ii  libsaimetadata                     1.0.0                          amd64        This package contains SAI-Metadata implementation for SONiC project.
ii  libsairedis                        1.0.0                          amd64        This package contains SAI-Redis implementation for SONiC project.
ii  mlnx-sai                           1.mlnx.SAIBuild2405.28.0.33    amd64        contains SAI implementation for Mellanox hardware

judyjoseph commented 1 month ago

@bingwang-ms FYI, this needs SAI support.

lolyu commented 1 month ago

MP2MP decap term creation also fails on 202405:

2024 Oct 11 14:25:26.131531 bjw-can-2700-3 WARNING swss#orchagent: :- meta_generic_validation_create: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP:SAI_ATTR_VALUE_TYPE_IP_ADDRESS conditional, but condition was not met, this attribute is not required, but passed (relaxed condition)
2024 Oct 11 14:25:26.132140 bjw-can-2700-3 WARNING swss#orchagent: :- meta_generic_validation_create: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP:SAI_ATTR_VALUE_TYPE_IP_ADDRESS conditional, but condition was not met, this attribute is not required, but passed (relaxed condition)
2024 Oct 11 14:25:26.134291 bjw-can-2700-3 WARNING syncd#SDK: [SAI_UTILS.WARNING] ./src/mlnx_sai_utils.c[1781]- check_attribs_metadata: Not implemented attribute SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP_MASK (vendor data not found)
2024 Oct 11 14:25:26.134813 bjw-can-2700-3 ERR syncd#SDK: [SAI_UTILS.ERR] ./src/mlnx_sai_utils.c[1680]- check_attribs_on_create_without_oid: Failed attributes check
2024 Oct 11 14:25:26.135194 bjw-can-2700-3 ERR syncd#SDK: [SAI_TUNNEL.ERR] ./src/mlnx_sai_tunnel.c[8585]- mlnx_create_tunnel_term_table_entry: Failed to get sai tunnel term table entry attribute on create
2024 Oct 11 14:25:26.135509 bjw-can-2700-3 WARNING syncd#SDK: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Oct 11 14:25:26.135876 bjw-can-2700-3 ERR syncd#SDK: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: -196602
2024 Oct 11 14:25:26.136223 bjw-can-2700-3 WARNING syncd#SDK: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Oct 11 14:25:26.137097 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_VR_ID: oid:0x3000000000002
2024 Oct 11 14:25:26.137445 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_TYPE: SAI_TUNNEL_TERM_TABLE_ENTRY_TYPE_MP2MP
2024 Oct 11 14:25:26.137722 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_TUNNEL_TYPE: SAI_TUNNEL_TYPE_IPINIP
2024 Oct 11 14:25:26.137980 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_ACTION_TUNNEL_ID: oid:0x2a0000000005a7
2024 Oct 11 14:25:26.138336 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP: 20.20.20.0
2024 Oct 11 14:25:26.138668 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP: 192.168.0.1
2024 Oct 11 14:25:26.138974 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_SRC_IP_MASK: 255.255.255.0
2024 Oct 11 14:25:26.139320 bjw-can-2700-3 ERR syncd#SDK: :- processQuadEvent: attr: SAI_TUNNEL_TERM_TABLE_ENTRY_ATTR_DST_IP_MASK: 255.255.248.0
2024 Oct 11 14:25:26.140886 bjw-can-2700-3 WARNING swss#orchagent: :- sai_deserialize_enum: enum -196602 not found in enum sai_status_t
2024 Oct 11 14:25:26.141172 bjw-can-2700-3 WARNING swss#orchagent: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Oct 11 14:25:26.141385 bjw-can-2700-3 WARNING swss#orchagent: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Oct 11 14:25:26.141446 bjw-can-2700-3 ERR swss#orchagent: :- create: create status: -196602
2024 Oct 11 14:25:26.141696 bjw-can-2700-3 ERR swss#orchagent: :- addDecapTunnelTermEntry: Failed to create tunnel decap term entry 192.168.0.1/21.
2024 Oct 11 14:25:26.141906 bjw-can-2700-3 WARNING swss#orchagent: :- sai_serialize_enum: enum value -196602 not found in enum sai_status_t
2024 Oct 11 14:25:26.142091 bjw-can-2700-3 ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_TUNNEL, status: -196602

Device info:

$ show version

SONiC Software Version: SONiC.20240531.05
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-11-2-amd64
Build commit: c769355b59
Build date: Fri Sep 27 16:57:29 UTC 2024
Built by: azureuser@68a243adc000000

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2302J00192
Model Number: MSN2700-CS2ROS
Hardware Revision: BF
Uptime: 14:19:50 up 16:32,  1 user,  load average: 2.95, 2.30, 1.67
Date: Fri 11 Oct 2024 14:19:50

# dpkg -l | grep sai
ii  libsaimetadata                     1.0.0                          amd64        This package contains SAI-Metadata implementation for SONiC project.
ii  libsairedis                        1.0.0                          amd64        This package contains SAI-Redis implementation for SONiC project.
ii  mlnx-sai                           1.mlnx.SAIBuild2405.28.0.33    amd64        contains SAI implementation for Mellanox hardware

ayurkiv-nvda commented 3 weeks ago

Hello @lolyu,

Can you please elaborate on the 2nd step?

What should config_db look like, and how is subnet decap enabled properly?

lolyu commented 3 weeks ago

Hi @ayurkiv-nvda, create the following table entry in config_db on T0:

admin@lab-2700-2:~$ redis-cli -n 4
127.0.0.1:6379[4]> hgetall "SUBNET_DECAP|vlan"
1) "src_ip"
2) "20.20.20.0/24"
3) "src_ip_v6"
4) "fc01::0/120"
5) "status"
6) "enable"

After config reload, SONiC will try to create a new tunnel for it, along with MP2MP decap terms for the VLAN prefixes (with source IP 20.20.20.0/24).
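
To show how a prefix relates to the address and mask attributes seen in the logs (for example SRC_IP 20.20.20.0 with SRC_IP_MASK 255.255.255.0), here is a small sketch using Python's standard `ipaddress` module. The helper name `prefix_to_ip_and_mask` is hypothetical and only illustrates the arithmetic; it is not how orchagent is implemented.

```python
import ipaddress


def prefix_to_ip_and_mask(prefix):
    """Split a CIDR prefix into (address, netmask) strings.

    ip_interface() is used instead of ip_network() so that a host address
    with a prefix length, such as 192.168.0.1/21, is accepted as well.
    """
    iface = ipaddress.ip_interface(prefix)
    return str(iface.ip), str(iface.netmask)


# Source prefix from SUBNET_DECAP and the VLAN prefixes seen in the logs.
src = prefix_to_ip_and_mask("20.20.20.0/24")
dst_master = prefix_to_ip_and_mask("192.168.0.0/21")
dst_202405 = prefix_to_ip_and_mask("192.168.0.1/21")

print(src)         # ('20.20.20.0', '255.255.255.0')
print(dst_master)  # ('192.168.0.0', '255.255.248.0')
print(dst_202405)  # ('192.168.0.1', '255.255.248.0')
```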

ayurkiv-nvda commented 1 day ago

Tested on SPC-3 as well: the bug was successfully reproduced on 202405.704663-29cbc5423 with SAI 2405.28.0.33. Tested later on an internal 202405_RC image with SAI 2405.30.0.0: the bug was not reproduced.

The SAI needs to be updated to a newer version, which should fix the problem.

I suggest closing this bug.
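
As a rough way to check an installed mlnx-sai package against the builds mentioned here, the sketch below parses the numeric build out of the version string printed by `dpkg -l` and compares it with 2405.30.0.0. The threshold is an assumption: the thread only says the bug did not reproduce on that build, not that it is the first fixed one. The function names are hypothetical.

```python
import re

# ASSUMPTION: 2405.30.0.0 is used as the reference only because the bug was
# not reproduced on it; builds between 28.0.33 and 30.0.0 are unknown.
REPORTED_GOOD_BUILD = (2405, 30, 0, 0)


def sai_build_tuple(version):
    """Extract the trailing four-part build, e.g. (2405, 28, 0, 33)."""
    match = re.search(r"(\d+(?:\.\d+){3})$", version)
    if match is None:
        raise ValueError(f"unrecognized SAI version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def at_or_after_reported_good(version):
    """True if the build is at or after the one reported as not reproducing."""
    return sai_build_tuple(version) >= REPORTED_GOOD_BUILD


installed = "1.mlnx.SAIBuild2405.28.0.33"  # from the dpkg output in this issue
print(sai_build_tuple(installed))            # (2405, 28, 0, 33)
print(at_or_after_reported_good(installed))  # False
```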

lolyu commented 17 hours ago

Hi @ayurkiv-nvda, should the 202405 image include the newer SAI version?

ayurkiv-nvda commented 10 hours ago

If no side effects are expected, then it could be a solution.

lolyu commented 8 hours ago

Hi @ayurkiv-nvda, could you please let us know when 202405 will include the new SAI version? This should be tracked as a bug for 202405.