sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

Missing HASH information for LAG and ECMP #19218

Open mbze430 opened 5 months ago

mbze430 commented 5 months ago

Description

This may not be a bug, but I don't see any HASH information for ECMP or LAG.

Steps to reproduce the issue:

I am running 202311 and I am starting to see that my LAGs are not using their member links equally.

admin@sonic:~$ show switch-hash global
No configuration is present in CONFIG DB

admin@sonic:~$ show switch-hash capabilities

Describe the results you received:

+--------+---------------------------------+
| Hash   | Capabilities                    |
+========+=================================+
| ECMP   | +---------------+-------------+ |
|        | | Hash Field    | Algorithm   | |
|        | |---------------+-------------| |
|        | | not supported | N/A         | |
|        | +---------------+-------------+ |
+--------+---------------------------------+
| LAG    | +---------------+-------------+ |
|        | | Hash Field    | Algorithm   | |
|        | |---------------+-------------| |
|        | | not supported | N/A         | |
|        | +---------------+-------------+ |
+--------+---------------------------------+

How do I create the proper HASH for the load balance for LAG and ECMP?

Describe the results you expected:

Give information on the HASH and algorithm

Output of show version:

SONiC Software Version: SONiC.202311.556513-ee79638a6
SONiC OS Version: 11
Distribution: Debian 11.9
Kernel: 5.10.0-23-2-amd64
Build commit: ee79638a6
Build date: Sun May 26 12:18:55 UTC 2024
Built by: AzDevOps@vmss-soni003RMR

Platform: x86_64-cel_seastone-r0
HwSKU: Seastone-DX010
ASIC: broadcom
ASIC Count: 1
Serial Number: DX010B2F108423LK100045
Model Number: R0872-F0010-01
Hardware Revision: N/A
Uptime: 19:51:23 up 4 days, 10:16, 1 user, load average: 2.24, 1.82, 1.40
Date: Wed 05 Jun 2024 19:51:23

Docker images:
REPOSITORY                   TAG                      IMAGE ID      SIZE
docker-gbsyncd-broncos       202311.556513-ee79638a6  f51e94348adf  352MB
docker-gbsyncd-broncos       latest                   f51e94348adf  352MB
docker-gbsyncd-credo         202311.556513-ee79638a6  08393d9ed5d8  324MB
docker-gbsyncd-credo         latest                   08393d9ed5d8  324MB
docker-syncd-brcm            202311.556513-ee79638a6  7be2803ba0a1  715MB
docker-syncd-brcm            latest                   7be2803ba0a1  715MB
docker-dhcp-relay            latest                   aa75b5e24f4d  310MB
docker-macsec                latest                   3286479f7e8c  329MB
docker-orchagent             202311.556513-ee79638a6  07f025d9fc35  339MB
docker-orchagent             latest                   07f025d9fc35  339MB
docker-eventd                202311.556513-ee79638a6  20e72910700e  301MB
docker-eventd                latest                   20e72910700e  301MB
docker-fpm-frr               202311.556513-ee79638a6  6100d7ecaa3c  359MB
docker-fpm-frr               latest                   6100d7ecaa3c  359MB
docker-nat                   202311.556513-ee79638a6  dcf490885632  330MB
docker-nat                   latest                   dcf490885632  330MB
docker-sflow                 202311.556513-ee79638a6  4107d97847c8  329MB
docker-sflow                 latest                   4107d97847c8  329MB
docker-teamd                 202311.556513-ee79638a6  8c8d8a5fbc64  327MB
docker-teamd                 latest                   8c8d8a5fbc64  327MB
docker-platform-monitor      202311.556513-ee79638a6  647c008137e0  421MB
docker-platform-monitor      latest                   647c008137e0  421MB
docker-snmp                  202311.556513-ee79638a6  5749ba7cc4b3  340MB
docker-snmp                  latest                   5749ba7cc4b3  340MB
docker-router-advertiser     202311.556513-ee79638a6  39a9613b59ad  301MB
docker-router-advertiser     latest                   39a9613b59ad  301MB
docker-lldp                  202311.556513-ee79638a6  b7e52659ba2e  343MB
docker-lldp                  latest                   b7e52659ba2e  343MB
docker-mux                   202311.556513-ee79638a6  01698347388e  349MB
docker-mux                   latest                   01698347388e  349MB
docker-sonic-gnmi            202311.556513-ee79638a6  fd091968243d  389MB
docker-sonic-gnmi            latest                   fd091968243d  389MB
docker-database              202311.556513-ee79638a6  4d9a0e110d21  301MB
docker-database              latest                   4d9a0e110d21  301MB
docker-sonic-mgmt-framework  202311.556513-ee79638a6  8419eaeab503  431MB
docker-sonic-mgmt-framework  latest                   8419eaeab503  431MB

prabhataravind commented 4 months ago

It is expected that the show command displays nothing unless there is an explicit configuration. What traffic pattern are you sending? Please also check the ECMP and LAG offsets in the switch object in APP_DB.

mbze430 commented 4 months ago

I don't know how to look at APP_DB. I'm not even sure what that means.

prabhataravind commented 4 months ago

@mbze430, please refer to https://github.com/sonic-net/SONiC/wiki/Architecture

APPL_DB: Stores the state generated by all application containers -- routes, next-hops, neighbors, etc. This is the south-bound entry point for all applications wishing to interact with other SONiC subsystems.

You can dump ECMP and LAG hash seeds and offsets (if any) using the following command:

$ redis-cli -n 0 hgetall "SWITCH_TABLE:switch"
1) "ecmp_hash_seed"
2) "10"
3) "fdb_aging_time"
4) "600"
5) "lag_hash_seed"
6) "10"
7) "ordered_ecmp"
8) "true"

These attributes, along with the 5-tuple of the packet itself (src IP, dst IP, src port, dst port, protocol), should give reasonably good ECMP/LAG load balancing as long as there is enough entropy in the 5-tuple of the packets you send. You might want to randomize or change some of these attributes to measure how good the hashing is.
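
To make the entropy point concrete, here is a minimal Python sketch. It is only an illustration: the real hash is computed in the ASIC and is vendor-specific, so the seeded CRC32 used here, the helper names `pick_member` and `link_usage`, and the traffic patterns are all assumptions made for the example.

```python
import zlib
from collections import Counter


def pick_member(flow, seed, num_members):
    """Pick a member link index for a 5-tuple using a seeded CRC32.

    Illustrative stand-in only: the real hash runs in the ASIC and is
    vendor-specific.
    """
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key, seed) % num_members


def link_usage(flows, seed=10, num_members=4):
    """Count how many packets land on each member link."""
    return Counter(pick_member(flow, seed, num_members) for flow in flows)


# Hypothetical traffic: (src IP, dst IP, src port, dst port, protocol)
varied = [("10.0.0.1", "10.0.1.1", 1024 + i, 443, 6) for i in range(4000)]
single = [("10.0.0.1", "10.0.1.1", 1024, 443, 6)] * 4000

if __name__ == "__main__":
    print("varied source ports:", sorted(link_usage(varied).items()))
    print("one repeated flow:  ", sorted(link_usage(single).items()))
```

Whatever the hash function, a single repeated flow always maps to one member link, so uneven LAG usage under such a test pattern does not by itself indicate a hashing problem.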

Redis instance 0 is APPL_DB (APP_DB).
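
If you want to compare these values across switches, the numbered `hgetall` output can be turned into a dictionary. The following is a small sketch, assuming the output has been captured as text; `parse_hgetall` is a hypothetical helper name, and the sample is the output shown above.

```python
SAMPLE_OUTPUT = '''
1) "ecmp_hash_seed"
2) "10"
3) "fdb_aging_time"
4) "600"
5) "lag_hash_seed"
6) "10"
7) "ordered_ecmp"
8) "true"
'''


def parse_hgetall(text):
    """Turn redis-cli hgetall output (alternating field/value lines) into a dict."""
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Numbered lines look like: 3) "fdb_aging_time"
        head, sep, rest = line.partition(") ")
        value = rest if sep and head.isdigit() else line
        values.append(value.strip('"'))
    # Even positions are field names, odd positions are their values.
    return dict(zip(values[0::2], values[1::2]))


if __name__ == "__main__":
    switch = parse_hgetall(SAMPLE_OUTPUT)
    print(switch["ecmp_hash_seed"], switch["lag_hash_seed"])
```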

nazariig commented 4 months ago

@mbze430 did you have a chance to read the Generic Hash HLD?

Now regarding the question:

  1. Hash capabilities

The output of the show hash capabilities CLI tells us that your system does not support hash field configuration: the message "not supported" means that SAI does not expose such a capability (please refer to the vendor's SAI documentation). The status N/A for the hashing algorithm means that SAI does not expose the list of available hashing algorithms but still allows the user to configure one; any errors are handled during the SAI call.

  2. Hash configuration

The message "No configuration is present in CONFIG DB" means that Config DB does not contain any hash-related configuration, so there is nothing to show. Use the CLI or JSON notation to add a configuration.

In general, I see that your SAI does not support this feature, so it is very likely that you won't be able to configure the desired behavior.

mbze430 commented 4 months ago

@qnos I think you are the maintainer for Celestica? Anyway can you verify the availability of the missing HASH information?

vknyazhev commented 1 month ago

Good afternoon! How can I set fields for balancing in LAG? Maybe this can be done through SAI commands?

~$ sudo config
aaa                       dhcp-snoop     kdump                 ndp             qos                   startupconfiguration  vlan-stacking
acl                       dropcounters   kubernetes            neigh-suppress  queue                 storm-control         vnet
arp                       ecn            line                  ntp             radius                subinterfaces         vrf
auto-techsupport          environment    lldp                  obj-track       reboot-cause          syslog                vrrp
auto-techsupport-feature  feature        logging               pbh             route-map             system                vxlan
bfd                       fgnhg          mac                   pbr             runningconfiguration  system-health         warm_restart
boot                      flowcnt-trap   management_interface  pfc             sag                   system-memory         watermark
buffer                    headroom-pool  mclag                 pfcwd           scheduler             tacacs                wred
buffer_pool               history        mgmt-vrf              platform        services              techsupport           ztp
chassis                   igmp           mirror_session        policer         sflow                 uptime
clock                     interfaces     muxcable              priority-group  snmpagentaddress      users
dhcp6relay_counters       ip             nac                   processes       snmptrap              version
dhcprelay_helper          ipv6           nat                   protocol-down   spanning-tree         vlan

nazariig commented 1 month ago

Via Config DB:

{
    "SWITCH_HASH": {
        "GLOBAL": {
            "lag_hash": [
                "DST_MAC",
                "SRC_MAC",
                "ETHERTYPE",
                "IP_PROTOCOL",
                "DST_IP",
                "SRC_IP",
                "L4_DST_PORT",
                "L4_SRC_PORT",
                "INNER_DST_MAC",
                "INNER_SRC_MAC",
                "INNER_ETHERTYPE",
                "INNER_IP_PROTOCOL",
                "INNER_DST_IP",
                "INNER_SRC_IP",
                "INNER_L4_DST_PORT",
                "INNER_L4_SRC_PORT"
            ]
        }
    }
}
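
If you generate this JSON from a script, a sketch along these lines may help catch typos before anything is written to Config DB. `build_lag_hash_config` is a hypothetical helper, and the set of accepted field names is simply the list from the example above, not what any particular platform supports.

```python
import json

# Field names copied from the example above; platform support may differ.
KNOWN_FIELDS = {
    "DST_MAC", "SRC_MAC", "ETHERTYPE", "IP_PROTOCOL",
    "DST_IP", "SRC_IP", "L4_DST_PORT", "L4_SRC_PORT",
    "INNER_DST_MAC", "INNER_SRC_MAC", "INNER_ETHERTYPE", "INNER_IP_PROTOCOL",
    "INNER_DST_IP", "INNER_SRC_IP", "INNER_L4_DST_PORT", "INNER_L4_SRC_PORT",
}


def build_lag_hash_config(fields):
    """Return the SWITCH_HASH|GLOBAL structure for the given LAG hash fields."""
    unknown = [name for name in fields if name not in KNOWN_FIELDS]
    if unknown:
        raise ValueError(f"unknown hash field(s): {unknown}")
    if len(set(fields)) != len(fields):
        raise ValueError("duplicate hash fields")
    return {"SWITCH_HASH": {"GLOBAL": {"lag_hash": list(fields)}}}


if __name__ == "__main__":
    config = build_lag_hash_config(
        ["SRC_IP", "DST_IP", "L4_SRC_PORT", "L4_DST_PORT"]
    )
    print(json.dumps(config, indent=4))
```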

Via CLI:

config switch-hash global lag-hash \
'DST_MAC' \
'SRC_MAC' \
'ETHERTYPE' \
'IP_PROTOCOL' \
'DST_IP' \
'SRC_IP' \
'L4_DST_PORT' \
'L4_SRC_PORT' \
'INNER_DST_MAC' \
'INNER_SRC_MAC' \
'INNER_ETHERTYPE' \
'INNER_IP_PROTOCOL' \
'INNER_DST_IP' \
'INNER_SRC_IP' \
'INNER_L4_DST_PORT' \
'INNER_L4_SRC_PORT'
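
The same command line can be assembled from a list of fields. This is only a sketch that builds the string; it does not run anything, and `lag_hash_command` is a hypothetical helper name.

```python
import shlex


def lag_hash_command(fields):
    """Build the 'config switch-hash global lag-hash' command line as a string."""
    if not fields:
        raise ValueError("at least one hash field is required")
    argv = ["config", "switch-hash", "global", "lag-hash", *fields]
    # shlex.join quotes an argument only when the shell would need it.
    return shlex.join(argv)


if __name__ == "__main__":
    print(lag_hash_command(["SRC_IP", "DST_IP", "L4_SRC_PORT", "L4_DST_PORT"]))
```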

vknyazhev commented 1 month ago

Hi! I don't have the switch-hash command to set hash fields. What should I do?

~$ sudo config s
sag   scheduler  snmp              snmptrap       static-mac    synchronous_mode
save  sflow      snmpagentaddress  spanning-tree  subinterface  syslog

vknyazhev commented 1 month ago

I found in the manual a command to set the algorithm via SAI (redis-cli -n 4 hset "SAI_METADATA|sai_profile" "SAI_LAG_HASH_ALGO" "crc_32_eth_lo"). Maybe the hash fields can be set the same way?

nazariig commented 1 month ago

@vknyazhev if there is no such CLI command in the main tree, the image you are using very likely does not support the Generic Hash feature. By the way, which image are you using (e.g., 202012, 202305)?

nazariig commented 1 month ago

> I found in the manual a command to set the algorithm via SAI (redis-cli -n 4 hset "SAI_METADATA|sai_profile" "SAI_LAG_HASH_ALGO" "crc_32_eth_lo"), maybe you can also set the hash field?

What you are trying to do is usually for advanced users or debugging purposes only.

https://github.com/sonic-net/sonic-swss-common/blob/master/common/schema.h#L8

/***** DATABASE *****/

#define APPL_DB         0
#define ASIC_DB         1
#define COUNTERS_DB     2
#define LOGLEVEL_DB     3
#define CONFIG_DB       4
#define PFC_WD_DB       5
#define FLEX_COUNTER_DB 5
#define STATE_DB        6
#define SNMP_OVERLAY_DB 7
#define RESTAPI_DB      8
#define GB_ASIC_DB      9
#define GB_COUNTERS_DB  10
#define GB_FLEX_COUNTER_DB  11
#define CHASSIS_APP_DB      12
#define CHASSIS_STATE_DB    13
#define APPL_STATE_DB       14
#define DPU_APPL_DB         15
#define DPU_APPL_STATE_DB   16
#define DPU_STATE_DB        17
#define DPU_COUNTERS_DB     18
#define EVENT_DB            19
#define BMP_STATE_DB        20

Config DB does not have a key named SAI_METADATA|sai_profile|SAI_LAG_HASH_ALGO. Please be careful about what you write directly to the DB.
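
To check which database a given `redis-cli -n <id>` targets, the defines above can be parsed into a lookup table. A minimal sketch: `parse_db_ids` is a hypothetical helper, and the excerpt is shortened from the listing above.

```python
import re
from collections import defaultdict

# Shortened excerpt of the schema.h listing quoted above.
SCHEMA_EXCERPT = """
#define APPL_DB         0
#define ASIC_DB         1
#define COUNTERS_DB     2
#define LOGLEVEL_DB     3
#define CONFIG_DB       4
#define PFC_WD_DB       5
#define FLEX_COUNTER_DB 5
#define STATE_DB        6
"""


def parse_db_ids(text):
    """Map each numeric Redis DB id to the list of names defined for it."""
    ids = defaultdict(list)
    for name, number in re.findall(r"#define\s+(\w+)\s+(\d+)", text):
        ids[int(number)].append(name)
    return dict(ids)


if __name__ == "__main__":
    db_ids = parse_db_ids(SCHEMA_EXCERPT)
    # "-n 4" in the redis-cli command quoted earlier selects this database.
    print(4, db_ids[4])
```

Note that an id can carry more than one name: in the listing, both PFC_WD_DB and FLEX_COUNTER_DB are defined as 5.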