opencomputeproject / SAI-Challenger

Other
16 stars 34 forks source link

SAI LAG API calls are taking more time in SONiC 202205 with lpush and publish method #81

Closed sj-harrish closed 1 year ago

sj-harrish commented 1 year ago

Hi SAI-Challenger Team,

We are observing that SAI LAG API calls are taking more than 3+ minutes to respond to the request published via ASIC_STATE_KEY_VALUE_OP_QUEUE lpush and publish method. The issue is not seen in SONiC-202111 Community but observed from SONiC-202205 branch. Could you please help us in identifying the issue here.? Thanks in advance,

Version Details:

SONiC-202205. Hash - 9a3571cec HW - NVIDIA - SN4600

Detailed Steps used to Recreate:

OID for LAG is obtained using below steps:
===========================================

redis-cli -n 1
>>incr VIDCOUNTER
>>get VIDCOUNTER
2976

vid=2976
lag_sai_type=2
oid="oid:" + hex((lag_sai_type << 48) | vid)

Steps to push the LAG Creation to ASIC DB:
--------------------------------------------
lpush ASIC_STATE_KEY_VALUE_OP_QUEUE SAI_OBJECT_TYPE_LAG:oid:0x2000000000ba0 "[]" Screate
publish ASIC_STATE_KEY_VALUE_OP_QUEUE G

Logs from Device:
------------------
admin@sonic:~$
admin@sonic:~$
Apr 06 09:49:18.188474 System is ready
docker exec -it syncd python3 /usr/bin/sx_api_lag_dump.py

DISCLAIMER
==========
The following code is an EXAMPLE whose sole purpose is to demonstrate various capabilities of the SDK.
This example represents a common configuration scenario and is in no way a complete guide to the demonstrated feature.
For complete documentation please refer to the SDK API Guide.
There is no guarantee that this example will run successfully on all hardware and software configurations.

No LAG found!
admin@sonic:~$
admin@sonic:~$ docker ps -a
CONTAINER ID   IMAGE                                COMMAND                  CREATED          STATUS                     PORTS     NAMES
dd9565766af5   docker-sonic-telemetry:latest        "/usr/local/bin/supe…"   10 seconds ago   Exited (0) 4 seconds ago             telemetry
3d3878b68d52   docker-sonic-mgmt-framework:latest   "/usr/local/bin/supe…"   10 seconds ago   Up 10 seconds                        mgmt-framework
5cb78265e91e   docker-snmp:latest                   "/usr/local/bin/supe…"   10 seconds ago   Up 8 seconds                         snmp
3d4ba3f457a2   docker-lldp:latest                   "/usr/bin/docker-lld…"   2 minutes ago    Up 2 minutes                         lldp
569e3b01f724   docker-router-advertiser:latest      "/usr/bin/docker-ini…"   2 minutes ago    Up 2 minutes                         radv
9831c3b8d645   docker-platform-monitor:latest       "/usr/bin/docker_ini…"   2 minutes ago    Up 2 minutes                         pmon
cf7f95ef40a1   docker-syncd-mlnx:latest             "/usr/local/bin/supe…"   2 minutes ago    Up 2 minutes                         syncd
f1ce92bd9edc   docker-teamd:latest                  "/usr/local/bin/supe…"   3 minutes ago    Up 3 minutes                         teamd
7cfda94cfd5c   docker-fpm-frr:latest                "/usr/bin/docker_ini…"   3 minutes ago    Up 3 minutes                         bgp
5b39bc10bbc5   docker-orchagent:latest              "/usr/bin/docker-ini…"   3 minutes ago    Up 3 minutes                         swss
f660e21c2de2   docker-database:latest               "/usr/local/bin/dock…"   3 minutes ago    Up 3 minutes                         database
admin@sonic:~$
admin@sonic:~$
admin@sonic:~$ docker exec -it syncd python3 /usr/bin/sx_api_lag_dump.py
DISCLAIMER
==========
The following code is an EXAMPLE whose sole purpose is to demonstrate various capabilities of the SDK.
This example represents a common configuration scenario and is in no way a complete guide to the demonstrated feature.
For complete documentation please refer to the SDK API Guide.
There is no guarantee that this example will run successfully on all hardware and software configurations.

127.0.0.1:6379[1]> lpush ASIC_STATE_KEY_VALUE_OP_QUEUE SAI_OBJECT_TYPE_LAG:oid:0x2000000000ba0 "[]" Screate
(integer) 3
127.0.0.1:6379[1]> publish ASIC_STATE_KEY_VALUE_OP_QUEUE G
(integer) 0
127.0.0.1:6379[1]>
admin@sonic:~$ docker exec -it syncd python3 /usr/bin/sx_api_lag_dump.py
DISCLAIMER
==========
The following code is an EXAMPLE whose sole purpose is to demonstrate various capabilities of the SDK.
This example represents a common configuration scenario and is in no way a complete guide to the demonstrated feature.
For complete documentation please refer to the SDK API Guide.
There is no guarantee that this example will run successfully on all hardware and software configurations.

No LAG found!
admin@sonic:~$ sudo tail -f /var/log/syslog
Apr  6 09:51:56.772218 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.033420 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.329191 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.563168 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.858774 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.158369 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.460573 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.693469 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.949513 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:05.318297 sonic ERR systemd[1]: Failed to start Telemetry container.
^C
admin@sonic:~$ date
Thu 06 Apr 2023 09:52:12 AM UTC
admin@sonic:~$ sudo tail -f /var/log/syslog
Apr  6 09:51:56.772218 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.033420 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.329191 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.563168 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:57.858774 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.158369 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.460573 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.693469 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:51:58.949513 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:05.318297 sonic ERR systemd[1]: Failed to start Telemetry container.

Apr  6 09:52:28.425804 sonic INFO dhclient[4602]: XMT: Solicit on eth0, interval 125920ms.
Apr  6 09:52:29.157523 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:32.245293 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:32.246127 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:32.247860 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:32.329847 sonic INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
Apr  6 09:52:58.962488 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:58.963390 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:58.965360 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:59.101963 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:59.392920 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:59.691865 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:52:59.966173 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:00.276489 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:00.557437 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:00.844079 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:01.177207 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:01.406102 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:01.434732 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:01.666707 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:01.919296 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:29.158015 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:32.270360 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:32.272684 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:32.277497 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:53:32.323595 sonic INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
Apr  6 09:54:01.932416 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:01.933209 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:01.934926 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:02.081954 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:02.370357 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:02.647230 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:02.947194 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:03.236092 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:03.508888 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:03.781043 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:04.041217 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:04.335905 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:04.582549 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:04.807248 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:29.157853 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:32.295922 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:32.296630 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:32.298571 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:54:32.344960 sonic INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!

Apr  6 09:54:34.346276 sonic INFO dhclient[4602]: XMT: Solicit on eth0, interval 114440ms.
Apr  6 09:54:38.405321 sonic INFO dhclient[4574]: DHCPREQUEST for 10.4.5.149 on eth0 to 10.4.4.11 port 67
Apr  6 09:54:38.428602 sonic INFO dhclient[4574]: DHCPACK of 10.4.5.149 from 10.4.4.11
Apr  6 09:54:38.457743 sonic INFO dhclient[4574]: bound to 10.4.5.149 -- renewal in 252 seconds.
Apr  6 09:55:04.823442 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:04.826755 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:04.966219 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:05.272566 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:05.556382 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:05.861042 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:06.146686 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:06.455721 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:06.746329 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:07.021551 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:07.313796 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:07.571522 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:07.793654 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:29.161135 sonic INFO systemd[1]: Starting Discard unused blocks...
Apr  6 09:55:29.162094 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:29.164403 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:29.221970 sonic NOTICE root: smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-18-2-amd64] (local build)
Apr  6 09:55:29.222185 sonic NOTICE root: Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
Apr  6 09:55:29.222330 sonic NOTICE root:
Apr  6 09:55:29.222468 sonic NOTICE root: === START OF INFORMATION SECTION ===
Apr  6 09:55:29.222677 sonic NOTICE root: Device Model:     StorFly VSFBM4XC030G-MLX1
Apr  6 09:55:29.222805 sonic NOTICE root: Serial Number:    53529-0568
Apr  6 09:55:29.222938 sonic NOTICE root: Firmware Version: 0202-000
Apr  6 09:55:29.223058 sonic NOTICE root: User Capacity:    30,016,659,456 bytes [30.0 GB]
Apr  6 09:55:29.223178 sonic NOTICE root: Sector Size:      512 bytes logical/physical
Apr  6 09:55:29.223298 sonic NOTICE root: Rotation Rate:    Solid State Device
Apr  6 09:55:29.223416 sonic NOTICE root: TRIM Command:     Available, deterministic, zeroed
Apr  6 09:55:29.223535 sonic NOTICE root: Device is:        Not in smartctl database [for details use: -P showall]
Apr  6 09:55:29.223654 sonic NOTICE root: ATA Version is:   ACS-2 (minor revision not indicated)
Apr  6 09:55:29.223780 sonic NOTICE root: SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Apr  6 09:55:29.223929 sonic NOTICE root: Local Time is:    Thu Apr  6 09:55:29 2023 UTC
Apr  6 09:55:29.224045 sonic NOTICE root: SMART support is: Available - device has SMART capability.
Apr  6 09:55:29.224163 sonic NOTICE root: SMART support is: Enabled
Apr  6 09:55:29.224285 sonic NOTICE root:
Apr  6 09:55:29.224495 sonic NOTICE root: === START OF READ SMART DATA SECTION ===
Apr  6 09:55:29.224983 sonic NOTICE root: SMART overall-health self-assessment test result: PASSED
Apr  6 09:55:29.225095 sonic NOTICE root:
Apr  6 09:55:29.225168 sonic NOTICE root: General SMART Values:
Apr  6 09:55:29.225217 sonic NOTICE root: Offline data collection status:  (0x00)#011Offline data collection activity
Apr  6 09:55:29.225257 sonic NOTICE root: #011#011#011#011#011was never started.
Apr  6 09:55:29.225297 sonic NOTICE root: #011#011#011#011#011Auto Offline Data Collection: Disabled.
Apr  6 09:55:29.225337 sonic NOTICE root: Self-test execution status:      (   0)#011The previous self-test routine completed
Apr  6 09:55:29.225380 sonic NOTICE root: #011#011#011#011#011without error or no self-test has ever
Apr  6 09:55:29.225421 sonic NOTICE root: #011#011#011#011#011been run.
Apr  6 09:55:29.225461 sonic NOTICE root: Total time to complete Offline
Apr  6 09:55:29.225501 sonic NOTICE root: data collection: #011#011(    0) seconds.
Apr  6 09:55:29.225540 sonic NOTICE root: Offline data collection
Apr  6 09:55:29.225580 sonic NOTICE root: capabilities: #011#011#011 (0x71) SMART execute Offline immediate.
Apr  6 09:55:29.225620 sonic NOTICE root: #011#011#011#011#011No Auto Offline data collection support.
Apr  6 09:55:29.225664 sonic NOTICE root: #011#011#011#011#011Suspend Offline collection upon new
Apr  6 09:55:29.225703 sonic NOTICE root: #011#011#011#011#011command.
Apr  6 09:55:29.225743 sonic NOTICE root: #011#011#011#011#011No Offline surface scan supported.
Apr  6 09:55:29.225782 sonic NOTICE root: #011#011#011#011#011Self-test supported.
Apr  6 09:55:29.225821 sonic NOTICE root: #011#011#011#011#011Conveyance Self-test supported.
Apr  6 09:55:29.225860 sonic NOTICE root: #011#011#011#011#011Selective Self-test supported.
Apr  6 09:55:29.225899 sonic NOTICE root: SMART capabilities:            (0x0002)#011Does not save SMART data before
Apr  6 09:55:29.225942 sonic NOTICE root: #011#011#011#011#011entering power-saving mode.
Apr  6 09:55:29.225983 sonic NOTICE root: #011#011#011#011#011Supports SMART auto save timer.
Apr  6 09:55:29.226022 sonic NOTICE root: Error logging capability:        (0x01)#011Error logging supported.
Apr  6 09:55:29.226061 sonic NOTICE root: #011#011#011#011#011General Purpose Logging supported.
Apr  6 09:55:29.226101 sonic NOTICE root: Short self-test routine
Apr  6 09:55:29.226144 sonic NOTICE root: recommended polling time: #011 (   1) minutes.
Apr  6 09:55:29.226184 sonic NOTICE root: Extended self-test routine
Apr  6 09:55:29.226224 sonic NOTICE root: recommended polling time: #011 (   1) minutes.
Apr  6 09:55:29.226262 sonic NOTICE root: Conveyance self-test routine
Apr  6 09:55:29.226301 sonic NOTICE root: recommended polling time: #011 (   1) minutes.
Apr  6 09:55:29.226340 sonic NOTICE root:
Apr  6 09:55:29.226381 sonic NOTICE root: SMART Attributes Data Structure revision number: 1
Apr  6 09:55:29.226424 sonic NOTICE root: Vendor Specific SMART Attributes with Thresholds:
Apr  6 09:55:29.226463 sonic NOTICE root: ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
Apr  6 09:55:29.226503 sonic NOTICE root:   1 Raw_Read_Error_Rate     0x0000   100   100   070    Old_age   Offline      -       0
Apr  6 09:55:29.226543 sonic NOTICE root:   5 Reallocated_Sector_Ct   0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.226583 sonic NOTICE root:   9 Power_On_Hours          0x0000   100   100   000    Old_age   Offline      -       535
Apr  6 09:55:29.226623 sonic NOTICE root:  12 Power_Cycle_Count       0x0000   100   100   000    Old_age   Offline      -       2483
Apr  6 09:55:29.226662 sonic NOTICE root: 160 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.226705 sonic NOTICE root: 161 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       265
Apr  6 09:55:29.226745 sonic NOTICE root: 163 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       18
Apr  6 09:55:29.226785 sonic NOTICE root: 164 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       630240
Apr  6 09:55:29.226825 sonic NOTICE root: 165 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       319
Apr  6 09:55:29.226872 sonic NOTICE root: 166 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       221
Apr  6 09:55:29.226913 sonic NOTICE root: 167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       303
Apr  6 09:55:29.226964 sonic NOTICE root: 168 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       20000
Apr  6 09:55:29.227008 sonic NOTICE root: 177 Wear_Leveling_Count     0x0000   100   100   050    Old_age   Offline      -       48
Apr  6 09:55:29.227048 sonic NOTICE root: 178 Used_Rsvd_Blk_Cnt_Chip  0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.227089 sonic NOTICE root: 181 Program_Fail_Cnt_Total  0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.227130 sonic NOTICE root: 182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.227176 sonic NOTICE root: 187 Reported_Uncorrect      0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.227216 sonic NOTICE root: 192 Power-Off_Retract_Count 0x0000   100   100   000    Old_age   Offline      -       1997
Apr  6 09:55:29.227256 sonic NOTICE root: 194 Temperature_Celsius     0x0000   100   100   000    Old_age   Offline      -       43
Apr  6 09:55:29.227296 sonic NOTICE root: 195 Hardware_ECC_Recovered  0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.227336 sonic NOTICE root: 196 Reallocated_Event_Count 0x0000   100   100   016    Old_age   Offline      -       0
Apr  6 09:55:29.227375 sonic NOTICE root: 198 Offline_Uncorrectable   0x0000   100   100   000    Old_age   Offline      -       0
Apr  6 09:55:29.227415 sonic NOTICE root: 199 UDMA_CRC_Error_Count    0x0000   100   100   050    Old_age   Offline      -       0
Apr  6 09:55:29.227531 sonic NOTICE root: 232 Available_Reservd_Space 0x0000   100   100   000    Old_age   Offline      -       100
Apr  6 09:55:29.227628 sonic NOTICE root: 241 Total_LBAs_Written      0x0000   100   100   000    Old_age   Offline      -       159576
Apr  6 09:55:29.227669 sonic NOTICE root: 242 Total_LBAs_Read         0x0000   100   100   000    Old_age   Offline      -       9349487
Apr  6 09:55:29.227713 sonic NOTICE root: 248 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       99
Apr  6 09:55:29.227753 sonic NOTICE root: 249 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       100
Apr  6 09:55:29.227792 sonic NOTICE root:
Apr  6 09:55:29.227833 sonic NOTICE root: SMART Error Log Version: 1
Apr  6 09:55:29.227877 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:29.227932 sonic NOTICE root: No Errors Logged
Apr  6 09:55:29.227976 sonic NOTICE root:
Apr  6 09:55:29.228018 sonic NOTICE root: SMART Self-test log structure revision number 1
Apr  6 09:55:29.228058 sonic NOTICE root: No self-tests have been logged.  [To run self-tests, use: smartctl -t]
Apr  6 09:55:29.228099 sonic NOTICE root:
Apr  6 09:55:29.228138 sonic NOTICE root: SMART Selective self-test log data structure revision number 1
Apr  6 09:55:29.228178 sonic NOTICE root:  SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
Apr  6 09:55:29.228222 sonic NOTICE root:     1        0        0  Not_testing
Apr  6 09:55:29.228263 sonic NOTICE root:     2        0        0  Not_testing
Apr  6 09:55:29.228304 sonic NOTICE root:     3        0        0  Not_testing
Apr  6 09:55:29.228343 sonic NOTICE root:     4        0        0  Not_testing
Apr  6 09:55:29.228383 sonic NOTICE root:     5        0        0  Not_testing
Apr  6 09:55:29.228422 sonic NOTICE root:     6        0    65535  Read_scanning was never started
Apr  6 09:55:29.228461 sonic NOTICE root: Selective self-test flags (0x0):
Apr  6 09:55:29.228504 sonic NOTICE root:   After scanning selected spans, do NOT read-scan remainder of disk.
Apr  6 09:55:29.228564 sonic NOTICE root: If Selective self-test is pending on power-up, resume after 0 minute delay.
Apr  6 09:55:29.228619 sonic NOTICE root:
Apr  6 09:55:29.238749 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:29.810777 sonic INFO fstrim[15206]: /var/log: 3.8 GiB (4125442048 bytes) trimmed on /dev/loop1
Apr  6 09:55:29.810879 sonic INFO fstrim[15206]: /host: 24.3 GiB (26108493824 bytes) trimmed on /dev/sda3
Apr  6 09:55:29.811635 sonic INFO systemd[1]: fstrim.service: Succeeded.
Apr  6 09:55:29.812958 sonic INFO systemd[1]: Finished Discard unused blocks.
Apr  6 09:55:29.816635 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:32.339830 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:32.341992 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:32.344568 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:55:32.396666 sonic INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
Apr  6 09:56:07.807481 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:07.809080 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:07.812793 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:07.950096 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:08.241372 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:08.529495 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:08.814819 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:09.106062 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:09.371722 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:09.631598 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:09.898462 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:10.197621 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:10.450869 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:10.700947 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:28.786618 sonic INFO dhclient[4602]: XMT: Solicit on eth0, interval 111240ms.
Apr  6 09:56:29.157514 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:32.141407 sonic ERR monit[710]: 'container_checker' status failed (3) -- Expected containers not running: telemetry
Apr  6 09:56:32.336639 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:32.337137 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:32.339125 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:56:32.386895 sonic INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!
Apr  6 09:56:54.571003 sonic NOTICE syncd#SDK: [SAI_LAG.NOTICE] mlnx_sai_lag.c[1123]- mlnx_create_lag: Create lag, empty list
Apr  6 09:56:54.572177 sonic NOTICE syncd#SDK: [SAI_LAG.NOTICE] mlnx_sai_lag.c[1209]- mlnx_create_lag: Created LAG 80
Apr  6 09:56:54.579563 sonic NOTICE syncd#SDK: [SAI_PORT.NOTICE] mlnx_sai_port.c[10817]- mlnx_internal_acls_bind: The per-port IP counter is not enabled.
Apr  6 09:56:54.580115 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_ROUTE_ENTRY availability, #0 IP_ADDR_FAMILY=IPV4
Apr  6 09:56:54.580267 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_ROUTE_ENTRY availability, #0 IP_ADDR_FAMILY=IPV4 : 201946
Apr  6 09:56:54.580497 sonic WARNING swss#orchagent: :- wait: got not expected response: SAI_STATUS_SUCCESS:getresponse
Apr  6 09:56:54.581656 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_ROUTE_ENTRY availability, #0 IP_ADDR_FAMILY=IPV6
Apr  6 09:56:54.581749 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_ROUTE_ENTRY availability, #0 IP_ADDR_FAMILY=IPV6 : 201946
Apr  6 09:56:54.585944 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_NEIGHBOR_ENTRY availability, #0 IP_ADDR_FAMILY=IPV4
Apr  6 09:56:54.586098 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_NEIGHBOR_ENTRY availability, #0 IP_ADDR_FAMILY=IPV4 : 201946
Apr  6 09:56:54.587161 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_NEIGHBOR_ENTRY availability, #0 IP_ADDR_FAMILY=IPV6
Apr  6 09:56:54.587410 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_NEIGHBOR_ENTRY availability, #0 IP_ADDR_FAMILY=IPV6 : 100973
Apr  6 09:56:54.589731 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_NEXT_HOP_GROUP availability, empty list
Apr  6 09:56:54.589922 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_NEXT_HOP_GROUP availability, empty list: 201946
Apr  6 09:56:54.610225 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_FDB_ENTRY availability, empty list
Apr  6 09:56:54.610392 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_FDB_ENTRY availability, empty list: 201946
Apr  6 09:56:54.611670 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_NEXT_HOP availability, #0 TYPE=MPLS
Apr  6 09:56:54.611816 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_NEXT_HOP availability, #0 TYPE=MPLS : 201946
Apr  6 09:56:54.613022 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[230]- sai_object_type_get_availability: Querying SAI_OBJECT_TYPE_NEXT_HOP availability, #0 TYPE=SRV6_SIDLIST
Apr  6 09:56:54.613156 sonic NOTICE syncd#SDK: [SAI_OBJECT.NOTICE] mlnx_sai_object.c[251]- sai_object_type_get_availability: Got SAI_OBJECT_TYPE_NEXT_HOP availability, #0 TYPE=SRV6_SIDLIST : 201946
Apr  6 09:57:10.715096 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:10.717023 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:10.720748 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:10.857986 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:11.149401 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:11.445711 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:11.769183 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:12.045469 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:12.304585 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:12.579924 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:12.869579 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:13.147069 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:13.386431 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:13.605989 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:29.157585 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:32.166366 sonic ERR monit[710]: 'container_checker' status failed (3) -- Expected containers not running: telemetry
Apr  6 09:57:32.167932 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:32.357708 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:32.358528 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:32.360830 sonic ERR systemd[1]: Failed to start Telemetry container.
Apr  6 09:57:32.407681 sonic INFO memory_checker: [memory_checker] Exits without checking memory usage since container 'telemetry' is not running!

^C
admin@sonic:~$ docker exec -it syncd python3 /usr/bin/sx_api_lag_dump.py
DISCLAIMER
==========
The following code is an EXAMPLE whose sole purpose is to demonstrate various capabilities of the SDK.
This example represents a common configuration scenario and is in no way a complete guide to the demonstrated feature.
For complete documentation please refer to the SDK API Guide.
There is no guarantee that this example will run successfully on all hardware and software configurations.

LAG Members Table
===============================================================================================================
| SWID| LAG Logical Port| LAG Oper State| PVID| Member Port ID| Port Label| Collector| Distributor| Oper State|
===============================================================================================================
|    0        0x10000000            DOWN     1|               |           |          |            |           |
===============================================================================================================

LAG Vport table
===================================================================================
| SWID| LAG Logical Port| LAG Oper State| PVID| Virtual Port| Vlan|Flow Counter ID|
===================================================================================
|    0        0x10000000            DOWN     1|             |     |               |
===================================================================================
admin@sonic:~$ show version

SONiC Software Version: SONiC.HEAD.0-9a3571cec
Distribution: Debian 11.6
Kernel: 5.10.0-18-2-amd64
Build commit: 9a3571cec
Build date: Wed Mar 15 18:36:52 UTC 2023
Built by: harrish@aviz01

Platform: x86_64-mlnx_msn4600c-r0
HwSKU: ACS-MSN4600C
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2040X05753
Model Number: MSN4600-CS2FC
Hardware Revision: A1
Uptime: 09:58:15 up 12 min,  1 user,  load average: 0.31, 0.26, 0.22
Date: Thu 06 Apr 2023 09:58:15

Docker images:
REPOSITORY                    TAG                IMAGE ID       SIZE
docker-macsec                 HEAD.0-9a3571cec   3fde3242451f   502MB
docker-dhcp-relay             HEAD.0-9a3571cec   0f2355265102   494MB
docker-syncd-mlnx             HEAD.0-9a3571cec   0b5897bc01d9   903MB
docker-syncd-mlnx             latest             0b5897bc01d9   903MB
docker-sonic-telemetry        HEAD.0-9a3571cec   bbcddc8bff1a   565MB
docker-sonic-telemetry        latest             bbcddc8bff1a   565MB
docker-teamd                  HEAD.0-9a3571cec   2098a9406d10   500MB
docker-teamd                  latest             2098a9406d10   500MB
docker-snmp                   HEAD.0-9a3571cec   c34421168ee3   529MB
docker-snmp                   latest             c34421168ee3   529MB
docker-router-advertiser      HEAD.0-9a3571cec   59da0556a812   484MB
docker-router-advertiser      latest             59da0556a812   484MB
docker-platform-monitor       HEAD.0-9a3571cec   b4d83bb4e663   908MB
docker-platform-monitor       latest             b4d83bb4e663   908MB
docker-orchagent              HEAD.0-9a3571cec   1c8a652f6387   519MB
docker-orchagent              latest             1c8a652f6387   519MB
docker-mux                    HEAD.0-9a3571cec   685e47e1ea81   533MB
docker-mux                    latest             685e47e1ea81   533MB
docker-lldp                   HEAD.0-9a3571cec   fb987d8c395b   527MB
docker-lldp                   latest             fb987d8c395b   527MB
docker-fpm-frr                HEAD.0-9a3571cec   9cd5a8aa89c3   530MB
docker-fpm-frr                latest             9cd5a8aa89c3   530MB
docker-database               HEAD.0-9a3571cec   edf412a84bbe   484MB
docker-database               latest             edf412a84bbe   484MB
docker-sonic-mgmt-framework   HEAD.0-9a3571cec   4f46b0682b29   598MB
docker-sonic-mgmt-framework   latest             4f46b0682b29   598MB
docker-sflow                  HEAD.0-9a3571cec   cce27f3ca8a5   469MB
docker-sflow                  latest             cce27f3ca8a5   469MB
docker-nat                    HEAD.0-9a3571cec   dbcb15e49ed6   471MB
docker-nat                    latest             dbcb15e49ed6   471MB

admin@sonic:~$
andriy-kokhan commented 1 year ago

Do you see the same issue on SONiC VS? Please provide steps to reproduce with SAI Challenger (SAI-C). Please include:

  1. The steps to clone SAI-C (Is it on latest SAI-C commit?)
  2. The changes made in SAI-C if any
  3. Exact steps to build and run SAI-C
  4. Sample TC if any
sj-harrish commented 1 year ago

@andriy-kokhan , Thanks much for your response. We are collecting the details and we will get back with the logs. Another follow-up query in this regard, Could you please let us know the reason behind the commit change - https://github.com/opencomputeproject/SAI-Challenger/commit/6799cd2f090502b2ddb1513d59efc851a0a889f4#diff-f3d97dbb02a5bd229642e8abc64ae1710723377d55e63e50faa1a6dceb0eded7 where the self.r.publish("ASIC_STATE_CHANNEL", "G") is changed to self.r.publish("ASIC_STATE_CHANNEL@1", "G") in sai.py.? Thanks in advance,

andriy-kokhan commented 1 year ago

@andriy-kokhan , Thanks much for your response. We are collecting the details and we will get back with the logs. Another follow-up query in this regard, Could you please let us know the reason behind the commit change - 6799cd2#diff-f3d97dbb02a5bd229642e8abc64ae1710723377d55e63e50faa1a6dceb0eded7 where the self.r.publish("ASIC_STATE_CHANNEL", "G") is changed to self.r.publish("ASIC_STATE_CHANNEL@1", "G") in sai.py.? Thanks in advance,

As you probably noticed, the mentioned commit also updates the refpoints of SAI and sairedis/swss-common. The communication mechanism has been slightly updated in SONiC itself. So, the change that you've mentioned has been done to make SAI-C compatible with the new version of sairedis. Not a big problem taking into account that we are trying to be compatible with the latest version of SAI and latest SONiC as a primary use-case.

As a result, two things that you should keep in mind:

  1. Better to update refpoints in sai.env to the proper values whenever you are targeting much older version of SAI/SONiC.
  2. You may need to revert the change that you've mentioned in case you are using that old version of SONiC. For more details, please refer to https://github.com/sonic-net/sonic-swss-common/commit/2443b43288ec7d753870786b240bdfa733106695

Also, you may want to propose a fix to SAI-C community to either somehow auto-detect whether there should be "@1" prefix or make this commit dependent.

andriy-kokhan commented 1 year ago

Just FYI https://github.com/opencomputeproject/SAI-Challenger/pull/89

sj-harrish commented 1 year ago

Thanks much @andriy-kokhan for your inputs. We will get back if we have any further queries.