sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[swss/syncd] Race between orchagent removing RIF rate counters from DB and the Lua script fetching them #11621

Closed stepanblyschak closed 1 year ago

stepanblyschak commented 2 years ago

Description

Errors appear in the log from the RIF rate Lua script because orchagent removes the RIF rates from COUNTERS DB while the script is still fetching them.

Steps to reproduce the issue:

  1. Create RIF in SONiC, wait till RIF rates are populated in COUNTERS DB
  2. Remove RIF
  3. Repeat until you see:
Aug  4 02:04:51.724424 r-ocelot-02 NOTICE swss#orchagent: :- cleanUpRifFromCounterDb: CleanUp interface PortChannel33 oid oid:0x6000000000823 from counter db
Aug  4 02:04:51.724465 r-ocelot-02 ERR syncd#SDK: :- guard: RedisReply catches system_error: command: *86#015#012$7#015#012EVALSHA#015#012$40#015#0125f7e8b14a9d450f29760700b318672020ca52eab#015#012$2#015#01279#015#012$19#015#012oid:0x6000000000800#015#012$19#015#012oid:0x6000000000801#015#012$19#015#012oid:0x6000000000802#015#012$19#015#012oid:0x6000000000803#015#012$19#015#012oid:0x6000000000804#015#012$19#015#012oid:0x6000000000805#015#012$19#015#012oid:0x6000000000806#015#012$19#015#012oid:0x6000000000807#015#012$19#015#012oid:0x6000000000808#015#012$19#015#012oid:0x6000000000809#015#012$19#015#012oid:0x600000000080b#015#012$19#015#012oid:0x6000000000823#015#012$19#015#012oid:0x6000000000824#015#012$19#015#012oid:0x6000000000825#015#012$19#015#012oid:0x6000000000826#015#012$19#015#012oid:0x6000000000827#015#012$19#015#012oid:0x6000000000828#015#012$19#015#012oid:0x6000000000829#015#012$19#015#012oid:0x600000000082b#015#012$19#015#012oid:0x600000000082c#015#012$19#015#012oid:0x600000000082d#015#012$19#015#012oid:0x600000000082e#015#012$19#015#012oid:0x600000000082f#015#012$19#015#012oid:0x6000000000830#015#012$19#015#012oid:0x6000000000831#015#012$19#015#012oid:0x6000000000832#015#012$19#015#012oid:0x6000000000833#015#012$19#015#012oid:0x6000000000834#015#012$19#015#012oid:0x6000000000836#015#012$19#015#012oid:0x6000000000837#015#012$19#015#012oid:0x6000000000838#015#012$19#015#012oid:0x6000000000839#015#012$19#015#012oid:0x600000000083a#015#012$19#015#012oid:0x600000000083b#015#012$19#015#012oid:0x600000000083c#015#012$19#015#012oid:0x600000000083d#015#012$19#015#012oid:0x600000000083e#015#012$19#015#012oid:0x600000000083f#015#012$19#015#012oid:0x6000000000841#015#012$19#015#012oid:0x6000000000842#015#012$19#015#012oid:0x6000000000843#015#012$19#015#012oid:0x6000000000844#015#012$19#015#012oid:0x6000000000845#015#012$19#015#012oid:0x6000000000846#015#012$19#015#012oid:0x6000000000847#015#012$19#015#012oid:0x6000000000848#015#012$19#015#012oid:0x6000000000849#015#
012$19#015#012oid:0x600000000084a#015#012$19#015#012oid:0x600000000084c#015#012$19#015#012oid:0x600000000084d#015#012$19#015#012oid:0x600000000084e#015#012$19#015#012oid:0x600000000084f#015#012$19#015#012oid:0x6000000000850#015#012$19#015#012oid:0x6000000000851#015#012$19#015#012oid:0x6000000000852#015#012$19#015#012oid:0x6000000000853#015#012$19#015#012oid:0x6000000000854#015#012$19#015#012oid:0x6000000000855#015#012$19#015#012oid:0x6000000000857#015#012$19#015#012oid:0x6000000000858#015#012$19#015#012oid:0x6000000000859#015#012$19#015#012oid:0x600000000085a#015#012$19#015#012oid:0x600000000085b#015#012$19#015#012oid:0x600000000085c#015#012$19#015#012oid:0x600000000085d#015#012$19#015#012oid:0x600000000085e#015#012$19#015#012oid:0x600000000085f#015#012$19#015#012oid:0x6000000000860#015#012$19#015#012oid:0x6000000000862#015#012$19#015#012oid:0x6000000000863#015#012$19#015#012oid:0x6000000000864#015#012$19#015#012oid:0x6000000000865#015#012$19#015#012oid:0x6000000000866#015#012$19#015#012oid:0x6000000000867#015#012$19#015#012oid:0x6000000000868#015#012$19#015#012oid:0x6000000000869#015#012$19#015#012oid:0x600000000086a#015#012$19#015#012oid:0x600000000086b#015#012$19#015#012oid:0x600000000086c#015#012$1#015#0122#015#012$8#015#012COUNTERS#015#012$4#015#0121000#015#012$2#015#012''#015#012, reason: ERR Error running script (call to f_5f7e8b14a9d450f29760700b318672020ca52eab): @user_script:48: user_script:48: attempt to perform arithmetic on local 'in_octets' (a boolean value): Input/output error
Aug  4 02:04:51.724465 r-ocelot-02 ERR syncd#SDK: :- runRedisScript: Caught exception while running Redis lua script: RedisReply catches system_error: command: *86#015#012$7#015#012EVALSHA#015#012$40#015#0125f7e8b14a9d450f29760700b318672020ca52eab#015#012$2#015#01279#015#012$19#015#012oid:0x6000000000800#015#012$19#015#012oid:0x6000000000801#015#012$19#015#012oid:0x6000000000802#015#012$19#015#012oid:0x6000000000803#015#012$19#015#012oid:0x6000000000804#015#012$19#015#012oid:0x6000000000805#015#012$19#015#012oid:0x6000000000806#015#012$19#015#012oid:0x6000000000807#015#012$19#015#012oid:0x6000000000808#015#012$19#015#012oid:0x6000000000809#015#012$19#015#012oid:0x600000000080b#015#012$19#015#012oid:0x6000000000823#015#012$19#015#012oid:0x6000000000824#015#012$19#015#012oid:0x6000000000825#015#012$19#015#012oid:0x6000000000826#015#012$19#015#012oid:0x6000000000827#015#012$19#015#012oid:0x6000000000828#015#012$19#015#012oid:0x6000000000829#015#012$19#015#012oid:0x600000000082b#015#012$19#015#012oid:0x600000000082c#015#012$19#015#012oid:0x600000000082d#015#012$19#015#012oid:0x600000000082e#015#012$19#015#012oid:0x600000000082f#015#012$19#015#012oid:0x6000000000830#015#012$19#015#012oid:0x6000000000831#015#012$19#015#012oid:0x6000000000832#015#012$19#015#012oid:0x6000000000833#015#012$19#015#012oid:0x6000000000834#015#012$19#015#012oid:0x6000000000836#015#012$19#015#012oid:0x6000000000837#015#012$19#015#012oid:0x6000000000838#015#012$19#015#012oid:0x6000000000839#015#012$19#015#012oid:0x600000000083a#015#012$19#015#012oid:0x600000000083b#015#012$19#015#012oid:0x600000000083c#015#012$19#015#012oid:0x600000000083d#015#012$19#015#012oid:0x600000000083e#015#012$19#015#012oid:0x600000000083f#015#012$19#015#012oid:0x6000000000841#015#012$19#015#012oid:0x6000000000842#015#012$19#015#012oid:0x6000000000843#015#012$19#015#012oid:0x6000000000844#015#012$19#015#012oid:0x6000000000845#015#012$19#015#012oid:0x6000000000846#015#012$19#015#012oid:0x6000000000847#015#012$19#015#012oid:
0x6000000000848#015#012$19#015#012oid:0x6000000000849#015#012$19#015#012oid:0x600000000084a#015#012$19#015#012oid:0x600000000084c#015#012$19#015#012oid:0x600000000084d#015#012$19#015#012oid:0x600000000084e#015#012$19#015#012oid:0x600000000084f#015#012$19#015#012oid:0x6000000000850#015#012$19#015#012oid:0x6000000000851#015#012$19#015#012oid:0x6000000000852#015#012$19#015#012oid:0x6000000000853#015#012$19#015#012oid:0x6000000000854#015#012$19#015#012oid:0x6000000000855#015#012$19#015#012oid:0x6000000000857#015#012$19#015#012oid:0x6000000000858#015#012$19#015#012oid:0x6000000000859#015#012$19#015#012oid:0x600000000085a#015#012$19#015#012oid:0x600000000085b#015#012$19#015#012oid:0x600000000085c#015#012$19#015#012oid:0x600000000085d#015#012$19#015#012oid:0x600000000085e#015#012$19#015#012oid:0x600000000085f#015#012$19#015#012oid:0x6000000000860#015#012$19#015#012oid:0x6000000000862#015#012$19#015#012oid:0x6000000000863#015#012$19#015#012oid:0x6000000000864#015#012$19#015#012oid:0x6000000000865#015#012$19#015#012oid:0x6000000000866#015#012$19#015#012oid:0x6000000000867#015#012$19#015#012oid:0x6000000000868#015#012$19#015#012oid:0x6000000000869#015#012$19#015#012oid:0x600000000086a#015#012$19#015#012oid:0x600000000086b#015#012$19#015#012oid:0x600000000086c#015#012$1#015#0122#015#012$8#015#012COUNTERS#015#012$4#015#0121000#015#012$2#015#012''#015#012, reason: ERR Error running script (call to f_5f7e8b14a9d450f29760700b318672020ca52eab): @user_script:48: user_script:48: attempt to perform arithmetic on local 'in_octets' (a boolean value): Input/output error: Input/output error

Describe the results you received:

Errors in the logs:

Same log output as shown under the reproduction steps above: the orchagent `cleanUpRifFromCounterDb` NOTICE for PortChannel33 (oid:0x6000000000823), followed by the two syncd ERR messages (`guard` and `runRedisScript`) that end in `attempt to perform arithmetic on local 'in_octets' (a boolean value)`.

Describe the results you expected:

No errors

Output of show version:

SONiC Software Version: SONiC.202205.20-b1456ee1c_Internal
Distribution: Debian 11.4
Kernel: 5.10.0-12-2-amd64
Build commit: b1456ee1c
Build date: Mon Aug  1 12:23:14 UTC 2022
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Platform: x86_64-mlnx_msn4410-r0
HwSKU: ACS-MSN4410
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2039X06760
Model Number: MSN4410-WS2FO
Hardware Revision: A1
Uptime: 12:44:43 up  3:15,  2 users,  load average: 0.38, 0.92, 1.06
Date: Thu 04 Aug 2022 12:44:43

Docker images:
REPOSITORY                                         TAG                            IMAGE ID       SIZE
docker-platform-monitor                            202205.20-b1456ee1c_Internal   b92a53896431   993MB
docker-platform-monitor                            latest                         b92a53896431   993MB
docker-syncd-mlnx                                  202205.20-b1456ee1c_Internal   2ade340a46e5   990MB
docker-syncd-mlnx                                  latest                         2ade340a46e5   990MB
docker-orchagent                                   202205.20-b1456ee1c_Internal   761281d6826f   475MB
docker-orchagent                                   latest                         761281d6826f   475MB
docker-macsec                                      latest                         b8a3913eb251   458MB
docker-dhcp-relay                                  latest                         38c5beaa89c7   450MB
docker-sonic-telemetry                             202205.20-b1456ee1c_Internal   fc6f91872da8   520MB
docker-sonic-telemetry                             latest                         fc6f91872da8   520MB
docker-database                                    202205.20-b1456ee1c_Internal   640475ea9c81   440MB
docker-database                                    latest                         640475ea9c81   440MB
docker-router-advertiser                           202205.20-b1456ee1c_Internal   563f997a1fd8   440MB
docker-router-advertiser                           latest                         563f997a1fd8   440MB
docker-mux                                         202205.20-b1456ee1c_Internal   e2993f50f7d5   489MB
docker-mux                                         latest                         e2993f50f7d5   489MB
docker-fpm-frr                                     202205.20-b1456ee1c_Internal   922438d17944   454MB
docker-fpm-frr                                     latest                         922438d17944   454MB
docker-nat                                         202205.20-b1456ee1c_Internal   c38b7dad2664   428MB
docker-nat                                         latest                         c38b7dad2664   428MB
docker-sflow                                       202205.20-b1456ee1c_Internal   925816435673   426MB
docker-sflow                                       latest                         925816435673   426MB
docker-teamd                                       202205.20-b1456ee1c_Internal   aeee8e768530   425MB
docker-teamd                                       latest                         aeee8e768530   425MB
docker-snmp                                        202205.20-b1456ee1c_Internal   5f9f61d8d698   453MB
docker-snmp                                        latest                         5f9f61d8d698   453MB
docker-lldp                                        202205.20-b1456ee1c_Internal   5a56fc2cb5de   450MB
docker-lldp                                        latest                         5a56fc2cb5de   450MB
docker-sonic-mgmt-framework                        202205.20-b1456ee1c_Internal   f1d1439365c9   554MB
docker-sonic-mgmt-framework                        latest                         f1d1439365c9   554MB
urm.nvidia.com/sw-nbu-sws-sonic-docker/sonic-wjh   1.3.0-202205-internal-11       215140d2cb02   494MB

Output of show techsupport:

[syslog.182.gz](https://github.com/sonic-net/sonic-buildimage/files/9260600/syslog.182.gz)

Additional information you deem important (e.g. issue happens only occasionally):

stepanblyschak commented 2 years ago

@sumanbrcm Could you please handle this issue?

PR: https://github.com/sonic-net/sonic-swss/pull/2199

sumanbrcm commented 2 years ago

@stepanblyschak Sure, I will check and work on the fix accordingly. As per the previous discussion in the PR (https://github.com/sonic-net/sonic-swss/pull/2199), the proposal was to move the cleanup code to syncd. As per your suggestion:

> the cleanup of COUNTERS RIF tables should be done in syncd while cleanup of the mapping should still be done in orchagent

Once we have converged on the use case, I will fix it along those lines.

Specific to the issue you reported, the likely reason is that fetching `in_octets` failed, but we still use `in_octets` in the calculation on the line below:

`local rx_bps_new = (in_octets - in_octets_last) / delta * 1000`

This is because the table (`counters_table_name`) might not exist for the RIF. A similar issue is handled in the port_rates.lua script in the following way:

`if not in_ucast_pkts or not in_non_ucast_pkts or not out_ucast_pkts or not out_non_ucast_pkts or not in_octets or not out_octets then logit("Not found some counters on " .. port) return end`

We can fix this along similar lines, together with moving the cleanup code to syncd.
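
The guard described in the comment above can be sketched in Lua roughly as follows. This is a minimal, self-contained illustration and not the merged fix: `compute_rx_bps` and its parameters are hypothetical names, the counter values are passed in directly rather than read with `redis.call('HGET', ...)`, and `delta` is assumed to be in milliseconds, matching the `/ delta * 1000` in the quoted line. Inside a Redis script, a nil reply (missing key or field) is converted to the Lua boolean `false`, which is why the logged error reports "a boolean value".

```lua
-- Hypothetical helper (not from the SONiC sources): returns the new RX rate
-- in bytes per second, or nil when a counter is missing, for example because
-- orchagent removed the RIF from COUNTERS DB between two polls.
local function compute_rx_bps(in_octets, in_octets_last, delta)
    -- A missing hash field arrives as false inside a Redis script, so skip
    -- the calculation instead of doing arithmetic on a boolean.
    if not in_octets or not in_octets_last then
        return nil
    end
    -- Assumption: delta is the poll interval in milliseconds.
    if not delta or delta <= 0 then
        return nil
    end
    -- HGET returns strings, so convert explicitly before subtracting.
    return (tonumber(in_octets) - tonumber(in_octets_last)) / delta * 1000
end

-- Counter still present: 1500 bytes received over 1000 ms.
print(compute_rx_bps("2500", "1000", 1000))
-- Counter removed between polls: the fetch returned false.
print(compute_rx_bps(false, "1000", 1000))
```

Returning nil lets the caller skip that RIF for the current interval (optionally logging it, as port_rates.lua does with `logit`) instead of aborting the whole script for every RIF in the batch.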

yxieca commented 2 years ago

@sumanbrcm is actively working on this issue.

liat-grozovik commented 1 year ago

@sumanbrcm Could you please provide an ETA for when the fix can be in 202205?

sumanbrcm commented 1 year ago

@liat-grozovik ETA for this fix is 10th Oct.

adyeung commented 1 year ago

Fix merged