Open anamehra opened 4 months ago
The issue will be triaged further in the chassis meeting
@stephenxs @stepanblyschak @liat-grozovik : can you please help with this.
@judyjoseph @arlakshm @mlok-nokia @ysmanman for viz. Will apply for master image also.
Feature 'Install before advt.' might be disabled for 202405.
@anamehra Could you please share a tech support when the issue occurs? What is the route scale on the system? If you have an opportunity to play with the system, could you please increase the timeout to 1h and check whether route_check.py eventually finishes or is stuck without progress?
The route_check eventually finished; it took a couple more minutes. We have 50K routes. I will check on show tech.
This is currently still an issue with 202405.
@deepak-singhal0408 - I have attached logs for routeCheck issue. routeCheck_logs.txt
This feature has been re-enabled in master: https://github.com/sonic-net/sonic-buildimage/pull/19836
Tried 2 iterations on a device with 32k v4 + 32k v6 routes.
Neighbor    V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down    State/PfxRcd  NeighborName
10.0.0.1    4  65200  61249    14398    0       0    0     01:02:57   1             ARISTA01T3
10.0.0.5    4  65200  0        0        0       0    0     never      Active        ARISTA03T3
10.0.0.7    4  65200  6059     5857     0       0    0     4d00h24m   1             ARISTA04T3
10.0.0.11   4  65200  6056     5856     0       0    0     4d00h24m   33793         ARISTA06T3
Iteration 1: <<<<<<<<<<<<<<<<<
Checking routes for namespaces: ['asic0', 'asic1']
real 3m16.387s
user 1m26.084s
sys 0m7.275s
Iteration 2: <<<<<<<<<<<<<<<<<<<<<<<<<
Checking routes for namespaces: ['asic0', 'asic1']
real 3m18.249s
user 1m26.760s
sys 0m7.926s
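To put the two runs next to the 2-minute timeout cited in the issue description, the `real` values can be converted to seconds. This is only an illustrative sketch: `parse_time_field` and `TIMEOUT_S` are hypothetical names introduced here, not part of route_check.py.

```python
import re

# Assumption: 120 s corresponds to the "2 mins" timeout quoted in the description.
TIMEOUT_S = 120


def parse_time_field(value):
    """Convert a shell `time` field such as '3m16.387s' into seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognized time field: {value!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# `real` values copied from the two iterations in this comment.
runs = {"Iteration 1": "3m16.387s", "Iteration 2": "3m18.249s"}

for label, real in runs.items():
    seconds = parse_time_field(real)
    print(f"{label}: {seconds:.3f}s, {seconds - TIMEOUT_S:.3f}s over the timeout")
```

Both iterations exceed the timeout by more than a minute, which is consistent with the failure described in the issue.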
python -m cProfile -s time route_check.py
122726378 function calls (82385912 primitive calls) in 216.529 seconds
Ordered by: internal time
ncalls             tottime  percall  cumtime  percall  filename:lineno(function)
6                  90.089   15.015   90.089   15.015   {built-in method time.sleep}
14                 82.537   5.896    82.653   5.904    {method 'read' of '_io.TextIOWrapper' objects}
51279296/15794766  10.061   0.000    15.341   0.000    encoder.py:333(_iterencode_dict)
2                  6.252    3.126    6.252    3.126    {built-in method swsscommon._swsscommon.new_SubscriberStateTable}
12                 4.621    0.385    4.621    0.385    decoder.py:343(raw_decode)
20647482/15794694  3.588    0.000    10.100   0.000    encoder.py:277(_iterencode_list)
106                2.978    0.028    2.978    0.028    {method 'format' of 'str' objects}
15794766           2.714    0.000    18.055   0.000    encoder.py:413(_iterencode)
9                  1.360    0.151    19.613   2.179    encoder.py:182(encode)
12982522           1.148    0.000    1.148    0.000    {built-in method builtins.isinstance}
205278             0.854    0.000    1.632    0.000    ipaddress.py:1603(_ip_int_from_string)
4736970            0.720    0.000    0.720    0.000    {built-in method _json.encode_basestring_ascii}
821056             0.655    0.000    0.891    0.000    ipaddress.py:1201(_parse_octet)
410527             0.453    0.000    2.253    0.000    ipaddress.py:1269(__init__)
410514             0.446    0.000    1.700    0.000    ipaddress.py:1175(_ip_int_from_string)
615687             0.381    0.000    0.666    0.000    ipaddress.py:1707(_parse_hextet)
2                  0.374    0.187    180.955  90.478   route_check.py:520(check_frr_pending_routes) <<<<<<<<<<<<<<<<<
205295             0.316    0.000    2.137    0.000    ipaddress.py:1875(__init__)
139211             0.288    0.000    0.289    0.000    {method 'join' of 'str' objects}
273646             0.288    0.000    3.931    0.000    route_check.py:165(is_local)
1231834            0.285    0.000    0.285    0.000    {method 'split' of 'str' objects}
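In this profile, `time.sleep` (90.089 s over 6 calls) and the text-stream `read` (82.537 s over 14 calls) together account for roughly 80% of the 216.529 s total, and `check_frr_pending_routes` has a cumulative time of 180.955 s. For anyone who wants the same "Ordered by: internal time" report for a single function rather than the whole script, a minimal sketch with `cProfile` and `pstats` follows. `slow_check` is a hypothetical stand-in that only sleeps between retries; it is not the real route check logic.

```python
import cProfile
import io
import pstats
import time


def slow_check(retries=3, wait_s=0.01):
    """Hypothetical stand-in: a check that sleeps between retries."""
    for _ in range(retries):
        time.sleep(wait_s)


def profile_by_internal_time(func, top=5):
    """Profile func() and return a report sorted like `-s time`."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # "time" sorts by internal time (tottime), matching `cProfile -s time`.
    stats.sort_stats("time").print_stats(top)
    return buffer.getvalue()


report = profile_by_internal_time(slow_check)
print(report)
```

Sorting by internal time makes waits such as sleeps and blocking reads stand out, which is how the two dominant entries above were spotted.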
With the following optimizations, route_check time is reduced to 1m30s.
Description
On chassis, after the introduction of the FRR route check in route_check.py (https://github.com/sonic-net/sonic-utilities/pull/2762), route_check.py may take more than 2 minutes to finish. The current timeout is 2 minutes, so the route check fails and the failure shows up in monit output. This affects the sonic-mgmt pretest check. Other test cases relying on monit output may also be affected.
The issue was opened earlier for 202305 and was fixed there by reverting the FRR route check feature: https://github.com/sonic-net/sonic-buildimage/issues/17403
This needs to be fixed for master.
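To tell a run that is merely slow from one that is stuck, a command can be run under an explicit time limit and the outcome recorded. The sketch below is a generic illustration using `subprocess.run` with its `timeout` argument; `run_with_timeout` is a hypothetical helper, the short Python one-liners stand in for the real script, and none of this reflects how monit or route_check.py actually enforce the timeout.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd, returning whether it finished within timeout_s seconds."""
    start = time.monotonic()
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
        finished, returncode = True, proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        finished, returncode = False, None
    return {
        "finished": finished,
        "returncode": returncode,
        "elapsed_s": time.monotonic() - start,
    }


# Stand-in commands: one returns immediately, one outlives its limit.
fast = run_with_timeout([sys.executable, "-c", "pass"], timeout_s=30)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(5)"], timeout_s=0.5
)
print(fast)
print(slow)
```

On a real system the command and limit would be replaced with the route check invocation and the timeout under test, for example the 1h value suggested earlier in this thread.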