sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[warm-reboot] apps crash due to redis is busy running 'table_dump.lua' during warm-start #3008

Open dawnbeauty opened 5 years ago

dawnbeauty commented 5 years ago

Description

[Topo summary]: T0 with 20k ipv4 routes, each route with 2 ecmp members.

Steps to reproduce the issue

  1. Enable warm restart: `sudo config warm_restart enable system`

  2. Warm restart swss/system: `sudo systemctl restart warm`

  3. WARMBOOT_FINALIZER found bgp did not change state to RECONCILED: `NOTICE root: WARMBOOT_FINALIZER : Some components didn't finish reconcile: bgp ...` `NOTICE root: WARMBOOT_FINALIZER : Finalizing warmboot...`

Describe the results you received:

bgp did not finish reconcile. Also, other apps (e.g. lldp-syncd) throw an error whenever they try to access the db while it is busy running 'table_dump.lua'.

Describe the results you expected:

All components should change the status to reconciled.

Additional information you deem important (e.g. issue happens only occasionally):

debug

Tested further with 22694 objects in the ASIC_STATE table and 22559 objects in the TEMP_ASIC_STATE table:

- `KEYS ASIC_STATE`: the fastest method
- `KEYS TEMP_ASIC_STATE`: nearly the same as `KEYS ASIC_STATE`
- `KEYS ASIC_STATE` with lua: ~9 times slower than `KEYS ASIC_STATE`
- `KEYS TEMP_ASIC_STATE` with lua: the slowest method, ~65 times slower than `KEYS ASIC_STATE`

It seems like we should improve the table dump method to prevent blocking redis for too long, or split each database into an individual redis instance (process).

Workaround

Currently, we can set `lua-time-limit` large enough to prevent redis from sending BUSY replies to other client requests.

- runtime: `redis-cli -n 0 config set lua-time-limit 10000`
- redis.conf: `lua-time-limit 10000`

yxieca commented 5 years ago

@kcudnik Are you the best person to look into this issue with suggestions?

kcudnik commented 5 years ago

If there are a lot of entries in redis, apply view can take a lot of time to dump the redis DB: 8 seconds in this dump. That doesn't seem that critical to me, but I'm not sure we can improve this much. It's not getting only keys, but all values as well, so basically it gets the keys first and then each value by key.

That increase of time is probably the linux kernel throttling syncd's high cpu usage. Since the method to dump ASIC_STATE and TEMP_ASIC_STATE is exactly the same, if the sizes of those dbs are the same, then the times should also be the same. Take a look here: https://github.com/Azure/sonic-sairedis/blob/master/syncd/syncd_applyview.cpp#L7661
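The two-phase pattern described above (enumerate keys, then fetch every field of each hash) can be sketched in Python against a plain dict standing in for Redis. The function name `table_dump` only echoes `table_dump.lua`; this is an illustrative sketch, not the actual sairedis script:

```python
def table_dump(store, table):
    """Dump all hashes whose key starts with '<table>:'.

    Phase 1 mirrors `KEYS <table>:*`, phase 2 mirrors `HGETALL key`
    for each match, so run time grows with both the key count and
    the attribute count. `store` is a dict of key -> dict standing
    in for Redis."""
    prefix = table + ":"
    keys = [k for k in store if k.startswith(prefix)]   # KEYS phase
    return {k: dict(store[k]) for k in keys}            # HGETALL phase

# Fabricated sample data for illustration.
store = {
    "ASIC_STATE:SAI_OBJECT_TYPE_SWITCH:oid:0x21":
        {"SAI_SWITCH_ATTR_INIT_SWITCH": "true"},
    "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_SWITCH:oid:0x21":
        {"SAI_SWITCH_ATTR_INIT_SWITCH": "true"},
}
dump = table_dump(store, "ASIC_STATE")
print(sorted(dump))  # only the ASIC_STATE key; TEMP_ does not match
```

Because both phases run inside a single EVAL call in the real script, redis is blocked for the entire dump, which is why other clients see BUSY replies in the meantime.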

kcudnik commented 3 years ago

So I also performed some tests on my side, and it seems like the issue is the LENGTH of the key. I changed the prefix from "TEMP_" to just "T" and there is almost no difference. I have 100k ASIC_STATE entries and 100k TASIC_STATE entries:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "ASIC_STATE:*"|wc -l
101475

real    0m0,137s
user    0m0,046s
sys     0m0,026s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "TASIC_STATE:*"|wc -l
100094

real    0m0,140s
user    0m0,051s
sys     0m0,022s
sonic@centaur ~/sonic-sairedis/tests $  time redis-cli -n 1 "keys" "TEMP_ASIC_STATE:*"|wc -l
100094

real    0m0,124s
user    0m0,030s
sys     0m0,029s

Suddenly, when we start using KEYS inside EVAL:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 eval 'redis.call("keys", KEYS[1].."*")' 1 ASIC_STATE
(nil)

real    0m5,554s
user    0m0,000s
sys     0m0,002s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 eval 'redis.call("keys", KEYS[1].."*")' 1 TASIC_STATE
(nil)

real    0m5,428s
user    0m0,001s
sys     0m0,001s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 eval 'redis.call("keys", KEYS[1].."*")' 1 TEMP_ASIC_STATE
(nil)

real    0m36,393s
user    0m0,002s
sys     0m0,001s

We jump from 0.13 sec to over 5 seconds!!! I have no idea why we get such a huge jump here, but when we also add HGETALL for all attributes, we gain an additional ~1.5 seconds:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 --eval table_dump.lua ASIC_STATE  |wc -l
1

real    0m6,460s
user    0m0,017s
sys     0m0,043s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 --eval table_dump.lua TASIC_STATE  |wc -l
1

real    0m6,295s
user    0m0,017s
sys     0m0,044s
sonic@centaur ~/sonic-sairedis/tests $  time redis-cli -n 1 --eval table_dump.lua TEMP_ASIC_STATE  |wc -l
1

real    0m47,449s
user    0m0,020s
sys     0m0,041s

My suspicion is that inside lua or redis there is a structure representation for strings which has a constant size, like 16 chars etc., and when a string is longer than that, new memory must be allocated. I'm not sure yet whether this is related only to the table name or to the entire key length, since key lengths in ASIC_STATE are pretty long:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "ASIC_STATE:*"|perl -ne 'print length($_),"\n"'|sort|uniq -c |sort -n
      1 53
      1 62
      1 66
      2 55
     28 56
     34 52
     64 60
    199 123
    256 71
    448 64
    640 54
   3600 124
  21230 125
  29701 127
  45270 126

Here, with sample data, we can see that most keys have a length of 126 chars, but with the TEMP_ prefix the majority of keys are over 130 characters:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "TEMP_ASIC_STATE:*"|perl -ne 'print length($_),"\n"'|sort|uniq -c |sort -n
      1 60
      1 67
     28 61
     32 57
     32 69
    199 128
   3600 129
  21230 130
  29701 132
  45270 131
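The perl pipeline above (`length | sort | uniq -c | sort -n`) can be mirrored in Python; the sample keys below are fabricated for illustration, and note that perl's `length($_)` also counts the trailing newline of each redis-cli output line, which this version does not:

```python
from collections import Counter

def length_histogram(keys):
    """Count how many keys have each length, ordered by count
    ascending, like `... | sort | uniq -c | sort -n`."""
    counts = Counter(len(k) for k in keys)
    # sort by (count, length) so ties come out in a stable order
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

keys = ["ASIC_STATE:oid:0x1", "ASIC_STATE:oid:0x2", "ASIC_STATE:oid:0x30"]
for length, count in length_histogram(keys):
    print(count, length)   # same "count length" columns as uniq -c
```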

I cloned the https://github.com/redis/redis repo and grepped for 128 values; there are a lot of places where this pops up. It's not obvious whether this is the actual cause, but that would be my bet:

git clone https://github.com/redis/redis.git
cd redis
sonic@centaur ~/redis $ grep -rP "\b128\b"|grep -vP "\|128"|grep -v 128-bit|less|wc -l
103

But at this point it's hard to tell which of those makes an impact, if any.

We can change "TEMP_" to just "T" as the prefix for the temporary asic view keys, and we can get away with this for ipv4, but when we use ipv6 we won't be able to use this trick.

I also opened an issue on the redis repo, https://github.com/redis/redis/issues/8077, so maybe it will be explained faster.
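The cause later confirmed in that redis issue is Lua 5.1's string-interning hash (Lua 5.1 is the version embedded in redis): `luaS_newlstr` in lstring.c samples at most about 32 characters, with `step = (len >> 5) + 1`, so long keys that agree at all sampled positions hash identically, and each colliding key makes the intern-table chain scan longer. A Python re-implementation of that hash makes the sampling visible; the collision example is constructed, not taken from ASIC_STATE data:

```python
def lua51_hash(s):
    """Lua 5.1 string hash (lstring.c, luaS_newlstr), 32-bit."""
    data = s.encode()
    h = len(data) & 0xFFFFFFFF        # seed with the length
    step = (len(data) >> 5) + 1       # long strings: skip characters
    i = len(data)
    while i >= step:
        h = (h ^ ((h << 5) + (h >> 2) + data[i - 1])) & 0xFFFFFFFF
        i -= step
    return h

# A 64-char string has step 3, sampling indices 63, 60, ..., 3, so
# index 0 is never looked at: two strings differing only there are
# a guaranteed collision.
a = "a" * 64
b = "b" + "a" * 63
print(lua51_hash(a) == lua51_hash(b))  # True
```

With ~126-char vs ~131-char keys the step and the sampled positions shift, which is consistent with one prefix length hitting far more collisions than another.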

kcudnik commented 3 years ago

Confirmed locally that this is indeed the lua hashing problem: https://github.com/redis/redis/issues/8077#issuecomment-733222358. To actually solve this properly we have 2 choices here:

kcudnik commented 3 years ago

After https://github.com/redis/redis/pull/8180 is merged, this problem will go away on its own.

kcudnik commented 3 years ago

Unfortunately, LUA in redis will not be updated, since newer versions have some backward compatibility problems, but I asked for a fix in the current repo and made a PR here: https://github.com/redis/redis/pull/9449

kcudnik commented 3 years ago

OK, that PR on redis was merged, so we just need to wait for the next redis package release on ubuntu.

liat-grozovik commented 3 years ago

@kcudnik which official version are you referring to? Do you know if it will be released as part of Debian 11, going to SONiC 202111?

dprital commented 2 years ago

> @kcudnik which official version are you referring to? Do you know if it will be released as part of Debian 11, going to SONiC 202111?

@kcudnik - Can you please refer to this question? (Thanks in advance)

kcudnik commented 2 years ago

I'm referring to the official redis server package; please refer here: https://github.com/redis/redis/pull/9449