sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[warm-reboot] apps crash due to redis is busy running 'table_dump.lua' during warm-start #3008

Open dawnbeauty opened 5 years ago

dawnbeauty commented 5 years ago

Description

[Topo summary]: T0 with 20k ipv4 routes, each route with 2 ecmp members.

Steps to reproduce the issue

  1. Enable warm restart: `sudo config warm_restart enable system`

  2. Warm restart swss/system: `sudo systemctl restart warm`

  3. WARMBOOT_FINALIZER found bgp did not change state to RECONCILED: `NOTICE root: WARMBOOT_FINALIZER : Some components didn't finish reconcile: bgp ...` `NOTICE root: WARMBOOT_FINALIZER : Finalizing warmboot...`

Describe the results you received:

bgp did not finish reconcile. Also, other apps (e.g. lldp-syncd) throw an error whenever they try to access the db while it is busy running 'table_dump.lua'.

Describe the results you expected:

All components should change the status to reconciled.

Additional information you deem important (e.g. issue happens only occasionally):

debug

Tested further with 22694 objects in the ASIC_STATE table and 22559 objects in the TEMP_ASIC_STATE table:

- `KEYS ASIC_STATE`: the fastest method
- `KEYS TEMP_ASIC_STATE`: nearly the same as `KEYS ASIC_STATE`
- `KEYS ASIC_STATE` with lua: ~9 times slower than `KEYS ASIC_STATE`
- `KEYS TEMP_ASIC_STATE` with lua: the slowest method, ~65 times slower than `KEYS ASIC_STATE`

It seems like we should improve the table dump method to prevent blocking redis for too long, or split each database into an individual redis instance (process).

Workaround

Currently, we can set `lua-time-limit` large enough to prevent redis from sending BUSY replies to other client requests.

- runtime: `redis-cli -n 0 config set lua-time-limit 10000`
- redis.conf: `lua-time-limit 10000`

yxieca commented 5 years ago

@kcudnik Are you the best person to look into this issue with suggestions?

kcudnik commented 5 years ago

If there are a lot of entries in redis, apply view can take a lot of time to dump the redis DB: 8 seconds in this dump. That doesn't seem that critical to me, but I'm not sure we can improve this much. It's not getting only keys, but all values as well, so basically it gets the keys first and then each value by key.

That increase of time is probably the linux kernel throttling syncd's high cpu usage. Since the method to dump ASIC_STATE and TEMP_ASIC_STATE is exactly the same, if the sizes of those dbs are the same, then the times should also be the same. Take a look here: https://github.com/Azure/sonic-sairedis/blob/master/syncd/syncd_applyview.cpp#L7661
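The two-phase pattern described above (enumerate keys, then fetch every field of each hash) can be sketched in Python against a plain dict standing in for Redis. The function name `table_dump` only echoes `table_dump.lua`; this is an illustrative sketch, not the actual sairedis script:

```python
def table_dump(store, table):
    """Dump all hashes whose key starts with '<table>:'.

    Phase 1 mirrors `KEYS <table>:*`, phase 2 mirrors `HGETALL key`
    for each match, so run time grows with both the key count and
    the attribute count. `store` is a dict of key -> dict standing
    in for Redis."""
    prefix = table + ":"
    keys = [k for k in store if k.startswith(prefix)]   # KEYS phase
    return {k: dict(store[k]) for k in keys}            # HGETALL phase

# Fabricated sample data for illustration.
store = {
    "ASIC_STATE:SAI_OBJECT_TYPE_SWITCH:oid:0x21":
        {"SAI_SWITCH_ATTR_INIT_SWITCH": "true"},
    "TEMP_ASIC_STATE:SAI_OBJECT_TYPE_SWITCH:oid:0x21":
        {"SAI_SWITCH_ATTR_INIT_SWITCH": "true"},
}
dump = table_dump(store, "ASIC_STATE")
print(sorted(dump))  # only the ASIC_STATE key; TEMP_ does not match
```

Because both phases run inside a single EVAL call in the real script, redis is blocked for the entire dump, which is why other clients see BUSY replies in the meantime.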

kcudnik commented 3 years ago

So I also performed some tests on my side, and it seems like the issue is the LENGTH of the key. I changed the prefix from "TEMP_" to just "T" and there is almost no difference. I have 100k ASIC_STATE entries and 100k TASIC_STATE entries:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "ASIC_STATE:*"|wc -l
101475

real    0m0,137s
user    0m0,046s
sys     0m0,026s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "TASIC_STATE:*"|wc -l
100094

real    0m0,140s
user    0m0,051s
sys     0m0,022s
sonic@centaur ~/sonic-sairedis/tests $  time redis-cli -n 1 "keys" "TEMP_ASIC_STATE:*"|wc -l
100094

real    0m0,124s
user    0m0,030s
sys     0m0,029s

Suddenly, when we start using KEYS inside EVAL:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 eval 'redis.call("keys", KEYS[1].."*")' 1 ASIC_STATE
(nil)

real    0m5,554s
user    0m0,000s
sys     0m0,002s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 eval 'redis.call("keys", KEYS[1].."*")' 1 TASIC_STATE
(nil)

real    0m5,428s
user    0m0,001s
sys     0m0,001s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 eval 'redis.call("keys", KEYS[1].."*")' 1 TEMP_ASIC_STATE
(nil)

real    0m36,393s
user    0m0,002s
sys     0m0,001s

We jump from 0.13 sec to over 5 seconds!!! I have no idea why we get such a huge jump here, but when we also add HGETALL for all attributes, we gain an additional ~1.5 seconds:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 --eval table_dump.lua ASIC_STATE  |wc -l
1

real    0m6,460s
user    0m0,017s
sys     0m0,043s
sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 --eval table_dump.lua TASIC_STATE  |wc -l
1

real    0m6,295s
user    0m0,017s
sys     0m0,044s
sonic@centaur ~/sonic-sairedis/tests $  time redis-cli -n 1 --eval table_dump.lua TEMP_ASIC_STATE  |wc -l
1

real    0m47,449s
user    0m0,020s
sys     0m0,041s

My suspicion is that inside lua or redis there is a structure representation for strings which has a constant size, like 16 chars etc., and when a string is longer than that, new memory must be allocated. I'm not sure yet whether this is related only to the table name or to the entire key length, since key lengths in ASIC_STATE are pretty long:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "ASIC_STATE:*"|perl -ne 'print length($_),"\n"'|sort|uniq -c |sort -n
      1 53
      1 62
      1 66
      2 55
     28 56
     34 52
     64 60
    199 123
    256 71
    448 64
    640 54
   3600 124
  21230 125
  29701 127
  45270 126

Here, with sample data, we can see that most keys have a length of 126 chars, but with the TEMP_ prefix the majority of keys are over 130 characters:

sonic@centaur ~/sonic-sairedis/tests $ time redis-cli -n 1 "keys" "TEMP_ASIC_STATE:*"|perl -ne 'print length($_),"\n"'|sort|uniq -c |sort -n
      1 60
      1 67
     28 61
     32 57
     32 69
    199 128
   3600 129
  21230 130
  29701 132
  45270 131
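The perl pipeline above (`length | sort | uniq -c | sort -n`) can be mirrored in Python; the sample keys below are fabricated for illustration, and note that perl's `length($_)` also counts the trailing newline of each redis-cli output line, which this version does not:

```python
from collections import Counter

def length_histogram(keys):
    """Count how many keys have each length, ordered by count
    ascending, like `... | sort | uniq -c | sort -n`."""
    counts = Counter(len(k) for k in keys)
    # sort by (count, length) so ties come out in a stable order
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

keys = ["ASIC_STATE:oid:0x1", "ASIC_STATE:oid:0x2", "ASIC_STATE:oid:0x30"]
for length, count in length_histogram(keys):
    print(count, length)   # same "count length" columns as uniq -c
```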

I cloned the https://github.com/redis/redis repo and grepped for 128 values; there are a lot of places where this pops up. It's not obvious whether this is the actual cause, but that would be my bet:

git clone https://github.com/redis/redis.git
cd redis
sonic@centaur ~/redis $ grep -rP "\b128\b"|grep -vP "\|128"|grep -v 128-bit|less|wc -l
103

But at this point it's hard to tell which of those makes an impact, if any.

We can change "TEMP_" to just "T" as the prefix for the temporary asic view keys, and we can get away with this for ipv4, but when we use ipv6 we won't be able to use this trick.

I also opened an issue on the redis repo, https://github.com/redis/redis/issues/8077, so maybe it will be explained faster.
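The cause later confirmed in that redis issue is Lua 5.1's string-interning hash (Lua 5.1 is the version embedded in redis): `luaS_newlstr` in lstring.c samples at most about 32 characters, with `step = (len >> 5) + 1`, so long keys that agree at all sampled positions hash identically, and each colliding key makes the intern-table chain scan longer. A Python re-implementation of that hash makes the sampling visible; the collision example is constructed, not taken from ASIC_STATE data:

```python
def lua51_hash(s):
    """Lua 5.1 string hash (lstring.c, luaS_newlstr), 32-bit."""
    data = s.encode()
    h = len(data) & 0xFFFFFFFF        # seed with the length
    step = (len(data) >> 5) + 1       # long strings: skip characters
    i = len(data)
    while i >= step:
        h = (h ^ ((h << 5) + (h >> 2) + data[i - 1])) & 0xFFFFFFFF
        i -= step
    return h

# A 64-char string has step 3, sampling indices 63, 60, ..., 3, so
# index 0 is never looked at: two strings differing only there are
# a guaranteed collision.
a = "a" * 64
b = "b" + "a" * 63
print(lua51_hash(a) == lua51_hash(b))  # True
```

With ~126-char vs ~131-char keys the step and the sampled positions shift, which is consistent with one prefix length hitting far more collisions than another.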

kcudnik commented 3 years ago

Confirmed locally that this is indeed the lua hashing problem: https://github.com/redis/redis/issues/8077#issuecomment-733222358. To actually solve this properly we have 2 choices here:

kcudnik commented 3 years ago

After https://github.com/redis/redis/pull/8180 is merged, this problem will go away on its own.

kcudnik commented 3 years ago

Unfortunately, LUA in redis will not be updated, since newer versions have some backward compatibility problems, but I asked for a fix in the current repo and made a PR here: https://github.com/redis/redis/pull/9449

kcudnik commented 3 years ago

OK, that PR on redis was merged, so we just need to wait for the next redis package release on ubuntu.

liat-grozovik commented 3 years ago

@kcudnik which official version are you referring to? Do you know if it will be released as part of Debian 11, going to SONiC 202111?

dprital commented 2 years ago

> @kcudnik which official version are you referring to? Do you know if it will be released as part of Debian 11, going to SONiC 202111?

@kcudnik - Can you please refer to this question? (Thanks in advance)

kcudnik commented 2 years ago

I'm referring to the official redis server package; please refer here: https://github.com/redis/redis/pull/9449