sewenew / redis-plus-plus

Redis client written in C++
Apache License 2.0

What to do with "Failed to update shards info" #469

Closed: mlanzuisi closed this issue 1 year ago

mlanzuisi commented 1 year ago

Hello,

With many master nodes in a cluster, when one of these nodes fails and its slots are not replicated anywhere else, the client reports the error "Failed to update shards info" and the whole application seems to get stuck... Is there anything I can do while the node comes back to life and the cluster tries to fix itself?

Environment:

sewenew commented 1 year ago

What do you mean by "the whole application seems to get stuck"?

"Failed to update shards info" means that redis-plus-plus cannot get the shards info, i.e. the slot-node mapping, from Redis Cluster. If only one of your master nodes is down, redis-plus-plus should still get the shards info from the other nodes (it randomly picks a node to fetch the info from, trying up to 3 times).

However, since one master is down, some slots are not covered. In this case, by default, Redis Cluster refuses to serve requests, and the client library can do nothing but wait until the whole cluster is fixed. To keep the cluster serving requests for the slots that are still covered, you need to configure cluster-require-full-coverage no.
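To see which slots a failed master leaves uncovered, a small standalone sketch can compute the gaps in the slot ranges that still have a serving master. The function `uncovered_slots` is a hypothetical illustration, not part of redis-plus-plus or Redis; only the fixed space of 16384 hash slots comes from Redis Cluster itself.

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Redis Cluster always has exactly 16384 hash slots (0..16383).
constexpr int kClusterSlots = 16384;

// Hypothetical helper. `served` holds inclusive [first, last] slot ranges
// that currently have a live master. Returns every slot nobody serves.
// With the default cluster-require-full-coverage yes, a non-empty result
// means the cluster stops accepting queries once the failure is detected;
// with "no", only keys hashing into these slots fail.
std::vector<int> uncovered_slots(const std::vector<std::pair<int, int>> &served) {
    std::vector<bool> covered(kClusterSlots, false);
    for (const auto &[first, last] : served) {
        // Clamp each range to the valid slot space before marking it.
        for (int s = std::max(first, 0); s <= std::min(last, kClusterSlots - 1); ++s) {
            covered[s] = true;
        }
    }
    std::vector<int> missing;
    for (int s = 0; s < kClusterSlots; ++s) {
        if (!covered[s]) {
            missing.push_back(s);
        }
    }
    return missing;
}
```

For instance, using the ranges from the cluster check output later in this thread, losing 10.244.67.20 (slots 6513-6826 and 10923-12287) leaves exactly its 1679 slots uncovered. That is enough to take the whole cluster down under the default setting.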

Regards
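The shards-info fallback mentioned in this reply, where a random node is picked and the fetch is tried up to 3 times, can be pictured with a self-contained sketch. This is a hypothetical illustration, not redis-plus-plus's actual implementation. `fetch_shards_with_retry`, `ShardsInfo`, and the `fetch` callback are invented names. The callback stands in for asking one node for its slot mapping.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <optional>
#include <random>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical type: inclusive slot range [first, last] -> "host:port" of its master.
using ShardsInfo = std::map<std::pair<int, int>, std::string>;

// Hypothetical sketch, NOT the library's code: pick a random known node,
// ask it for the slot-node mapping, and give up after max_attempts tries.
ShardsInfo fetch_shards_with_retry(
        const std::vector<std::string> &nodes,
        const std::function<std::optional<ShardsInfo>(const std::string &)> &fetch,
        int max_attempts = 3) {
    if (nodes.empty()) {
        throw std::invalid_argument("no known cluster nodes");
    }
    std::mt19937 rng(std::random_device{}());
    std::uniform_int_distribution<std::size_t> pick(0, nodes.size() - 1);
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        const std::string &node = nodes[pick(rng)];
        // fetch() returns std::nullopt when the node is down or unreachable.
        if (auto shards = fetch(node)) {
            return *shards;
        }
    }
    throw std::runtime_error("Failed to update shards info");
}
```

Each pick is random and may repeat, so a single dead node among several healthy ones is usually, but not always, avoided within three attempts.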

mlanzuisi commented 1 year ago

I have a single-threaded client using the redis-plus-plus library, and X master nodes running in Kubernetes. Because we are in Kubernetes, the client uses the pod name instead of the IP address. The client issues sequential SET commands, using a sequential number (from 1 to X) as the key and the same number as the value. It is a debugging and testing application: what I'm trying to do is understand the behaviour when a slot is not managed by any master, and when the cluster is not working properly. I did have the default "cluster-require-full-coverage yes" setting, and I have now changed it to "cluster-require-full-coverage no", but this is what happens:

1 - X masters are running and the slots are shared among them;
2 - 1 client is running, printing the key it is setting and any exception thrown by the command;
3 - one master (pod) is deleted, and the cluster goes into a "wrong" state (not all slots are covered);
4 - the dead Redis master starts again, fixes the slot status, adds itself to the cluster and asks for a reshard. All these operations are successful;
5 - the client application "seems to get stuck": it is blocked in a "set" call (RedisCluster class) for many seconds before it starts issuing SET commands again...
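The keys in this test are the decimal strings "1", "2", and so on. Redis Cluster assigns each key to one of 16384 hash slots as CRC16(key) mod 16384. Consecutive keys therefore land on effectively unrelated slots, and so on every master in turn, which means the loop soon reaches a key owned by the deleted master. Below is a standalone sketch of that mapping, following the algorithm in the Redis Cluster specification. The function names are our own, not a redis-plus-plus API.

```cpp
#include <cstdint>
#include <string_view>

// CRC16 variant used by Redis Cluster (CRC-16/XMODEM: polynomial 0x1021,
// initial value 0, no reflection), computed bit by bit for clarity.
std::uint16_t crc16_xmodem(std::string_view data) {
    std::uint16_t crc = 0;
    for (unsigned char byte : data) {
        crc = static_cast<std::uint16_t>(crc ^ (byte << 8));
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x8000)
                      ? static_cast<std::uint16_t>((crc << 1) ^ 0x1021)
                      : static_cast<std::uint16_t>(crc << 1);
        }
    }
    return crc;
}

// Hash slot of a key: CRC16(key) mod 16384. If the key has a non-empty
// hash tag "{...}", only the text between the first '{' and the next '}'
// is hashed, so related keys can be forced into the same slot.
int key_hash_slot(std::string_view key) {
    const auto open = key.find('{');
    if (open != std::string_view::npos) {
        const auto close = key.find('}', open + 1);
        if (close != std::string_view::npos && close > open + 1) {
            key = key.substr(open + 1, close - open - 1);
        }
    }
    return crc16_xmodem(key) & 0x3FFF;  // same as % 16384
}
```

Comparing `key_hash_slot` for each test key against the slot ranges in the cluster check output shows which master a given SET goes to.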

Is there any waiting time? Any parameter to set to reduce this latency?

mlanzuisi commented 1 year ago

Adding 2 things:

a - the error before it gets stuck is "Failed to send command with key";
b - the state of the slots after point 4 in my previous comment is:

127.0.0.1:6379 (cb81fec4...) -> 629712 keys | 5036 slots | 0 slaves.
10.244.67.62:6379 (774e7995...) -> 683980 keys | 3269 slots | 0 slaves.
10.244.67.49:6379 (dbef334a...) -> 740464 keys | 3125 slots | 0 slaves.
10.244.67.4:6379 (04569144...) -> 330100 keys | 3275 slots | 0 slaves.
10.244.67.20:6379 (f7f0baec...) -> 397827 keys | 1679 slots | 0 slaves.
[OK] 2782083 keys in 5 masters.
169.80 keys per slot on average.

Performing Cluster Check (using node 127.0.0.1:6379)
M: cb81fec468d40ead5f2107600c28493d9da0faf0 127.0.0.1:6379
   slots:[1259-1651],[1654-1655],[1660],[1675-1676],[1678],[1685],[1692],[1699-1700],[1702-1703],[1709],[1711],[1716-1717],[1720],[1726-1729],[1732],[1735],[1738],[1741],[1744],[1746],[1749],[1752],[1755],[1758],[1762],[1766-1769],[1772-1773],[1776],[1778-1779],[1781],[1783],[1786],[1788],[1790],[1793],[1798-1799],[1806-1807],[1809],[1812-1816],[1821-1825],[1830],[1832-1833],[1836],[1839],[1841],[1846-1847],[1851],[1853],[1860],[1863],[1866],[1868],[1871-1872],[1874],[1879-1880],[1883],[1885],[1890],[1893],[1896],[1900],[1903-1904],[1908],[1910],[1912],[1927],[1931],[1933],[1936],[1939],[1941],[1949],[1951-1952],[1955],[1957],[1959-1961],[1963],[1966-1971],[1973-1988],[1991-1998],[2000-2025],[2027-2043],[2045-2051],[2053],[2055-2062],[2064-2074],[2076-2084],[2086-2101],[2103-2108],[2110-2132],[2135-2147],[2150],[2152-2158],[2160-2161],[2163-2170],[2172],[2174],[2176-2178],[2180-2182],[2184-2965],[5462],[5465],[5467-5468],[5470],[5473],[5475-5477],[5479],[5493-5495],[5497],[5504-5505],[5514],[5518-5519],[5522-5523],[5525],[5527-5528],[5534-5535],[5541-5543],[5546],[5552-5553],[5555],[5558-5559],[5561],[5564-5565],[5568],[5570-6093],[7649],[7656-7657],[7659-7660],[7662],[7664],[7669-7670],[7675],[7679],[7682-7683],[7685],[7690],[7693],[7702],[7706],[7710],[7719],[7721],[7723-7724],[7729],[7731],[7733-7734],[7743-7744],[7747-7748],[7755-7756],[7765],[7767],[7769],[7776],[7778],[7782-7785],[7787],[7790],[7794],[7796],[7803],[7806],[7808],[7812],[7814-7816],[7821-7822],[7825],[7829],[7831-7832],[7835],[7843],[7845],[7849],[7851],[7854-7855],[7860],[7864],[7866-7868],[7871],[7886],[7889-7890],[7895-7896],[7899],[7901],[7903-7904],[7906-7907],[7910],[7914],[7919-7923],[7926],[7928-7929],[7931-7933],[7935],[7938],[7946],[7948],[7950],[7954],[7956-7957],[7959],[7967],[7970],[7972-7975],[7977-7978],[7980-7982],[7987],[7989],[7993],[7997-7998],[8003],[8005-8006],[8009],[8015],[8017],[8019],[8027],[8029],[8031],[8035],[8037],[8040-8041],[8043],[8046-8048],[8053],[13107],[13111],[13123],[13127-13128],[13130-13131],[13139],[13141],[13143],[13149],[13154],[13156],[13158],[13160],[13165],[13167],[13175],[13181],[13183-13185],[13187],[13192-13193],[13197-13198],[13202-13203],[13205-13206],[13208-13210],[13213],[13219],[13224-13226],[13228],[13240],[13243],[13247-13248],[13254],[13257-13258],[13261],[13264],[13266],[13268],[13274],[13276],[13281],[13285-13286],[13290],[13295],[13298],[13307],[13309],[13312],[13314],[13323-13325],[13328],[13331],[13333],[13338],[13345],[13351],[13358-13360],[13362-13363],[13366-13367],[13372-13373],[13378],[13381],[13383-13384],[13388-13389],[13394],[13396-13399],[13404],[13406],[13409],[13412],[13415-13417],[13420-13421],[13423],[13425],[13431],[13436-13437],[13439],[13441],[13444-13445],[13448-13449],[13452-13453],[13460],[13462-13464],[13470-13471],[13473-13474],[13477],[13479],[13482],[13484],[13487],[13491],[13495],[13498],[13501],[13509-13510],[13512],[13515],[13517],[13521-13523],[13525],[13531-13532],[13536-13537],[13539-13541],[13546],[13548-13549],[13553-13554],[13562-13563],[13577],[13580],[13583],[13588-13591],[13597-13598],[13604],[13607],[13611],[13613-13615],[13618],[13622],[13625],[13630-13631],[13633-13641],[13647],[13649-13650],[13653-13654],[13656],[13660-13661],[13665],[13667],[13672-13673],[13675],[13677],[13679],[13681-13685],[13688],[13690-13691],[13693],[13695],[13701],[13703],[13707-13709],[13715],[13718-13719],[13721],[13723],[13728-13730],[13732],[13735-13736],[13738-13739],[13747-13748],[13751-13753],[13755],[13757],[13759-16383] (5036 slots) master
M: 774e79953698af561b1e1fe9171d67f5499e496d 10.244.67.62:6379
   slots:[8189-10922],[12288-12289],[12294],[12296],[12300],[12304],[12306-12307],[12310],[12314],[12318],[12322],[12331],[12366],[12368],[12373],[12377],[12390],[12393],[12396],[12401],[12423],[12436],[12441],[12447],[12461-12462],[12472],[12474],[12480],[12482],[12490-12491],[12501],[12510],[12527-12528],[12531],[12535],[12545],[12549],[12552],[12572],[12584],[12587],[12592],[12606-12607],[12619],[12626],[12631],[12659-12660],[12669],[12675-12676],[12680],[12691-12692],[12695],[12702],[12705],[12708],[12712-12714],[12717],[12732],[12754],[12756],[12764],[12771],[12775],[12785],[12792-12793],[12795],[12817],[12823],[12831],[12835],[12840],[12844],[12848],[12851],[12855],[12857],[12860],[12879],[12885-12886],[12889],[12898],[12901],[12905],[12935],[12938],[12944],[12956],[12958],[12966],[12997],[13002],[13009],[13015],[13043],[13057-13058],[13064],[13067],[13077],[13083],[13091],[13095],[13101],[13106],[13108-13110],[13112-13122],[13124-13126],[13129],[13132-13138],[13140],[13142],[13144-13148],[13150-13153],[13155],[13157],[13159],[13161-13164],[13166],[13168-13174],[13176-13180],[13182],[13186],[13188-13191],[13194-13196],[13199-13201],[13204],[13207],[13211-13212],[13214-13218],[13220-13223],[13227],[13229-13239],[13241-13242],[13244-13246],[13249-13253],[13255-13256],[13259-13260],[13262-13263],[13265],[13267],[13269-13273],[13275],[13277-13280],[13282-13284],[13287-13289],[13291-13294],[13296-13297],[13299-13306],[13308],[13310-13311],[13313],[13315-13322],[13326-13327],[13329-13330],[13332],[13334-13337],[13339-13344],[13346-13350],[13352-13357],[13361],[13364-13365],[13368-13371],[13374-13377],[13379-13380],[13382],[13385-13387],[13390-13393],[13395],[13400-13403],[13405],[13407-13408],[13410-13411],[13413-13414],[13418-13419],[13422],[13424],[13426-13430],[13432-13435],[13438],[13440],[13442-13443],[13446-13447],[13450-13451],[13454-13459],[13461],[13465-13469],[13472],[13475-13476],[13478],[13480-13481],[13483],[13485-13486],[13488-13490],[13492-13494],[13496-13497],[13499-13500],[13502-13508],[13511],[13513-13514],[13516],[13518-13520],[13524],[13526-13530],[13533-13535],[13538],[13542-13545],[13547],[13550-13552],[13555-13561],[13564-13576],[13578-13579],[13581-13582],[13584-13587],[13592-13596],[13599-13603],[13605-13606],[13608-13610],[13612],[13616-13617],[13619-13621],[13623-13624],[13626-13629],[13632],[13642-13646],[13648],[13651-13652],[13655],[13657-13659],[13662-13664],[13666],[13668-13671],[13674],[13676],[13678],[13680],[13686-13687],[13689],[13692],[13694],[13696-13700],[13702],[13704-13706],[13710-13714],[13716-13717],[13720],[13722],[13724-13727],[13731],[13733-13734],[13737],[13740-13746],[13749-13750],[13754],[13756],[13758] (3269 slots) master
M: dbef334a12757e0bbb15b513617a97638226c940 10.244.67.49:6379
   slots:[3747-5460],[6827],[6829-6835],[6837-6838],[6840-6842],[6844],[6846-6849],[6851-6860],[6862-6866],[6868-6873],[6875-6878],[6880-6884],[6886-6896],[6898-6913],[6915-6927],[6929-6936],[6938-6944],[6946-6948],[6950-6951],[6953-6954],[6956-6965],[6967-6971],[6973-6977],[6980-6982],[6984-6986],[6988],[6990-6992],[6994-6998],[7000-7002],[7004-7008],[7010-7015],[7019-7020],[7024-7035],[7037-7041],[7043-7045],[7047-7048],[7050-7056],[7058-7064],[7066-7075],[7077-7090],[7092-7096],[7098-7102],[7104],[7107-7126],[7128-7133],[7135-7153],[7156-7157],[7159-7173],[7175-7177],[7179-7189],[7191-7202],[7204-7206],[7208-7218],[7220-7225],[7227-7234],[7236-7238],[7240-7241],[7243-7244],[7246-7247],[7249-7250],[7252-7266],[7268-7270],[7272-7278],[7280-7305],[7307-7309],[7311-7316],[7318-7321],[7323],[7325-7336],[7338-7339],[7341],[7343-7345],[7348-7351],[7353-7354],[7356],[7359-7379],[7382-7383],[7385-7388],[7390-7396],[7398-7404],[7406-7430],[7432-7439],[7441-7443],[7445],[7447-7458],[7460-7471],[7473-7480],[7482-7489],[7491-7523],[7525-7543],[7545-7566],[7568-7576],[7578-7580],[7582-7588],[7590-7600],[7602-7607],[7609-7618],[7620-7624],[7626-7628],[7630-7633],[7635-7639],[7641-7643],[7645],[12290-12293],[12295],[12297-12299],[12301-12303],[12305],[12308-12309],[12311-12313],[12315-12317],[12319-12321],[12323-12330],[12332-12365],[12367],[12369-12372],[12374-12376],[12378-12389],[12391-12392],[12394-12395],[12397-12400],[12402-12422],[12424-12435],[12437-12440],[12442-12446],[12448-12460],[12463-12471],[12473],[12475-12479],[12481],[12483-12489],[12492-12500],[12502-12509],[12511-12526],[12529-12530],[12532-12534],[12536-12544],[12546-12548],[12550-12551],[12553-12571],[12573-12583],[12585-12586],[12588-12591],[12593-12605],[12608-12618],[12620-12625],[12627-12630],[12632-12658],[12661-12668],[12670-12674],[12677-12679],[12681-12690],[12693-12694],[12696-12701],[12703-12704],[12706-12707],[12709-12711],[12715-12716],[12718-12731],[12733-12753],[12755],[12757-12763],[12765-12770],[12772-12774],[12776-12784],[12786-12791],[12794],[12796-12816],[12818-12822],[12824-12830],[12832-12834],[12836-12839],[12841-12843],[12845-12847],[12849-12850],[12852-12854],[12856],[12858-12859],[12861-12878],[12880-12884],[12887-12888],[12890-12897],[12899-12900],[12902-12904],[12906-12934],[12936-12937],[12939-12943],[12945-12955],[12957],[12959-12965],[12967-12996],[12998-13001],[13003-13008],[13010-13014],[13016-13042],[13044-13056],[13059-13063],[13065-13066],[13068-13076],[13078-13082],[13084-13090],[13092-13094],[13096-13100],[13102-13105] (3125 slots) master
M: 04569144ddb15bb1851d711d87e8897303d5db00 10.244.67.4:6379
   slots:[0-1258],[1652-1653],[1656-1659],[1661-1674],[1677],[1679-1684],[1686-1691],[1693-1698],[1701],[1704-1708],[1710],[1712-1715],[1718-1719],[1721-1725],[1730-1731],[1733-1734],[1736-1737],[1739-1740],[1742-1743],[1745],[1747-1748],[1750-1751],[1753-1754],[1756-1757],[1759-1761],[1763-1765],[1770-1771],[1774-1775],[1777],[1780],[1782],[1784-1785],[1787],[1789],[1791-1792],[1794-1797],[1800-1805],[1808],[1810-1811],[1817-1820],[1826-1829],[1831],[1834-1835],[1837-1838],[1840],[1842-1845],[1848-1850],[1852],[1854-1859],[1861-1862],[1864-1865],[1867],[1869-1870],[1873],[1875-1878],[1881-1882],[1884],[1886-1889],[1891-1892],[1894-1895],[1897-1899],[1901-1902],[1905-1907],[1909],[1911],[1913-1926],[1928-1930],[1932],[1934-1935],[1937-1938],[1940],[1942-1948],[1950],[1953-1954],[1956],[1958],[1962],[1964-1965],[1972],[1989-1990],[1999],[2026],[2044],[2052],[2054],[2063],[2075],[2085],[2102],[2109],[2133-2134],[2148-2149],[2151],[2159],[2162],[2171],[2173],[2175],[2179],[2183],[2966-3746],[5461],[5463-5464],[5466],[5469],[5471-5472],[5474],[5478],[5480-5492],[5496],[5498-5503],[5506-5513],[5515-5517],[5520-5521],[5524],[5526],[5529-5533],[5536-5540],[5544-5545],[5547-5551],[5554],[5556-5557],[5560],[5562-5563],[5566-5567],[5569],[6094-6512],[6828],[6836],[6839],[6843],[6845],[6850],[6861],[6867],[6874],[6879],[6885],[6897],[6914],[6928],[6937],[6945],[6949],[6952],[6955],[6966],[6972],[6978-6979],[6983],[6987],[6989],[6993],[6999],[7003],[7009],[7016-7018],[7021-7023],[7036],[7042],[7046],[7049],[7057],[7065],[7076],[7091],[7097],[7103],[7105-7106],[7127],[7134],[7154-7155],[7158],[7174],[7178],[7190],[7203],[7207],[7219],[7226],[7235],[7239],[7242],[7245],[7248],[7251],[7267],[7271],[7279],[7306],[7310],[7317],[7322],[7324],[7337],[7340],[7342],[7346-7347],[7352],[7355],[7357-7358],[7380-7381],[7384],[7389],[7397],[7405],[7431],[7440],[7444],[7446],[7459],[7472],[7481],[7490],[7524],[7544],[7567],[7577],[7581],[7589],[7601],[7608],[7619],[7625],[7629],[7634],[7640],[7644],[7646-7648],[7650-7655],[7658],[7661],[7663],[7665-7668],[7671-7674],[7676-7678],[7680-7681],[7684],[7686-7689],[7691-7692],[7694-7701],[7703-7705],[7707-7709],[7711-7718],[7720],[7722],[7725-7728],[7730],[7732],[7735-7742],[7745-7746],[7749-7754],[7757-7764],[7766],[7768],[7770-7775],[7777],[7779-7781],[7786],[7788-7789],[7791-7793],[7795],[7797-7802],[7804-7805],[7807],[7809-7811],[7813],[7817-7820],[7823-7824],[7826-7828],[7830],[7833-7834],[7836-7842],[7844],[7846-7848],[7850],[7852-7853],[7856-7859],[7861-7863],[7865],[7869-7870],[7872-7885],[7887-7888],[7891-7894],[7897-7898],[7900],[7902],[7905],[7908-7909],[7911-7913],[7915-7918],[7924-7925],[7927],[7930],[7934],[7936-7937],[7939-7945],[7947],[7949],[7951-7953],[7955],[7958],[7960-7966],[7968-7969],[7971],[7976],[7979],[7983-7986],[7988],[7990-7992],[7994-7996],[7999-8002],[8004],[8007-8008],[8010-8014],[8016],[8018],[8020-8026],[8028],[8030],[8032-8034],[8036],[8038-8039],[8042],[8044-8045],[8049-8052],[8054-8188] (3275 slots) master
M: f7f0baece1dfba834e681d0ce36025798785dd8e 10.244.67.20:6379
   slots:[6513-6826],[10923-12287] (1679 slots) master
[OK] All nodes agree about slots configuration.
Check for open slots...
Check slots coverage...
[OK] All 16384 slots covered.

sewenew commented 1 year ago

> Is there any waiting time? Any parameter to set to reduce this latency?

You can set ConnectionOptions::socket_timeout to avoid waiting too long.

Regards

mlanzuisi commented 1 year ago

The socket_timeout parameter didn't help much, but setting a small connect_timeout really helped.

Thank you so much for your help.

Regards

sewenew commented 1 year ago

OK, that means it was stuck while connecting to Redis.

Regards