sohutv / cachecloud

Sohu TV's private Redis cloud platform. CacheCloud supports efficient management of multiple Redis architectures (Standalone, Sentinel, Cluster), reducing the cost of operating Redis at scale and improving resource control and utilization. The platform provides rapid deployment and migration, operations management, elastic scaling, statistics and monitoring, and client integration.
http://cachecloud.github.io/
Apache License 2.0
8.76k stars 2.04k forks

Error during scale-out #291

Closed liuqian1990 closed 2 years ago

liuqian1990 commented 2 years ago

2022-07-15 08:48:10.288 WARN 45036 --- [nio-8080-exec-4] c.sohu.cache.redis.RedisClusterReshard : 192.168.10.133:7506 isSingleNode
2022-07-15 08:48:10.291 WARN 45036 --- [nio-8080-exec-4] c.sohu.cache.redis.RedisClusterReshard : 192.168.10.131:26400 isSingleNode
2022-07-15 08:48:12.294 ERROR 45036 --- [nio-8080-exec-4] c.s.cache.redis.RedisClusterReshard$1 : ERR Unknown node ff355d07a9678e1d7cdd798503a9363dedbacb34

redis.clients.jedis.exceptions.JedisDataException: ERR Unknown node ff355d07a9678e1d7cdd798503a9363dedbacb34
    at redis.clients.jedis.Protocol.processError(Protocol.java:139)
    at redis.clients.jedis.Protocol.process(Protocol.java:173)
    at redis.clients.jedis.Protocol.read(Protocol.java:227)
    at redis.clients.jedis.Connection.readProtocolWithCheckingBroken(Connection.java:320)
    at redis.clients.jedis.Connection.getStatusCodeReply(Connection.java:229)
    at redis.clients.jedis.Jedis.clusterReplicate(Jedis.java:3584)
    at com.sohu.cache.redis.RedisClusterReshard$1.execute(RedisClusterReshard.java:134)
    at com.sohu.cache.util.IdempotentConfirmer.run(IdempotentConfirmer.java:27)
    at com.sohu.cache.redis.RedisClusterReshard.joinCluster(RedisClusterReshard.java:138)
    at com.sohu.cache.stats.app.impl.AppDeployCenterImpl.addHorizontalNodes(AppDeployCenterImpl.java:607)
    at com.sohu.cache.web.controller.AppManageController.doAddHorizontalNodes(AppManageController.java:265)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)

    // 3. Replication: setting up the master-replica relationship fails
    //    because the node information has not been broadcast yet
    if (hasSlave) {
        final String masterNodeId = getNodeId(appId, masterJedis);
        if (masterNodeId == null) {
            logger.error(String.format("joinCluster:host=%s,port=%s nodeId is null", masterHost, masterPort));
            return false;
        }
        return new IdempotentConfirmer() {
            @Override
            public boolean execute() {
                try {
                    // Wait for the node to be broadcast; this interval is probably too short
                    TimeUnit.SECONDS.sleep(2);
                } catch (Exception e) {
                    logger.error(e.getMessage(), e);
                }
                String response = slaveJedis.clusterReplicate(masterNodeId);
                logger.info("clusterReplicate-{}:{}={}", slaveHost, slavePort, response);
                return response != null && response.equalsIgnoreCase("OK");
            }
        }.run();
    } else {
        return true;
    }

githubname1024 commented 2 years ago

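Hello. The clusterReplicate call is wrapped by the IdempotentConfirmer class, which retries by default, up to three times. Each attempt waits 2 s first, so by the third attempt more than 6 s have elapsed. Since cluster replicate is normally an event that completes within a few seconds, it should succeed. If the cluster really is large, or network latency is high, consider increasing the sleep time per attempt or the number of retries.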
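To make the retry behaviour described above concrete, here is a minimal sketch of a sleep-then-retry wrapper with an adjustable attempt count and wait time. It is not the actual IdempotentConfirmer source: the class name `RetrySketch`, the method `runWithRetry`, and the choice to treat a runtime exception as a failed attempt are all assumptions made for illustration.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only; not CacheCloud's IdempotentConfirmer.
final class RetrySketch {

    /**
     * Sleeps, then runs the action; repeats until the action returns true
     * or maxAttempts is reached. Returns whether any attempt succeeded.
     */
    static boolean runWithRetry(int maxAttempts, long sleepMillis, BooleanSupplier action) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                // Give the cluster time to gossip the new node before each attempt.
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop retrying.
                Thread.currentThread().interrupt();
                return false;
            }
            try {
                if (action.getAsBoolean()) {
                    return true;
                }
            } catch (RuntimeException e) {
                // Assumption: an error such as "ERR Unknown node" counts as a
                // failed attempt and is retried rather than propagated.
            }
        }
        return false;
    }
}
```

With three attempts and a 2000 ms sleep this reproduces the timing described above (roughly 6 s before the last attempt); a larger cluster could pass, for example, five attempts or a longer sleep.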

liuqian1990 commented 2 years ago

Would it work to run cluster nodes first, check whether the master's node ID is present, and only then execute the replicate?
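The check proposed here could look roughly like the sketch below. It relies only on the documented `CLUSTER NODES` output format, where each line starts with the node ID; the class name `ClusterNodesSketch` and the method `knowsNode` are hypothetical, and how the check would be wired into joinCluster is an assumption.

```java
// Hypothetical helper for illustration only; not part of CacheCloud.
final class ClusterNodesSketch {

    /**
     * Returns true if the given CLUSTER NODES output contains a line whose
     * first field (the node ID) equals nodeId.
     */
    static boolean knowsNode(String clusterNodesOutput, String nodeId) {
        if (clusterNodesOutput == null || nodeId == null) {
            return false;
        }
        for (String line : clusterNodesOutput.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Fields are space-separated; the first one is the node ID.
            String[] fields = trimmed.split("\\s+");
            if (fields[0].equals(nodeId)) {
                return true;
            }
        }
        return false;
    }
}
```

In the code from this issue, the input would presumably be `slaveJedis.clusterNodes()`, since it is the replica that must already know the master before `clusterReplicate(masterNodeId)` can succeed; the replicate call would then be attempted only once `knowsNode` returns true.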

githubname1024 commented 2 years ago

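Yes, that works. One additional note: horizontal scale-out is currently not recommended in cachecloud. With large data volumes the operation is slow and has some impact on the Redis service, and if it fails, rolling back is difficult. If the cluster capacity needs to change, consider creating a new cluster, synchronizing the data with redis-shake, and then replacing the original cluster with the new one. The data synchronization part of this approach is already implemented in cachecloud.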

liuqian1990 commented 2 years ago

OK, thanks.