tair-opensource / RedisShake

RedisShake is a Redis data processing and migration tool.
https://tair-opensource.github.io/RedisShake/
MIT License
3.81k stars 693 forks

redis-shake 4.0: data loss when syncing between clusters, and higher resource consumption than version 2 #787

Open tentosleep opened 6 months ago

tentosleep commented 6 months ago

Issue Description

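1. With RedisShake 4.0, data is lost when the data volume is large: total memory is 15 GB, each instance holds about 20 million keys, and after the sync completes the keys of one master-replica pair are missing.
2. RedisShake 4.0 consumes far more host memory during sync than version 2.0 does. Is this expected?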

Environment

Logs

Execution log, large data volume:

{"level":"info","time":"2024-03-29T00:05:57+08:00","message":"not set status port"}
{"level":"info","time":"2024-03-29T00:05:57+08:00","message":"start syncing..."}
{"level":"info","time":"2024-03-29T00:06:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:17+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:22+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:27+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:32+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:37+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:42+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:47+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:06:52+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:06:57+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:17+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:22+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:27+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:32+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:37+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:42+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:47+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:07:52+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:07:57+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:08:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-29T00:08:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, receiving rdb"}
{"level":"info","time":"2024-03-29T00:08:17+08:00","message":"read_count=[320289], read_ops=[64766.12], write_count=[320288], write_ops=[64767.12], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:22+08:00","message":"read_count=[651989], read_ops=[68279.07], write_count=[651988], write_ops=[68279.07], src-2, receiving rdb"}
{"level":"info","time":"2024-03-29T00:08:27+08:00","message":"read_count=[953445], read_ops=[53125.92], write_count=[953445], write_ops=[53125.92], src-0, syncing rdb, size=[226 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:32+08:00","message":"read_count=[1206712], read_ops=[50456.36], write_count=[1206711], write_ops=[50455.36], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:37+08:00","message":"read_count=[1459741], read_ops=[50093.92], write_count=[1459740], write_ops=[50092.92], src-2, syncing rdb, size=[71 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:42+08:00","message":"read_count=[1715609], read_ops=[50889.46], write_count=[1715608], write_ops=[50889.46], src-0, syncing rdb, size=[321 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:47+08:00","message":"read_count=[1978753], read_ops=[52896.65], write_count=[1978752], write_ops=[52895.65], src-1, hand shaking"}
{"level":"info","time":"2024-03-29T00:08:52+08:00","message":"read_count=[2236633], read_ops=[52020.92], write_count=[2236632], write_ops=[52020.92], src-2, syncing rdb, size=[168 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:08:57+08:00","message":"read_count=[2489089], read_ops=[49442.99], write_count=[2489088], write_ops=[49441.99], src-0, syncing rdb, size=[418 MiB/6.6 GiB]"}
{"level":"info","time":"2024-03-29T00:09:02+08:00","message":"read_count=[2743574], read_ops=[49838.39], write_count=[2743573], write_ops=[49838.39], src-1, hand shaking"}

Execution log, small data volume:

{"level":"info","time":"2024-03-31T15:03:52+08:00","message":"not set status port"}
{"level":"info","time":"2024-03-31T15:03:52+08:00","message":"start syncing..."}
{"level":"info","time":"2024-03-31T15:03:57+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:02+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-1, waiting bgsave"}
{"level":"info","time":"2024-03-31T15:04:17+08:00","message":"read_count=[33454], read_ops=[0.00], write_count=[33454], write_ops=[0.00], src-2, receiving rdb"}
{"level":"info","time":"2024-03-31T15:04:22+08:00","message":"read_count=[258016], read_ops=[45752.25], write_count=[258015], write_ops=[45753.25], src-0, syncing rdb, size=[123 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:27+08:00","message":"read_count=[486263], read_ops=[44478.29], write_count=[486262], write_ops=[44478.29], src-1, syncing rdb, size=[165 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:32+08:00","message":"read_count=[731859], read_ops=[49249.41], write_count=[731858], write_ops=[49248.41], src-2, syncing rdb, size=[309 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:37+08:00","message":"read_count=[978768], read_ops=[49185.96], write_count=[978768], write_ops=[49185.96], src-0, syncing rdb, size=[462 MiB/1.3 GiB]"}
{"level":"info","time":"2024-03-31T15:04:42+08:00","message":"read_count=[1225308], read_ops=[50100.23], write_count=[1225307], write_ops=[50099.23], src-1, syncing rdb, size=[165 MiB/1.3 GiB]"}
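To see at a glance which source shard never progresses, the progress lines can be grouped by their `src-N` tag. The following is a minimal sketch that assumes every progress message ends with `src-N, <stage>` as in the logs above; the function names `stages_by_source` and `never_left` are illustrative and not part of RedisShake.

```python
import json
import re
from collections import defaultdict

# Matches the tail of a progress message, e.g. "..., src-1, hand shaking"
# or "..., src-0, syncing rdb, size=[226 MiB/6.6 GiB]".
STATUS_RE = re.compile(r"(src-\d+), (.+)$")


def stages_by_source(lines):
    """Return {source: [stage, ...]} with consecutive repeats collapsed."""
    stages = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)["message"]
        match = STATUS_RE.search(message)
        if match is None:
            continue  # non-progress lines such as "start syncing..."
        source, status = match.groups()
        stage = status.split(",")[0]  # drop the trailing size=[...] part
        if not stages[source] or stages[source][-1] != stage:
            stages[source].append(stage)
    return dict(stages)


def never_left(stages, stage):
    """Sources whose only observed stage is `stage`."""
    return sorted(src for src, seen in stages.items() if seen == [stage])


# A few lines copied from the large-data-volume log.
sample = [
    '{"level":"info","time":"2024-03-29T00:08:07+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-2, waiting bgsave"}',
    '{"level":"info","time":"2024-03-29T00:08:12+08:00","message":"read_count=[0], read_ops=[0.00], write_count=[0], write_ops=[0.00], src-0, receiving rdb"}',
    '{"level":"info","time":"2024-03-29T00:08:17+08:00","message":"read_count=[320289], read_ops=[64766.12], write_count=[320288], write_ops=[64767.12], src-1, hand shaking"}',
    '{"level":"info","time":"2024-03-29T00:08:22+08:00","message":"read_count=[651989], read_ops=[68279.07], write_count=[651988], write_ops=[68279.07], src-2, receiving rdb"}',
    '{"level":"info","time":"2024-03-29T00:08:27+08:00","message":"read_count=[953445], read_ops=[53125.92], write_count=[953445], write_ops=[53125.92], src-0, syncing rdb, size=[226 MiB/6.6 GiB]"}',
    '{"level":"info","time":"2024-03-29T00:08:32+08:00","message":"read_count=[1206712], read_ops=[50456.36], write_count=[1206711], write_ops=[50455.36], src-1, hand shaking"}',
]

print(never_left(stages_by_source(sample), "hand shaking"))
```

Running the same functions over a complete log file (one JSON object per line) would show whether a shard ever gets past its first stage.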

Additional Information

Configuration file: Redisshake4配置文件.txt

Attachment: 微信图片_20240331141833.pdf

suxb201 commented 6 months ago
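  1. Use the latest version; its memory usage has been optimized.
  2. Keys should not go missing. Look through the logs to find out why.
  3. The slow speed is expected. If you want it faster, start several shake instances, one per db, and it will not be slow.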
tentosleep commented 6 months ago

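Attachment: 对比.pdf

1. We are already using the latest RedisShake release, 4.0.5 (redis-shake-linux-amd64.tar.gz), yet migrating the same data to a cluster of the same specification clearly consumes far more memory than version 2.
2. The logs are the ones listed above. With a small data volume, all three shards (src-0, src-1 and src-2) report sync progress such as size=[123 MiB/1.3 GiB], and the sync completes successfully. With a large data volume, src-1 stays stuck in the hand shaking stage, which may be what causes the data loss. Is there a fix?
3. Both the source and the target Redis use only db0.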

Keyspace

db0:keys=27020349,expires=0,avg_ttl=0
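One way to confirm which shard is missing keys is to compare the `Keyspace` line of each source master with its counterpart on the target. The sketch below only parses the line format shown above; `parse_keyspace_line` and `missing_keys` are illustrative helper names, and fetching the lines from the servers is left out.

```python
def parse_keyspace_line(line):
    """Parse 'db0:keys=1,expires=0,avg_ttl=0' into ('db0', {'keys': 1, ...})."""
    db, _, fields = line.strip().partition(":")
    stats = {}
    for field in fields.split(","):
        name, _, value = field.partition("=")
        stats[name] = int(value)
    return db, stats


def missing_keys(source_line, target_line):
    """Number of keys present on the source but not (yet) on the target."""
    _, source = parse_keyspace_line(source_line)
    _, target = parse_keyspace_line(target_line)
    return source["keys"] - target["keys"]


# The line reported in this comment.
print(parse_keyspace_line("db0:keys=27020349,expires=0,avg_ttl=0"))
```

Because the source may still be receiving writes, a small difference does not by itself prove data loss; a whole shard's worth of missing keys does.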

suxb201 commented 6 months ago

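@tentosleep One mitigation: if the source has 3 shards, start 3 redis-shake instances, with each reader pointed at one of the three source shards and every writer pointed at the target cluster. That solves the slow sync. The memory growth is hard to fix, although it should not be severe in current versions. Could you share some numbers, such as source memory usage, shake memory usage, and whether large hash, set, or list structures are in use?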
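The one-instance-per-shard setup could be scripted roughly as follows. This is a sketch under assumptions: the section and key names (`[sync_reader]`, `[redis_writer]`, `cluster`, `address`) follow the sample `shake.toml` shipped with RedisShake 4.x and should be checked against the version in use, and all addresses and file names are hypothetical placeholders.

```python
def render_config(source_address, target_address):
    """Build a minimal config: read one source shard, write to the target cluster."""
    return (
        "[sync_reader]\n"
        "cluster = false\n"  # a single shard, not the whole source cluster
        f'address = "{source_address}"\n'
        "\n"
        "[redis_writer]\n"
        "cluster = true\n"  # the target is addressed as a cluster
        f'address = "{target_address}"\n'
    )


def configs_for_shards(source_addresses, target_address):
    """Return {file name: config text}, one entry per source shard."""
    return {
        f"shake-src-{index}.toml": render_config(address, target_address)
        for index, address in enumerate(source_addresses)
    }


# Hypothetical addresses, for illustration only.
sources = ["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"]
for name, text in configs_for_shards(sources, "10.0.1.1:6379").items():
    print(f"# {name}\n{text}")
```

Each generated file would then be handed to its own redis-shake process, so a stalled shard shows up in a separate log instead of being interleaved with the others.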

tentosleep commented 6 months ago

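@suxb201 OK, thanks. For the slow sync I will try running three processes, although that is more cumbersome than with the old version.

But is the missing data caused by the data volume being too large? The source cluster currently uses about 48 GB of memory in total, 16 GB per master-replica pair. The cluster holds only scattered string keys, and the largest key is only a few KB. The shake memory usage is in the screenshot above: a sync with RedisShake 4.0.5 consumes roughly 10 GB of memory.

What concerns me most is that syncing with version 2 does not lose data and also uses far less memory than version 4. Does version 2 have the edge in performance?

Below is the overall key scan:

-------- summary -------
Sampled 27010993 keys in the keyspace!
Total key length in bytes is 297120990 (avg len 11.00)

Biggest string found '"188683"' has 1219 bytes
27010993 strings with 12782063401 bytes (100.00% of keys, avg size 473.22)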
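The averages in the scan summary can be rechecked from its totals, which also gives the raw string payload in GiB. This is plain arithmetic on the numbers reported above; the variable names are illustrative.

```python
# Totals copied from the scan summary above.
sampled_keys = 27010993
total_key_bytes = 297120990
total_value_bytes = 12782063401

avg_key_len = total_key_bytes / sampled_keys       # bytes per key name
avg_value_size = total_value_bytes / sampled_keys  # bytes per string value
value_gib = total_value_bytes / 2**30              # raw payload, excluding Redis overhead

print(f"avg key length: {avg_key_len:.2f} bytes")
print(f"avg value size: {avg_value_size:.2f} bytes")
print(f"string payload: {value_gib:.2f} GiB")
```

The payload figure excludes per-key overhead inside Redis, so it is lower than the memory the source instances actually report.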