tair-opensource / RedisShake

RedisShake is a Redis data processing and migration tool.
https://tair-opensource.github.io/RedisShake/
MIT License
3.86k stars 699 forks

Why is the migration process so slow? #852

Closed msvitok77 closed 3 months ago

msvitok77 commented 3 months ago

Issue Description

I'm trying to measure the migration process, but whatever settings I use, it still writes only ~20 keys/sec.

Environment

Logs

2024-08-14 09:32:52 INF read_count=[20762], read_ops=[19.98], write_count=[20762], write_ops=[19.98], scan_dbid=[0], scan_percent=[0.02%], need_update_count=[0]
2024-08-14 09:32:57 INF read_count=[20863], read_ops=[27.00], write_count=[20863], write_ops=[27.00], scan_dbid=[0], scan_percent=[0.02%], need_update_count=[0]

Additional Information

The destination ElastiCache has encryption at rest enabled along with encryption in transit.

[scan_reader]
cluster = false
address = "0.0.0.0:6379"
tls = false
ksn = true

[advanced]
rdb_restore_command_behavior = "rewrite"
ncpu = 8
pipeline_count_limit = 4096
target_redis_client_max_querybuf_len = 1024_000_000

[redis_writer]
cluster = false
address = "0.0.0.0:6380"
tls = true
password = "**"

I haven't found any other setting that might increase the number of keys written at once.

suxb201 commented 3 months ago

Are all the keys very large? For example, a list-type key with more than 1000 elements.

msvitok77 commented 3 months ago

Well, I have ~90 million hashes that look like this:

hash10006169 key10006169 1

The suffix is just an increasing integer value.

suxb201 commented 3 months ago

When a hash has many elements, RedisShake will spend a long time on that key.

msvitok77 commented 3 months ago

Maybe I explained it incorrectly: there are many hashes, each with just one field, not hashes with many fields.


msvitok77 commented 3 months ago

Example:

127.0.0.1:6379> hgetall hash63491416
1) "key63491416"
2) "1"
127.0.0.1:6379>

msvitok77 commented 3 months ago

Any ideas, please? Is it normal to transfer just 20 hashes/sec when each hash has just one field and a short value?

suxb201 commented 3 months ago

@msvitok77 That's not normal, but I can't tell why either. Please confirm the following:

  1. Are all the transferred keys hashes with only a single element? Perhaps some hash, or some list, is very large.
  2. Check how long the DUMP command takes; you can run dump hash63491416 on the source side to verify.
msvitok77 commented 3 months ago

Re 1): the hashes are all the same shape because they were generated by a script.
Re 2): will try.

msvitok77 commented 3 months ago

Results:

time redis-cli -r 1 dump hash10430850
"\r\x1a\x1a\x00\x00\x00\x17\x00\x00\x00\x02\x00\x00\x0bkey10430850\r\xf2\xff\t\x00\xed\xb0\x01\x17Y\xf9\x8fN"

real    0m0.154s
user    0m0.006s
sys     0m0.004s
suxb201 commented 3 months ago

The latency from you to the source DB is quite high. Consider using a higher count to fetch multiple keys per round trip:

[scan_reader]
# ...
count = 8
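
For intuition, here is a back-of-envelope model (not RedisShake internals): if each scan round trip fetches `count` keys, throughput is roughly bounded by `count` divided by the round-trip time. The ~50 ms RTT below is an assumption, e.g. what an SSH tunnel to a remote DB might add:

```python
def estimated_keys_per_sec(count: int, rtt_seconds: float) -> float:
    # Rough model: each round trip to the source fetches `count` keys,
    # so throughput is capped at count / RTT.
    return count / rtt_seconds

# Assumed ~50 ms round trips (e.g. over an SSH tunnel):
rtt = 0.05
print(estimated_keys_per_sec(1, rtt))   # 20.0 keys/sec, matching the logs above
print(estimated_keys_per_sec(30, rtt))  # 600.0 keys/sec
```

This is why a small `count` over a high-latency link caps throughput at a few dozen keys per second regardless of key size.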
msvitok77 commented 3 months ago

will try, thank you

msvitok77 commented 3 months ago

With count=30 I was able to get this performance:

2024-08-19 10:28:46 INF read_count=[194443], read_ops=[420.80], write_count=[194443], write_ops=[420.80], scan_dbid=[0], scan_percent=[0.20%], need_update_count=[0]
2024-08-19 10:28:51 INF read_count=[196418], read_ops=[423.01], write_count=[196418], write_ops=[423.01], scan_dbid=[0], scan_percent=[0.20%], need_update_count=[0]

So roughly 400 keys/sec (about 2000 keys per 5-second log interval).
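
As a sanity check, the rate can be recomputed from the two log lines above (plain arithmetic, nothing RedisShake-specific):

```python
# Two consecutive read_count samples from the logs, 5 seconds apart.
read_t0, read_t1 = 194443, 196418
interval_s = 5
keys_per_sec = (read_t1 - read_t0) / interval_s
print(keys_per_sec)  # 395.0, in line with the reported read_ops of ~420
```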

suxb201 commented 3 months ago

In my tests, count=1 gives 9k QPS and count=10 gives 50k QPS.

msvitok77 commented 3 months ago

The main problem might be that I'm using SSH tunnels to connect to both Redis clusters. Another factor might be that the target Redis uses encryption at rest plus encryption in transit.

msvitok77 commented 3 months ago

Never mind, I was able to enable PSYNC on the cluster I need to migrate. Can I somehow combine sync_reader with ksn? I don't have AOF enabled, and I want to track the changes being made while the data from the RDB file is being migrated. Thank you.