stickermule / rump

Hot sync two Redis servers using dumps.
MIT License
491 stars, 94 forks

Improve performance #32

Open mewa opened 5 years ago

mewa commented 5 years ago

Report

First of all thanks for a nice and robust tool!

I think it could use some performance improvements. Currently, scanning and dumping data from larger data sets takes quite a bit of time, and there are a couple of reasons for that:

retpolanne commented 4 years ago

I +1 this issue. The dump-restore process can be really slow using rump, especially for a large Redis instance.

I was thinking about which parts can run concurrently and which parts should run together:

Maybe we could have:

Edit: Actually, looking at the code, both read and write are already called within goroutines, and they signal each other via the message bus. So it might help to simply run the dump and TTL calls in a goroutine that blocks before sending content to the message bus.
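A minimal sketch of that idea, not rump's actual code: a small pool of goroutines fetches the dump and TTL for each key, then blocks on an unbuffered channel until the writer takes the entry. The `source` interface, `entry` type, `fakeSource`, and `fetchAll` are all hypothetical names made up for illustration; a real version would issue `DUMP` and `PTTL` against Redis instead of reading from a map.

```go
package main

import (
	"fmt"
	"sync"
)

// entry is a hypothetical payload: one key with its serialized value and TTL.
type entry struct {
	Key   string
	Dump  string
	TTLms int64
}

// source is a hypothetical stand-in for the Redis DUMP and PTTL calls.
type source interface {
	Dump(key string) (string, error)
	PTTL(key string) (int64, error)
}

// fetchAll fetches dump and TTL for every key using a bounded number of
// goroutines. Each worker blocks on the bus until the writer is ready.
// Keys that fail are skipped; the first error seen is returned.
func fetchAll(src source, keys []string, workers int, bus chan<- entry) error {
	jobs := make(chan string)
	var (
		wg       sync.WaitGroup
		mu       sync.Mutex
		firstErr error
	)
	fail := func(err error) {
		mu.Lock()
		if firstErr == nil {
			firstErr = err
		}
		mu.Unlock()
	}
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range jobs {
				dump, err := src.Dump(key)
				if err != nil {
					fail(err)
					continue
				}
				ttl, err := src.PTTL(key)
				if err != nil {
					fail(err)
					continue
				}
				bus <- entry{Key: key, Dump: dump, TTLms: ttl} // blocks until read
			}
		}()
	}
	for _, k := range keys {
		jobs <- k
	}
	close(jobs)
	wg.Wait()
	return firstErr
}

// fakeSource is an in-memory stand-in so the sketch runs without Redis.
type fakeSource map[string]string

func (f fakeSource) Dump(key string) (string, error) {
	v, ok := f[key]
	if !ok {
		return "", fmt.Errorf("no such key %q", key)
	}
	return v, nil
}

// PTTL reports -1, the value Redis uses for a key without an expiry.
func (f fakeSource) PTTL(key string) (int64, error) { return -1, nil }

func main() {
	src := fakeSource{"a": "1", "b": "2", "c": "3"}
	bus := make(chan entry)
	done := make(chan int)
	go func() { // stand-in for the writer side
		n := 0
		for range bus {
			n++
		}
		done <- n
	}()
	err := fetchAll(src, []string{"a", "b", "c"}, 2, bus)
	close(bus)
	fmt.Println("entries written:", <-done, "error:", err)
}
```

The worker count bounds how many round trips are in flight at once, and the unbuffered bus keeps the readers from running far ahead of the writer.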

nixtrace commented 4 years ago

Thanks for the reports. We're looking into a performance regression introduced in the latest version, related to pipelining. We'll update the issue as soon as we have more details.

mewa commented 4 years ago

@vinicyusmacedo What I meant was that you're scanning the next bit of work while you are processing items from the previous scans, effectively interleaving them with the dumps. This became an apparent bottleneck when we were extracting keys by pattern, where scans would often return few or no results many times in a row. As for parallelization, it should be possible in theory (see this article), but I think doing so at this stage is a bit far-fetched.
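One way to read the scanning point above is to move the scan loop into its own goroutine, so a run of empty scan pages never stalls the dump step. The sketch below is an assumption about how that could look, not the code from the linked changes: `scanFunc`, `scanAhead`, and `fakeScan` are hypothetical names, and `fakeScan` replaces the real `SCAN` call with an in-memory list of pages.

```go
package main

import "fmt"

// scanFunc is a hypothetical stand-in for one SCAN call: it takes a cursor
// and returns a page of keys plus the next cursor (0 means iteration is done).
type scanFunc func(cursor uint64) (keys []string, next uint64, err error)

// scanAhead runs the scan loop in its own goroutine and queues non-empty
// pages on a buffered channel, so scanning continues while earlier pages
// are still being processed by the consumer.
func scanAhead(scan scanFunc, buffer int) (<-chan []string, <-chan error) {
	batches := make(chan []string, buffer)
	errc := make(chan error, 1)
	go func() {
		defer close(errc)
		defer close(batches)
		var cursor uint64
		for {
			keys, next, err := scan(cursor)
			if err != nil {
				errc <- err
				return
			}
			if len(keys) > 0 { // empty pages are dropped here, not downstream
				batches <- keys
			}
			if next == 0 {
				return
			}
			cursor = next
		}
	}()
	return batches, errc
}

// fakeScan serves pages from memory, using the page index as the cursor.
func fakeScan(pages [][]string) scanFunc {
	return func(cursor uint64) ([]string, uint64, error) {
		if cursor >= uint64(len(pages)) {
			return nil, 0, nil
		}
		next := cursor + 1
		if next >= uint64(len(pages)) {
			next = 0
		}
		return pages[cursor], next, nil
	}
}

func main() {
	pages := [][]string{{}, {"user:1"}, {}, {}, {"user:2", "user:3"}}
	batches, errc := scanAhead(fakeScan(pages), 4)
	for batch := range batches {
		fmt.Println("dumping", batch) // stand-in for the DUMP step
	}
	if err := <-errc; err != nil {
		fmt.Println("scan failed:", err)
	}
}
```

The buffer size caps how far the scanner may run ahead, which keeps memory use bounded when the dump side is the slower of the two.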

@nixtrace I don't know about the regression, since we only started using rump in October, but we've definitely seen an improvement with some of our changes (especially for our use case of filtering by pattern), i.e.:

We have also introduced a QoL change that allows passing a PATTERN to scans, per our own requirements.

I tried to keep these features separate for easy integration (and actually opened this issue with the intent of donating code). Here you can preview the above-mentioned changes, respectively: concurrency, adjusting the count parameter, and adjusting the scan pattern.
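For reference, the count and pattern adjustments map onto the optional `COUNT` and `MATCH` arguments of the Redis `SCAN` command. The helper below is a hypothetical illustration of assembling those arguments, not the code from the linked changes; `scanArgs` and the example values are made up.

```go
package main

import (
	"fmt"
	"strconv"
)

// scanArgs builds the argument list for a Redis SCAN command.
// An empty pattern or a non-positive count leaves that option out,
// so the server falls back to its defaults.
func scanArgs(cursor uint64, pattern string, count int) []string {
	args := []string{"SCAN", strconv.FormatUint(cursor, 10)}
	if pattern != "" {
		args = append(args, "MATCH", pattern)
	}
	if count > 0 {
		args = append(args, "COUNT", strconv.Itoa(count))
	}
	return args
}

func main() {
	fmt.Println(scanArgs(0, "session:*", 5000))
	// prints: [SCAN 0 MATCH session:* COUNT 5000]
}
```

Since `MATCH` is applied after elements are retrieved, a selective pattern can yield many empty pages; a larger `COUNT` reduces the number of round trips needed to get through them.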