sripathikrishnan / redis-rdb-tools

Parse Redis dump.rdb files, Analyze Memory, and Export Data to JSON
https://rdbtools.com
MIT License

Memory Dump for very large RDB files (> 30 GBs) is Slow #23

Open jsrawan-mobo opened 11 years ago

jsrawan-mobo commented 11 years ago

For very large RDB files, the memory dump can take upwards of 30 minutes. Even slower, the "key" feature requires a sequential scan over the whole file.

Finally, I'm trying to further introspect a data structure like a hash, list, or set to find out which field is taking up the most memory. In my case I use Celery as a worker queue, and some tasks can be gigantic.
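
To give a concrete picture of the field-level introspection I mean, here is a rough sketch (not part of redis-rdb-tools) that ranks the fields of a hash by value size against a live instance using redis-py; the key name is just a placeholder:

```python
# Rough sketch, not part of redis-rdb-tools: rank the fields of a hash by the
# byte length of their values on a live Redis instance, using redis-py.
# "some-big-hash" is only a placeholder key name.
import redis

r = redis.Redis(host="localhost", port=6379)

def largest_hash_fields(key, top_n=10):
    """Return the top_n (size_in_bytes, field) pairs for a hash key."""
    sizes = [(len(value), field) for field, value in r.hscan_iter(key)]
    return sorted(sizes, reverse=True)[:top_n]

for size, field in largest_hash_fields("some-big-hash"):
    print(field, size)
```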

So I've made some enhancements, including the following:

1. Reduce the dump time to about 5 minutes in quick mode
2. Allow re-seeking to a key's contents in seconds, plus a limit mode
3. Allow verbose dumping of hash/list/set contents to a file structure

sripathikrishnan commented 11 years ago

@jsrawan-mobo Thanks for taking the time to investigate this!

I am painfully aware of the sub-optimal performance. I have been tracking it under issue #1, but haven't really found the motivation to fix it yet.

It seems you have made some fixes/enhancements. Did I miss a pull request? Can you point me to where you have made these fixes?

jsrawan-mobo commented 11 years ago

See Pull Request #24.

It's not completely done, but you can try it and see the performance improvement from skipping past lzf_decompress() and storing the index for a deep dump later.
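
To sketch the idea (this is not the exact code in the PR, and read_length()/lzf_decompress() here just stand in for the parser's helpers): in quick mode the parser records the offset and compressed size of each LZF string and seeks past the payload, so a deep dump can come back later and decompress only the keys you care about.

```python
# Sketch of the quick-mode idea, not the actual PR code: record the offset and
# compressed size of an LZF string and seek past the payload instead of
# decompressing it, so a later deep dump can seek back to it on demand.
# read_length() and lzf_decompress() stand in for the parser's helpers.
def read_lzf_string(f, quick_mode=True):
    clen = read_length(f)   # compressed length of the payload
    ulen = read_length(f)   # uncompressed length
    pos = f.tell()          # offset where the payload begins
    if quick_mode:
        f.seek(clen, 1)     # skip the payload entirely
        return {"pos": pos, "compressed_size": clen, "uncompressed_size": ulen}
    return lzf_decompress(f.read(clen), ulen)
```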

If you like where it's headed, I can clean it up and do a proper pull request.

amarlot commented 8 years ago

Have you been able to improve it? Would it be possible to release it? For a huge DB (about 50 GB / 1 million keys) on a very fast server, it takes about half a day since it's single-threaded.

Thanks, Alex

jsrawan-mobo commented 8 years ago

I hadn't looked at this in a few years; it seems like this project went stale. The pull request I put up does work in quick mode like this, if you want to give it a try:

1) Generate a quick memory dump and index. In quick mode, only `compressed_size` is valid:
   `rdb.py -c memory -q --file redis_memory_quick.csv redis.rdb`

2) After viewing that, dump a hash/list to view the contents of an offending key:
   `rdb.py -c memory --max 1 --pos 3568796958 -v --key mongow --file redis_memory_mongow.csv redis.rdb`

I'd be willing to fix this up if someone finds a use for it, or fork the repo.