sripathikrishnan / redis-rdb-tools

Parse Redis dump.rdb files, Analyze Memory, and Export Data to JSON
https://rdbtools.com
MIT License

Parser performance not optimal ~1min for a 24MB file #1

Open sripathikrishnan opened 12 years ago

sripathikrishnan commented 12 years ago

Profiler output for a 24MB dump.rdb file

     44009161 function calls (44008966 primitive calls) in 205.628 CPU seconds

Ordered by: standard name

ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 205.628 205.628 :1()
1 0.000 0.000 0.000 0.000 :1(DecimalTuple)
10 0.000 0.000 0.000 0.000 StringIO.py:119(read)
1 0.000 0.000 0.000 0.000 StringIO.py:30()
10 0.000 0.000 0.000 0.000 StringIO.py:38(_complain_ifclosed)
1 0.000 0.000 0.000 0.000 StringIO.py:42(StringIO)
2 0.000 0.000 0.000 0.000 StringIO.py:54(init)
2 0.000 0.000 0.000 0.000 UserDict.py:17(getitem)
1 0.000 0.000 0.000 0.000 UserDict.py:57(get)
1 0.000 0.000 0.000 0.000 UserDict.py:69(contains)
1 0.000 0.000 0.000 0.000 future.py:48()
1 0.000 0.000 0.000 0.000 future.py:74(_Feature)
7 0.000 0.000 0.000 0.000 future.py:75(init)
1 0.046 0.046 0.297 0.297 init.py:1()
1 0.000 0.000 0.000 0.000 init.py:49(normalize_encoding)
1 0.000 0.000 0.013 0.013 init.py:71(search_function)
9/5 0.000 0.000 0.000 0.000 abc.py:137(subclasscheck)
37 0.000 0.000 0.000 0.000 abc.py:7(abstractmethod)
19 0.001 0.000 0.003 0.000 abc.py:78(new)
60 0.000 0.000 0.001 0.000 abc.py:81()
5 0.000 0.000 0.000 0.000 abc.py:97(register)
1 0.060 0.060 0.173 0.173 callbacks.py:1()
1 0.000 0.000 0.000 0.000 callbacks.py:194(DiffCallback)
1 0.020 0.020 0.033 0.033 callbacks.py:26(_floatconstants)
1 0.000 0.000 0.000 0.000 callbacks.py:264(MemoryCallback)
1 0.000 0.000 0.000 0.000 callbacks.py:269(init)
1 0.000 0.000 0.000 0.000 callbacks.py:279(start_rdb)
1 0.000 0.000 0.000 0.000 callbacks.py:282(start_database)
1 0.000 0.000 0.000 0.000 callbacks.py:285(end_database)
1 0.000 0.000 0.000 0.000 callbacks.py:288(end_rdb)
1000001 24.914 0.000 110.527 0.000 callbacks.py:291(set)
2 0.000 0.000 0.000 0.000 callbacks.py:299(start_hash)
2 0.000 0.000 0.000 0.000 callbacks.py:323(start_set)
6 0.000 0.000 0.000 0.000 callbacks.py:327(sadd)
2 0.000 0.000 0.000 0.000 callbacks.py:332(end_set)
1000003 3.987 0.000 9.397 0.000 callbacks.py:383(end_key)
1000003 3.457 0.000 5.409 0.000 callbacks.py:388(newline)
2000004 21.586 0.000 24.588 0.000 callbacks.py:391(sizeof_string)
1000003 5.104 0.000 11.793 0.000 callbacks.py:409(top_level_object_overhead)
1000003 1.560 0.000 1.560 0.000 callbacks.py:416(key_expiry_overhead)
1000003 3.552 0.000 5.197 0.000 callbacks.py:433(hashtable_entry_overhead)
1000003 9.298 0.000 19.548 0.000 callbacks.py:45(_encode_basestring_ascii)
2000006 3.137 0.000 3.137 0.000 callbacks.py:454(sizeof_pointer)
1000003 7.538 0.000 30.889 0.000 callbacks.py:72(_encode)
1000003 3.676 0.000 34.565 0.000 callbacks.py:91(encode_key)
1 0.000 0.000 0.000 0.000 callbacks.py:97(JSONCallback)
1 0.000 0.000 0.000 0.000 codecs.py:77(new)
1 0.004 0.004 0.004 0.004 collections.py:1()
1 0.001 0.001 0.001 0.001 collections.py:13(namedtuple)
34 0.000 0.000 0.000 0.000 collections.py:43()
4 0.000 0.000 0.000 0.000 collections.py:60()
4 0.000 0.000 0.000 0.000 collections.py:61()
6 0.000 0.000 0.000 0.000 copy.py:100(_copy_immutable)
6 0.000 0.000 0.000 0.000 copy.py:65(copy)
1 0.002 0.002 0.068 0.068 decimal.py:116()
1 0.000 0.000 0.000 0.000 decimal.py:158(DecimalException)
1 0.000 0.000 0.000 0.000 decimal.py:181(Clamped)
1 0.000 0.000 0.000 0.000 decimal.py:193(InvalidOperation)
1 0.000 0.000 0.000 0.000 decimal.py:222(ConversionSyntax)
1 0.000 0.000 0.000 0.000 decimal.py:232(DivisionByZero)
1 0.000 0.000 0.000 0.000 decimal.py:248(DivisionImpossible)
1 0.000 0.000 0.000 0.000 decimal.py:259(DivisionUndefined)
1 0.000 0.000 0.000 0.000 decimal.py:270(Inexact)
1 0.000 0.000 0.000 0.000 decimal.py:282(InvalidContext)
1 0.000 0.000 0.000 0.000 decimal.py:296(Rounded)
1 0.000 0.000 0.000 0.000 decimal.py:308(Subnormal)
1 0.000 0.000 0.000 0.000 decimal.py:319(Overflow)
1 0.000 0.000 0.000 0.000 decimal.py:357(Underflow)
1 0.000 0.000 0.000 0.000 decimal.py:3611(_ContextManager)
1 0.000 0.000 0.000 0.000 decimal.py:3626(Context)
3 0.000 0.000 0.000 0.000 decimal.py:3645(init)
1 0.000 0.000 0.000 0.000 decimal.py:4925(_WorkRep)
1 0.000 0.000 0.000 0.000 decimal.py:503(Decimal)
6 0.000 0.000 0.021 0.003 decimal.py:512(new)
1 0.000 0.000 0.000 0.000 decimal.py:5158(_Log10Memoize)
1 0.000 0.000 0.000 0.000 decimal.py:5162(init)
8 0.000 0.000 0.000 0.000 genericpath.py:15(exists)
3 0.000 0.000 0.000 0.000 gettext.py:130(_expand_lang)
1 0.000 0.000 0.001 0.001 gettext.py:421(find)
1 0.000 0.000 0.001 0.001 gettext.py:476(translation)
1 0.000 0.000 0.001 0.001 gettext.py:542(dgettext)
1 0.000 0.000 0.001 0.001 gettext.py:580(gettext)
1 0.000 0.000 0.000 0.000 hex_codec.py:27(hex_decode)
1 0.000 0.000 0.000 0.000 hex_codec.py:45(Codec)
1 0.000 0.000 0.000 0.000 hex_codec.py:52(IncrementalEncoder)
1 0.000 0.000 0.000 0.000 hex_codec.py:57(IncrementalDecoder)
1 0.000 0.000 0.000 0.000 hex_codec.py:62(StreamWriter)
1 0.000 0.000 0.000 0.000 hex_codec.py:65(StreamReader)
1 0.000 0.000 0.000 0.000 hex_codec.py:70(getregentry)
1 0.000 0.000 0.000 0.000 hex_codec.py:8()
1 0.000 0.000 0.000 0.000 io.py:1030(BufferedWriter)
1 0.000 0.000 0.000 0.000 io.py:1117(BufferedRWPair)
1 0.000 0.000 0.000 0.000 io.py:1183(BufferedRandom)
1 0.000 0.000 0.000 0.000 io.py:1247(TextIOBase)
1 0.000 0.000 0.000 0.000 io.py:1295(IncrementalNewlineDecoder)
1 0.000 0.000 0.000 0.000 io.py:1371(TextIOWrapper)
1 0.000 0.000 0.000 0.000 io.py:1850(StringIO)
1 0.000 0.000 0.000 0.000 io.py:267(_DocDescriptor)
1 0.000 0.000 0.000 0.000 io.py:276(OpenWrapper)
1 0.000 0.000 0.000 0.000 io.py:290(UnsupportedOperation)
1 0.000 0.000 0.000 0.000 io.py:294(IOBase)
1 0.028 0.028 0.036 0.036 io.py:35()
1 0.000 0.000 0.000 0.000 io.py:566(RawIOBase)
1 0.000 0.000 0.000 0.000 io.py:621(FileIO)
1 0.000 0.000 0.000 0.000 io.py:643(BufferedIOBase)
1 0.000 0.000 0.000 0.000 io.py:715(_BufferedIOMixin)
1 0.000 0.000 0.000 0.000 io.py:72(BlockingIOError)
1 0.000 0.000 0.000 0.000 io.py:792(_BytesIO)
1 0.000 0.000 0.000 0.000 io.py:898(BytesIO)
1 0.000 0.000 0.000 0.000 io.py:905(BufferedReader)
1 0.000 0.000 0.000 0.000 keyword.py:11()
3 0.000 0.000 0.000 0.000 locale.py:316(normalize)
1 0.000 0.000 0.000 0.000 numbers.py:13(Number)
1 0.000 0.000 0.000 0.000 numbers.py:169(Real)
1 0.000 0.000 0.000 0.000 numbers.py:270(Rational)
1 0.000 0.000 0.000 0.000 numbers.py:295(Integral)
1 0.000 0.000 0.000 0.000 numbers.py:34(Complex)
1 0.000 0.000 0.002 0.002 numbers.py:6()
6 0.000 0.000 0.001 0.000 optparse.py:1007(add_option)
1 0.000 0.000 0.001 0.001 optparse.py:1185(init)
1 0.000 0.000 0.000 0.000 optparse.py:1237(_create_option_list)
1 0.000 0.000 0.001 0.001 optparse.py:1242(_add_help_option)
1 0.000 0.000 0.001 0.001 optparse.py:1252(_populate_option_list)
1 0.000 0.000 0.000 0.000 optparse.py:1262(_init_parsing_state)
1 0.000 0.000 0.000 0.000 optparse.py:1271(set_usage)
1 0.000 0.000 0.000 0.000 optparse.py:1307(_get_all_options)
1 0.000 0.000 0.000 0.000 optparse.py:1313(get_default_values)
1 0.000 0.000 0.000 0.000 optparse.py:1356(_get_args)
1 0.000 0.000 0.000 0.000 optparse.py:1362(parse_args)
1 0.000 0.000 0.000 0.000 optparse.py:1401(check_values)
1 0.000 0.000 0.000 0.000 optparse.py:1414(_process_args)
2 0.000 0.000 0.000 0.000 optparse.py:1511(_process_short_opts)
1 0.000 0.000 0.000 0.000 optparse.py:200(init)
1 0.000 0.000 0.000 0.000 optparse.py:224(set_parser)
1 0.000 0.000 0.000 0.000 optparse.py:365(init)
6 0.000 0.000 0.001 0.000 optparse.py:560(init)
6 0.000 0.000 0.000 0.000 optparse.py:579(_check_opt_strings)
6 0.000 0.000 0.000 0.000 optparse.py:588(_set_opt_strings)
6 0.000 0.000 0.000 0.000 optparse.py:609(_set_attrs)
6 0.000 0.000 0.000 0.000 optparse.py:629(_check_action)
6 0.000 0.000 0.000 0.000 optparse.py:635(_check_type)
6 0.000 0.000 0.000 0.000 optparse.py:665(_check_choice)
6 0.000 0.000 0.000 0.000 optparse.py:678(_check_dest)
6 0.000 0.000 0.000 0.000 optparse.py:693(_check_const)
6 0.000 0.000 0.000 0.000 optparse.py:699(_check_nargs)
6 0.000 0.000 0.000 0.000 optparse.py:708(_check_callback)
2 0.000 0.000 0.000 0.000 optparse.py:752(takes_value)
2 0.000 0.000 0.000 0.000 optparse.py:764(check_value)
2 0.000 0.000 0.000 0.000 optparse.py:771(convert_value)
2 0.000 0.000 0.000 0.000 optparse.py:778(process)
2 0.000 0.000 0.000 0.000 optparse.py:790(take_action)
6 0.000 0.000 0.000 0.000 optparse.py:832(isbasestring)
1 0.000 0.000 0.000 0.000 optparse.py:837(init)
1 0.000 0.000 0.000 0.000 optparse.py:932(init)
1 0.000 0.000 0.000 0.000 optparse.py:943(_create_option_mappings)
1 0.000 0.000 0.000 0.000 optparse.py:959(set_conflict_handler)
1 0.000 0.000 0.000 0.000 optparse.py:964(set_description)
6 0.000 0.000 0.000 0.000 optparse.py:980(_check_conflict)
1 0.042 0.042 0.078 0.078 parser.py:1()
1 0.000 0.000 0.000 0.000 parser.py:239(RdbParser)
1 0.000 0.000 0.000 0.000 parser.py:258(init)
1 11.802 11.802 205.146 205.146 parser.py:267(parse)
2000007 12.386 0.000 33.266 0.000 parser.py:312(read_length_with_encoding)
1 0.000 0.000 0.000 0.000 parser.py:330(read_length)
2000006 11.267 0.000 51.153 0.000 parser.py:333(read_string)
1000003 7.220 0.000 141.652 0.000 parser.py:356(read_object)
1 0.000 0.000 0.000 0.000 parser.py:42(RdbCallback)
2 0.000 0.000 0.000 0.000 parser.py:466(read_intset)
1 0.000 0.000 0.000 0.000 parser.py:602(verify_magic_string)
1 0.000 0.000 0.000 0.000 parser.py:606(verify_version)
1 0.000 0.000 0.000 0.000 parser.py:611(init_filter)
2000006 9.695 0.000 14.738 0.000 parser.py:639(matches_filter)
1000003 1.779 0.000 1.779 0.000 parser.py:649(get_logical_type)
3000012 15.629 0.000 27.115 0.000 parser.py:710(read_unsigned_char)
6 0.000 0.000 0.000 0.000 parser.py:716(read_unsigned_short)
4 0.000 0.000 0.000 0.000 parser.py:722(read_unsigned_int)
1 0.000 0.000 0.000 0.000 parser.py:739(DebugCallback)
8 0.000 0.000 0.000 0.000 posixpath.py:59(join)
1 0.027 0.027 205.613 205.613 rdb:2()
1 0.001 0.001 205.288 205.288 rdb:8(main)
10 0.000 0.000 0.055 0.005 re.py:188(compile)
10 0.000 0.000 0.054 0.005 re.py:229(_compile)
19 0.000 0.000 0.032 0.002 sre_compile.py:184(_compile_charset)
19 0.001 0.000 0.031 0.002 sre_compile.py:213(_optimize_charset)
75 0.000 0.000 0.000 0.000 sre_compile.py:24(_identityfunction)
8 0.001 0.000 0.001 0.000 sre_compile.py:264(_mk_bitmap)
2 0.006 0.003 0.009 0.004 sre_compile.py:307(_optimize_unicode)
22 0.000 0.000 0.000 0.000 sre_compile.py:360(_simple)
10 0.000 0.000 0.007 0.001 sre_compile.py:367(_compile_info)
64/10 0.002 0.000 0.029 0.003 sre_compile.py:38(_compile)
20 0.000 0.000 0.000 0.000 sre_compile.py:480(isstring)
10 0.000 0.000 0.036 0.004 sre_compile.py:486(_code)
10 0.000 0.000 0.054 0.005 sre_compile.py:501(compile)
8 0.000 0.000 0.021 0.003 sre_compile.py:57(fixup)
104 0.000 0.000 0.003 0.000 sre_parse.py:132(len)
226 0.001 0.000 0.001 0.000 sre_parse.py:136(getitem)
22 0.000 0.000 0.000 0.000 sre_parse.py:140(setitem)
96 0.000 0.000 0.000 0.000 sre_parse.py:144(append)
81/32 0.001 0.000 0.001 0.000 sre_parse.py:146(getwidth)
10 0.000 0.000 0.000 0.000 sre_parse.py:184(init)
945 0.004 0.000 0.006 0.000 sre_parse.py:188(next)
209 0.001 0.000 0.001 0.000 sre_parse.py:201(match)
857 0.002 0.000 0.008 0.000 sre_parse.py:207(get)
69 0.000 0.000 0.000 0.000 sre_parse.py:216(isident)
13 0.000 0.000 0.000 0.000 sre_parse.py:222(isname)
12 0.000 0.000 0.000 0.000 sre_parse.py:231(_class_escape)
14 0.000 0.000 0.000 0.000 sre_parse.py:263(_escape)
33/10 0.000 0.000 0.018 0.002 sre_parse.py:307(_parse_sub)
38/10 0.003 0.000 0.017 0.002 sre_parse.py:385(_parse)
10 0.000 0.000 0.018 0.002 sre_parse.py:669(parse)
10 0.000 0.000 0.000 0.000 sre_parse.py:73(__init)
18 0.000 0.000 0.000 0.000 sre_parse.py:78(opengroup)
18 0.000 0.000 0.000 0.000 sre_parse.py:89(closegroup)
64 0.000 0.000 0.000 0.000 sre_parse.py:96(init)
1 0.001 0.001 0.006 0.006 threading.py:1()
2 0.000 0.000 0.000 0.000 threading.py:176(Condition)
1 0.000 0.000 0.000 0.000 threading.py:179(_Condition)
2 0.000 0.000 0.000 0.000 threading.py:181(init)
1 0.000 0.000 0.000 0.000 threading.py:221(_is_owned)
1 0.000 0.000 0.000 0.000 threading.py:272(notify)
1 0.000 0.000 0.000 0.000 threading.py:290(notifyAll)
1 0.000 0.000 0.000 0.000 threading.py:299(_Semaphore)
1 0.000 0.000 0.000 0.000 threading.py:347(_BoundedSemaphore)
1 0.000 0.000 0.000 0.000 threading.py:359(Event)
1 0.000 0.000 0.000 0.000 threading.py:362(_Event)
1 0.000 0.000 0.000 0.000 threading.py:366(init)
1 0.000 0.000 0.000 0.000 threading.py:376(set)
1 0.000 0.000 0.000 0.000 threading.py:414(Thread)
1 0.000 0.000 0.000 0.000 threading.py:426(init)
1 0.000 0.000 0.000 0.000 threading.py:510(_set_ident)
1 0.000 0.000 0.000 0.000 threading.py:57(_Verbose)
4 0.000 0.000 0.000 0.000 threading.py:59(init)
1 0.000 0.000 0.000 0.000 threading.py:64(_note)
1 0.000 0.000 0.000 0.000 threading.py:713(_Timer)
1 0.000 0.000 0.000 0.000 threading.py:742(_MainThread)
1 0.000 0.000 0.000 0.000 threading.py:744(init)
1 0.000 0.000 0.000 0.000 threading.py:752(_set_daemon)
1 0.000 0.000 0.000 0.000 threading.py:783(_DummyThread)
1 0.000 0.000 0.000 0.000 threading.py:99(_RLock)
1 0.000 0.000 0.000 0.000 traceback.py:1()
1 0.000 0.000 0.001 0.001 warnings.py:45(filterwarnings)
1 0.012 0.012 0.013 0.013 {import}
10 0.000 0.000 0.000 0.000 {_sre.compile}
35 0.021 0.001 0.021 0.001 {_sre.getlower}
3000023 4.948 0.000 4.948 0.000 {_struct.unpack}
3 0.000 0.000 0.000 0.000 {abs}
4 0.000 0.000 0.000 0.000 {all}
1 0.000 0.000 0.000 0.000 {binascii.a2b_hex}
26 0.001 0.000 0.001 0.000 {built-in method new of type object at 0x82e5e0}
3 0.000 0.000 0.000 0.000 {built-in method acquire}
10 0.000 0.000 0.000 0.000 {built-in method group}
1000006 3.285 0.000 3.285 0.000 {built-in method match}
2 0.000 0.000 0.000 0.000 {built-in method release}
1000003 2.831 0.000 2.831 0.000 {built-in method search}
1000003 5.982 0.000 5.982 0.000 {built-in method sub}
32 0.000 0.000 0.000 0.000 {chr}
1 0.015 0.015 205.628 205.628 {execfile}
6 0.000 0.000 0.000 0.000 {filter}
400 0.001 0.000 0.001 0.000 {getattr}
8 0.000 0.000 0.000 0.000 {globals}
4 0.000 0.000 0.000 0.000 {hasattr}
3000302 5.241 0.000 5.241 0.000 {isinstance}
20/11 0.000 0.000 0.000 0.000 {issubclass}
2002286/2002258 3.008 0.000 3.008 0.000 {len}
4 0.000 0.000 0.000 0.000 {locals}
2 0.000 0.000 0.000 0.000 {map}
7 0.000 0.000 0.000 0.000 {max}
4 0.000 0.000 0.000 0.000 {method 'contains' of 'frozenset' objects}
2 0.000 0.000 0.000 0.000 {method 'enter' of 'file' objects}
9 0.000 0.000 0.000 0.000 {method 'subclasses' of 'type' objects}
9 0.000 0.000 0.000 0.000 {method 'subclasshook' of 'object' objects}
73 0.000 0.000 0.000 0.000 {method 'add' of 'set' objects}
2000814 3.395 0.000 3.395 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'copy' of 'dict' objects}
1 0.000 0.000 0.013 0.013 {method 'decode' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
24 0.000 0.000 0.000 0.000 {method 'endswith' of 'str' objects}
8 0.000 0.000 0.000 0.000 {method 'extend' of 'list' objects}
9 0.000 0.000 0.000 0.000 {method 'find' of 'str' objects}
85 0.000 0.000 0.000 0.000 {method 'get' of 'dict' objects}
1 0.000 0.000 0.000 0.000 {method 'insert' of 'list' objects}
30 0.000 0.000 0.000 0.000 {method 'isalnum' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'isdigit' of 'str' objects}
33 0.000 0.000 0.000 0.000 {method 'items' of 'dict' objects}
3 0.000 0.000 0.000 0.000 {method 'join' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'keys' of 'dictproxy' objects}
4 0.000 0.000 0.000 0.000 {method 'lower' of 'str' objects}
1 0.000 0.000 0.000 0.000 {method 'lstrip' of 'str' objects}
4 0.000 0.000 0.000 0.000 {method 'pop' of 'list' objects}
5000020 13.233 0.000 13.233 0.000 {method 'read' of 'file' objects}
18 0.000 0.000 0.000 0.000 {method 'remove' of 'list' objects}
8 0.000 0.000 0.000 0.000 {method 'replace' of 'str' objects}
3 0.000 0.000 0.000 0.000 {method 'reverse' of 'list' objects}
544 0.002 0.000 0.002 0.000 {method 'setdefault' of 'dict' objects}
2 0.000 0.000 0.000 0.000 {method 'setter' of 'property' objects}
6 0.000 0.000 0.000 0.000 {method 'split' of 'str' objects}
156 0.000 0.000 0.000 0.000 {method 'startswith' of 'str' objects}
3 0.000 0.000 0.000 0.000 {method 'strip' of 'str' objects}
2 0.000 0.000 0.000 0.000 {method 'tolist' of 'array.array' objects}
2 0.000 0.000 0.000 0.000 {method 'tostring' of 'array.array' objects}
1 0.000 0.000 0.000 0.000 {method 'translate' of 'str' objects}
8 0.000 0.000 0.000 0.000 {method 'upper' of 'str' objects}
2000006 5.662 0.000 5.662 0.000 {method 'write' of 'file' objects}
135 0.000 0.000 0.000 0.000 {min}
2 0.139 0.069 0.139 0.069 {open}
68 0.000 0.000 0.000 0.000 {ord}
8 0.000 0.000 0.000 0.000 {posix.stat}
9 0.000 0.000 0.000 0.000 {range}
1 0.000 0.000 0.000 0.000 {repr}
109 0.000 0.000 0.000 0.000 {setattr}
1 0.000 0.000 0.000 0.000 {sys._getframe}
3 0.000 0.000 0.000 0.000 {thread.allocate_lock}
2 0.000 0.000 0.000 0.000 {thread.get_ident}
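The listing above is ordered by standard name, so the expensive entries, such as parser.py:356(read_object) at 141.652 s cumulative and callbacks.py:291(set) at 110.527 s cumulative, are scattered among hundreds of zero-cost rows. As a minimal sketch, a profile can instead be sorted by cumulative time with the standard cProfile and pstats modules. The helper profile_top and the workload parse_stand_in are hypothetical names used only for illustration; they are not part of redis-rdb-tools.

```python
import cProfile
import io
import pstats


def profile_top(func, *args, limit=10):
    """Run func(*args) under cProfile and return a text report of the
    `limit` most expensive entries, ordered by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


def parse_stand_in(n):
    # Hypothetical stand-in for the real parse call; it only burns CPU.
    total = 0
    for i in range(n):
        total += len(str(i))
    return total


report = profile_top(parse_stand_in, 100000)
print(report)
```

With this ordering the per-key call chain appears in the first few rows, which makes it easier to see where the time goes.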

yoav-steinberg commented 12 years ago

Did some profiling and wrote a quick patch using https://github.com/teepark/python-lzf, resulting in a 2x performance boost.

sripathikrishnan commented 12 years ago

@yoav-steinberg: Thanks for taking the time to investigate this issue!

I don't like adding a dependency to the project. Let me investigate if there is a way to conditionally include the library, so that people who don't want to install the dependency can still use rdb-tools.

yoav-steinberg commented 12 years ago

You can also consider including the C files from liblzf directly in redis-rdb-tools instead of adding a dependency. It is fairly common to bundle the liblzf files inside a larger project (Redis itself does this!).

jvtm commented 9 years ago

Simplistic patch here: https://github.com/jvtm/redis-rdb-tools/tree/lzf-speedup

Not creating a pull request just yet, I want to test this with real fresh dumps first. The related unit tests pass, but I didn't check if error reporting on invalid values behaves the same.

joshowen commented 8 years ago

@sripathikrishnan any thoughts on including @jvtm's patch? It doesn't require python-lzf, but uses it if it's there.
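For readers unfamiliar with the pattern, "uses it if it's there" can be sketched as an optional import with a pure-Python fallback. This is an illustration under assumptions, not the code from @jvtm's patch: the function names are hypothetical, the fallback is a plain rendering of the LZF decompression loop, and the native call assumes python-lzf exposes decompress(data, max_length).

```python
try:
    import lzf  # optional: C extension installed by the python-lzf package
except ImportError:
    lzf = None

# Guard against an unrelated module that happens to be named "lzf"
# but has no decompress() function.
_native_decompress = getattr(lzf, "decompress", None)


def lzf_decompress_pure(compressed, expected_length):
    """Slow pure-Python LZF decompression, used when no C library is found."""
    out = bytearray()
    i = 0
    while i < len(compressed):
        ctrl = compressed[i]
        i += 1
        if ctrl < 32:
            # Literal run: copy the next ctrl + 1 bytes unchanged.
            out += compressed[i:i + ctrl + 1]
            i += ctrl + 1
        else:
            # Back-reference into the output produced so far.
            length = ctrl >> 5
            if length == 7:
                length += compressed[i]
                i += 1
            ref = len(out) - ((ctrl & 0x1F) << 8) - compressed[i] - 1
            i += 1
            if ref < 0:
                raise ValueError("back-reference before start of output")
            for _ in range(length + 2):
                out.append(out[ref])
                ref += 1
    if len(out) != expected_length:
        raise ValueError("decompressed length does not match expected length")
    return bytes(out)


def lzf_decompress(compressed, expected_length):
    """Prefer the native implementation, fall back to pure Python."""
    if _native_decompress is not None:
        return _native_decompress(compressed, expected_length)
    return lzf_decompress_pure(compressed, expected_length)
```

The point of the design is that the dependency stays optional: users without a compiler still get a working, slower parser.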

billcrook commented 4 years ago

Bump. Any chance of getting this? Parsing a 10 GB backup is brutal for me.

jvtm commented 4 years ago

Wow, didn't even remember this one... Not working anymore on the project where this was required. Here's the exact tiny commit: https://github.com/jvtm/redis-rdb-tools/commit/fdd8134bed488462d0bfae449b542bb3d611f7d3 (failed to include this issue in commit message)

billcrook commented 4 years ago

I dug around the code and noticed this commit introduced the lzf optimization.

oranagra commented 4 years ago

@billcrook the current code only uses the lzf optimization if you have the native library installed. Maybe you just need to do pip install lzf?

Does the commit @jvtm mentioned change anything? It seems to me that it does the same thing the current version already does. Please let me know if I'm missing anything.

billcrook commented 4 years ago

> @billcrook the current code only uses the lzf optimization if you have the native library installed. Maybe you just need to do pip install lzf?

You mean python-lzf, right?

> Does the commit @jvtm mentioned change anything? It seems to me that it does the same thing the current version already does. Please let me know if I'm missing anything.

You are correct. It seems to do the same check for existence of lzf module.

oranagra commented 4 years ago

@billcrook no, not python-lzf; that's the Python re-implementation. The fast one, which we would rather use, is just lzf, which provides Python bindings to the C implementation.

billcrook commented 4 years ago

Are you sure about that? When I remove python-lzf and install lzf I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 13, in parse
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 461, in parse_fd
    self.read_object(f, data_type)
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 569, in read_object
    value = self.read_string(f)
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 508, in read_string
    val = self.lzf_decompress(f.read(clen), l)
  File "/Users/billcrook/dev/audiomack/data-pipeline/venv/lib/python3.6/site-packages/rdbtools/parser.py", line 1021, in lzf_decompress
    return lzf.decompress(compressed, expected_length)
AttributeError: module 'lzf' has no attribute 'decompress'

billcrook commented 4 years ago

For reference: https://github.com/sripathikrishnan/redis-rdb-tools/pull/110#issue-160349318

oranagra commented 4 years ago

@billcrook sorry, it seems that I was wrong. python-lzf is the one that's native, and redis-rdb-tools makes no use of the lzf library. Maybe @galcohen-redislabs can provide some insight or spot a regression.
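Since two different packages can both install a module named lzf, a quick check of what is actually importable may save some guessing. The following is only a sketch; describe_lzf is a hypothetical helper, and the single criterion it uses (a callable decompress attribute) is taken from the traceback earlier in this thread.

```python
import importlib


def describe_lzf(module_name="lzf"):
    """Report whether the named module can serve as the fast LZF path."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return "not installed"
    if callable(getattr(module, "decompress", None)):
        return "usable"
    return "installed but has no decompress()"


print(describe_lzf())
```

A result of "installed but has no decompress()" corresponds to the AttributeError shown in the traceback.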

galcohen-redislabs commented 4 years ago

@billcrook Please provide some rough numbers on the rdb file: number of keys, average value size, and the time it takes to run rdb --command json on it.
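One way to collect the timing figure is to wrap the command in a small wall-clock timer. This is a rough sketch: time_command is a hypothetical helper, and the dump path in the usage comment is a placeholder to replace with the real file.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command, discard its stdout, and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
    return time.perf_counter() - start


# Usage (placeholder path, requires rdbtools on PATH):
#   elapsed = time_command(["rdb", "--command", "json", "dump.rdb"])
#   print("%.1f s" % elapsed)

# Harmless self-check that only starts the Python interpreter:
print("%.3f s" % time_command([sys.executable, "-c", "pass"]))
```

Running it once with python-lzf installed and once without would show how much of the total time the decompression path accounts for.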