more compact serialization; fix dtype issue

pixelogik / NearPy

Python framework for fast (approximated) nearest neighbour search in large, high-dimensional data sets using different locality-sensitive hashes.

MIT License


more compact serialization; fix dtype issue #29

Closed wanji closed 9 years ago

wanji commented 9 years ago

Serialization for Redis requires much more memory than the in-memory storage. A 4096-D single-precision float vector takes 83KB, while the useful data only needs 16KB. I have replaced tostring() with tolist(), and replaced json.dumps with cPickle.dumps. The required memory dropped to about 16KB.
When rebuilding a numpy.array from Redis, float64 is always used as the dtype, which causes errors for other data types. I recorded the dtype field of the numpy.array in var_dict and used it when rebuilding the numpy.array.
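
For readers following along, here is a minimal sketch of the two ideas in this description: store the vector compactly, and keep its dtype next to the data so it is not rebuilt as float64. It is not the code from this pull request. It assumes Python 3 (pickle instead of cPickle), it stores raw bytes via tobytes() rather than tolist(), and pack_vector / unpack_vector are hypothetical helper names.

```python
import json
import pickle

import numpy as np


def pack_vector(vec):
    """Serialize a 1-D array as raw bytes plus the dtype needed to rebuild it.

    Hypothetical helper, not part of NearPy.
    """
    record = {"dtype": vec.dtype.str, "data": vec.tobytes()}
    return pickle.dumps(record)


def unpack_vector(blob):
    """Rebuild the array with the recorded dtype instead of assuming float64."""
    record = pickle.loads(blob)
    return np.frombuffer(record["data"], dtype=np.dtype(record["dtype"]))


vec = np.random.rand(4096).astype(np.float32)

# Text-based baseline: every float becomes a decimal string.
json_size = len(json.dumps({"data": vec.tolist()}))

packed = pack_vector(vec)
restored = unpack_vector(packed)

# Reading the same bytes as float64 silently yields a different array.
wrong = np.frombuffer(vec.tobytes(), dtype=np.float64)

print("JSON text:", json_size, "bytes")
print("pickled raw bytes:", len(packed), "bytes")
print("dtype preserved:", restored.dtype == vec.dtype)
print("length when read as float64:", wrong.shape[0])
```

The raw payload of a 4096-element float32 vector is 4096 × 4 = 16384 bytes, so the pickled record should land just above that, while the JSON text is several times larger.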

pixelogik commented 9 years ago

Thx @wanji , I will give it a look asap and merge it. Sounds good! :)

pixelogik commented 9 years ago

hi @wanji thanks again for the cool contribution.

python run_tests.py fails saying:

ERROR: test_redis_storage (nearpy.tests.storage_tests.TestStorage)

Traceback (most recent call last):
  File "/Users/ole/Development/misc/NearPy/nearpy/tests/storage_tests.py", line 66, in test_redis_storage
    X = self.redis_storage.get_bucket('testHash', bucket_key)
  File "/Users/ole/Development/misc/NearPy/nearpy/storage/storage_redis.py", line 100, in get_bucket
    val_dict = cPickle.loads(want_string(item_str))
  File "/Users/ole/Development/misc/NearPy/nearpy/utils/utils.py", line 66, in want_string
    rv = arg.decode(encoding)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 0: invalid start byte

could you fix that?

i just merged develop into master, so it would be best if you work on the new master code base.

wanji commented 9 years ago

@pixelogik Sure, will update the master code and check again :)

wanji commented 9 years ago

@pixelogik This error is caused by decoding a pickled string as UTF-8. After removing want_string from val_dict = cPickle.loads(want_string(item_str)), the code passes the tests. Is this acceptable?
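
The failure can be reproduced in isolation. The sketch below assumes Python 3 and a made-up payload. It shows that a binary pickle starts with byte 0x80, which cannot be decoded as UTF-8, and that passing the untouched bytes to pickle.loads works.

```python
import pickle

# Made-up payload for illustration; protocol 2 is the binary format
# that Python 2's cPickle can also produce.
payload = pickle.dumps({"dtype": "float32", "data": [0.5, 1.5]}, protocol=2)

# Binary pickles (protocol 2 and up) begin with the PROTO opcode, byte 0x80,
# which is never a valid first byte of UTF-8 text.
first_byte = payload[:1]

try:
    payload.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Handing the raw bytes straight to pickle.loads avoids the problem.
restored = pickle.loads(payload)

print("first byte:", first_byte)
print("UTF-8 decode failed:", decode_failed)
print("restored:", restored)
```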

pixelogik commented 9 years ago

Alright thx, I removed that call. Tests run successfully.

wanji commented 9 years ago

Thanks :)

Best regards,

Wan Ji

