kharazi closed this issue 10 years ago
In dense representation one vector in that space has 16384 coordinates, so 239 vectors take 3,915,776 coordinates. Using doubles (typically 8 bytes each) this comes to about 31 MB. But that is just the raw data in memory.
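The arithmetic above can be checked directly:

```python
# Back-of-the-envelope memory estimate for the dense representation.
dim = 2 ** 14          # 16384 coordinates per vector
n_vectors = 239
bytes_per_double = 8   # e.g. numpy float64

coordinates = n_vectors * dim               # 3,915,776 coordinates
raw_bytes = coordinates * bytes_per_double  # 31,326,208 bytes

print(coordinates)            # 3915776
print(raw_bytes / 2 ** 20)    # ~29.9 MiB of raw data
```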
The redis layer, however, stores vectors and their associated data (most of the time just a string) as JSON strings! This adds a lot of overhead and is very likely the reason for your observation.
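A rough illustration of how much JSON serialization inflates a vector of doubles (a minimal sketch, not NearPy's exact storage format):

```python
import json
import struct

# A small dense vector of doubles (hypothetical example data).
vector = [0.12345678901234567] * 100

# Raw binary: exactly 8 bytes per double.
raw = struct.pack('%dd' % len(vector), *vector)

# JSON: every double becomes a decimal string plus separators,
# which is what blows up the redis footprint.
as_json = json.dumps(vector)

print(len(raw))      # 800 bytes
print(len(as_json))  # considerably larger than the raw encoding
```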
Are your vectors sparse? If so, and most values are zero anyway, I should soon finish a branch that adds support for sparse vectors. With the sparse representation the redis footprint is of course much smaller.
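To see why the sparse footprint is so much smaller, compare the in-memory size of one dense vector against a sparse one that only stores its nonzero entries (a sketch assuming scipy/numpy are installed; 1% density is just an illustrative value):

```python
import numpy
import scipy.sparse

dim = 2 ** 14

# Dense vector: all 16384 coordinates stored, 8 bytes each.
dense = numpy.zeros((dim, 1))

# Sparse vector in COO format: only ~1% of coordinates are nonzero,
# so only those (row, col, value) triples are kept.
sparse = scipy.sparse.rand(dim, 1, density=0.01)

print(dense.nbytes)                 # 131072 bytes per dense vector
print(sparse.data.nbytes
      + sparse.row.nbytes
      + sparse.col.nbytes)          # only the nonzero entries are stored
```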
Thanks for your explanation. Yes, most values in my vectors are zero. Can you say when you will publish it?
I will take a look at the branch and let you know...
I added support for sparse vectors to processing, hashes and redis storage. Check the last three commits.
For calculations (dot product / projections) NearPy uses CSR format now. For storing into redis it uses COO format. See http://docs.scipy.org/doc/scipy/reference/sparse.html for further details.
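The two formats can be illustrated like this (a sketch assuming scipy is installed; the dimensions and density are just example values):

```python
import scipy.sparse

# A sparse column vector of shape (n, 1), as NearPy expects.
v = scipy.sparse.rand(16384, 1, density=0.01, format='csr')

# CSR is efficient for arithmetic such as dot products / projections.
projection = scipy.sparse.rand(10, 16384, density=0.01, format='csr')
hashed = projection.dot(v)          # shape (10, 1)

# COO exposes the raw (row, col, value) triples, which is convenient
# for serializing only the nonzero entries into redis.
coo = v.tocoo()
nonzeros = list(zip(coo.row, coo.col, coo.data))

print(hashed.shape)    # (10, 1)
print(len(nonzeros))   # far fewer entries than the 16384 coordinates
```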
So if your vectors are sparse you should use actual sparse vectors/matrices now, as supported by scipy.
Your sparse vectors have to have a shape of (n, 1), where n is your feature space dimension. If this is given, you can use NearPy as usual.
Random sparse vectors can, for example, be generated with scipy.sparse.rand(30, 1, density=0.3); the shape in this case is (30, 1).
Thanks, it's very helpful for me. I had trouble with memory :)
Hi, I have a problem with memory usage. My vector dimension is 2**14; I know it's very large, but in #3 you mention turning vectors into sparse matrices. This is my redis memory usage with 239 vectors:
Why does it use about 169M of RAM? Is that normal for my data size?