mbarbry opened 4 years ago
Hello, you can pass the `sorted=True` flag to avoid sorting the contents, and `has_duplicates=False` to avoid deduplication. Beware that there will be issues if you do this with coordinates that aren't sorted or that have duplicates.
Also, the higher memory usage is due to the format: we use COO, which usually compresses less efficiently than CSR, except for hypersparse arrays.
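A rough back-of-the-envelope sketch of why, for a 2D matrix: COO stores one index per dimension for every nonzero, while CSR stores one column index per nonzero plus a row-pointer array of length `nrows + 1`. The function names are hypothetical, array headers and other overhead are ignored, and 32-bit indices with float64 values are assumed for the numbers.

```python
def coo_bytes(nnz, ndim, index_bytes, value_bytes):
    """Approximate COO storage: ndim indices plus one value per nonzero."""
    return nnz * (ndim * index_bytes + value_bytes)


def csr_bytes(nnz, nrows, index_bytes, value_bytes):
    """Approximate CSR storage: one column index and one value per nonzero,
    plus a row-pointer array of length nrows + 1."""
    return nnz * (index_bytes + value_bytes) + (nrows + 1) * index_bytes


# Typical case: many nonzeros per row, so CSR's row pointers are cheap.
print(coo_bytes(1_000_000, 2, 4, 8))        # 16000000
print(csr_bytes(1_000_000, 10_000, 4, 8))   # 12040004

# Hypersparse case: far fewer nonzeros than rows, so row pointers dominate.
print(coo_bytes(1_000, 2, 4, 8))            # 16000
print(csr_bytes(1_000, 10_000_000, 4, 8))   # 40012004
```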
I think the main factor here is that the coordinates are converted to `np.intp`, which typically upcasts 32-bit ints to 64-bit ints. @mbarbry, if you run your code example and check the dtypes of the coordinate arrays, I think you'll see that `A.row.dtype` is `dtype('int32')` whereas `B.coords.dtype` is `dtype('int64')`. This is related to #249.
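A NumPy-only sketch of the effect described above, using plain int32 arrays as stand-ins for the `row`/`col` arrays of a `scipy.sparse` COO matrix (the stand-ins and the stacking step are illustrative assumptions, not the library's actual code path). `np.intp` is pointer-sized, so on 64-bit platforms the widened coordinates take twice the memory.

```python
import numpy as np

nnz = 1_000_000
# Stand-ins for scipy.sparse coo_matrix .row / .col, which are often int32.
row = np.zeros(nnz, dtype=np.int32)
col = np.zeros(nnz, dtype=np.int32)

# Stack into a (2, nnz) coordinate array and widen to the platform index type.
coords = np.stack([row, col]).astype(np.intp)

print(row.dtype, coords.dtype)    # int32 int64 on 64-bit platforms
print(row.nbytes + col.nbytes)    # 8000000
print(coords.nbytes)              # 16000000 on 64-bit platforms
```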
Thank you for your answers. What @daletovar describes seems to be the issue. So from what I read in #249, there is no actual fix for such situations?
I don't know exactly what kind of bugs occurred when using other dtypes to store coordinates (@hameerabbasi might be able to answer this), but you could perhaps try commenting out the conversion. Depending on what you're trying to do, the GCXS format could be useful. You would have to clone from GitHub to use it.
Yes, it was complex. We had overflows, and lots of them in different places. 🤷♂️ I gave up at some point and moved to `np.int64`.
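This also bears on the `copy=False` question in the original report: `ndarray.astype(..., copy=False)` only avoids a copy when no conversion is needed. If the requested dtype differs from the input's, NumPy must allocate a new array regardless. A small sketch, assuming the line in question is such a dtype conversion of the coordinates (the line itself is not reproduced in this thread):

```python
import numpy as np

coords32 = np.zeros((2, 1_000_000), dtype=np.int32)
coords64 = np.zeros((2, 1_000_000), dtype=np.int64)

# Same dtype requested: astype(copy=False) returns the input array itself.
same = coords64.astype(np.int64, copy=False)
print(same is coords64)                      # True

# Different dtype requested: a new, wider array is allocated despite copy=False.
widened = coords32.astype(np.int64, copy=False)
print(np.shares_memory(widened, coords32))   # False
print(coords32.nbytes, widened.nbytes)       # 8000000 16000000
```

In other words, such a conversion can only be copy-free if the incoming coordinates already have the target dtype.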
Dear developers,
Description

In my code, I'm using `sparse` to handle large data (> 10 GB). I noticed that the `sparse` library uses more memory than I expected. Comparing a 2D matrix with `scipy.sparse`, I realized that `sparse` uses significantly more memory than `scipy.sparse`. Below you can find the memory consumption of the small example code included at the bottom (obtained with the `memory_profiler` library).

We see a usage of 220 MB by `sparse.COO`, while `scipy.sparse` uses only 127 MB. Investigating the memory usage in `sparse.COO`, I found a large amount of memory used by the lines … and …
If I comment out line 246 in the file `sparse/_coo/core.py`, the memory usage is significantly smaller: a gain of around 60 MB. My question is: why does line 246 in `sparse/_coo/core.py` seem to copy the memory even though `copy=False`, and how can I avoid it? Also, is there a way to avoid the sorting of indices in line 276 when converting the matrix from `scipy.sparse`?

Example Code