mikepound / opencubes

A community improved version of the polycubes project!
MIT License

On the use of sparse Matrices #16

Closed Recouer closed 1 year ago

Recouer commented 1 year ago

I believe using sparse matrices rather than dense 3D matrices should reduce the amount of stored data by a fair amount.

VladimirFokow commented 1 year ago

I don't think the matrices we have are sparse enough to see a dramatic improvement. But really, I don't know; testing is needed to compare both approaches.

https://stackoverflow.com/a/36971131

Such a matrix has to store 3 arrays of values (at least in the COO format), so the sparsity has to be below 1/3 to start saving memory.
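The 1/3 break-even point can be sketched with simple arithmetic (a rough model assuming one byte per dense cell and three coordinate bytes per occupied cell; the side length and fill count below are made-up examples):

```python
# Rough memory comparison: dense boolean 3D array vs. the same data as a
# COO-style list of (x, y, z) coordinates, at one byte per stored value.

side = 10                     # hypothetical bounding-box side length
occupied = 50                 # hypothetical number of filled cells

dense_bytes = side ** 3       # one byte per cell -> 1000
coo_bytes = occupied * 3      # 3 coordinate bytes per occupied cell -> 150

# COO wins only while occupied < side**3 / 3, i.e. sparsity below 1/3:
print(coo_bytes < dense_bytes)
```

With 50 of 1000 cells filled (sparsity 0.05), COO is far below the 1/3 threshold, so it wins in this toy case.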

Recouer commented 1 year ago

I've looked at the sparsity of the matrices for up to n = 10 and I get interesting results:

So looking at it like that, we might think we could reduce our memory usage by a fair amount if we used COO for n = 10 and onward.

However, that doesn't take into account that the data in the tensor can be packed into a bit array, cutting its memory usage by a factor of 8 (this is actually what's done in the packing library). With the same compression, each value of the sparse matrix would need at least 16 bits: since we can consider that the indices won't go above 32, 5 bits suffice to represent one index, so each value requires 3 × 5 = 15 bits, rounded to 16 for convenience. Hence the sparsity of the matrix would have to be below 1/16 for the sparse representation to be interesting.
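The revised break-even point can be sketched the same way (a toy model, not the project's actual packing code; it assumes 1 bit per cell for the packed dense tensor and 16 bits per occupied cell for the packed coordinate list, with made-up example sizes):

```python
# Break-even sparsity once both sides are bit-packed.

def dense_bits(side):
    """Bits for a bit-packed side x side x side tensor: 1 bit per cell."""
    return side ** 3

def coo_bits(occupied):
    """Bits for a packed coordinate list: 3 indices of 5 bits each,
    rounded up to 16 bits per occupied cell."""
    return occupied * 16

# COO only wins while occupied * 16 < side**3, i.e. sparsity below 1/16:
side, occupied = 10, 60       # sparsity 60/1000 = 0.06 < 1/16
print(coo_bits(occupied) < dense_bits(side))
```

At sparsity 0.06 the coordinate list squeaks under the 1/16 ≈ 0.0625 threshold; anything denser and the packed tensor wins.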

To see if we could ever hope to find a matrix with such sparsity, we can take the worst-case scenario for n = 16: a polycube with one cube at the center and 3 branches, each of length 5, in 3 different directions. The total size of the bounding tensor would be 6 × 6 × 6 = 216, and its sparsity 16/216, which is slightly over 1/16.
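The worst-case arithmetic checks out (a quick verification of the numbers above, using the same 1/16 threshold):

```python
# Worst case for n = 16: a center cube with three straight arms of length 5
# along three different axes fits in a 6 x 6 x 6 bounding box.
n = 16
bbox = 6 ** 3                 # 216 cells
sparsity = n / bbox           # 16/216, about 0.074

# Still above the 1/16 (0.0625) break-even, so the packed tensor wins:
print(sparsity > 1 / 16)
```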

Hence, even in the best-case scenario for sparse matrices, the tensor implementation still requires less space.

VladimirFokow commented 1 year ago

Great analysis!!

bertie2 commented 1 year ago

Good to know! The C++ implementation already uses a sparse matrix, though that was because it made the rotation easier to understand by using conventional rotation matrices. But it's great to know that it's likely optimal memory-wise for large N as well.

VladimirFokow commented 1 year ago

(@bertie2 I think what @Recouer is saying is that the sparse matrix is not optimal memory-wise.)

But maybe the C++ code implements it differently than numpy does, so it could actually be optimal there.

bertie2 commented 1 year ago

@VladimirFokow no, you're right, I just completely misread. In that case, changing the C++ version to work with bit arrays is back on the todo list.