ing-bank / sparse_dot_topn

Python package to accelerate the sparse matrix multiplication and top-n similarity selection
Apache License 2.0
399 stars 86 forks

ENH: new function zip_sp_matmul_topn can zip matrices zip_j A.dot(B_j) #101

Closed: mbaak closed this 8 months ago

mbaak commented 8 months ago

ENH: new function zip_sp_matmul_topn that can zip matrices zip_j A.dot(B_j)

The function returns a zipped matrix Z in CSR format, Z = zip_j C_j, where each row of Z holds the sorted top-n results above lower_bound taken from the corresponding rows of the C_j. Here C_j = A.dot(B_j), and B has been split row-wise into sub-matrices B_j.
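To make the zipping step concrete, here is a minimal pure-Python sketch of the idea, not the package's implementation. It assumes each C_j covers a disjoint block of columns of the full product, and the helper names (`matmul`, `topn_rows`, `zip_topn`) are hypothetical and exist only for this illustration. It checks that merging the per-chunk top-n rows gives the same result as taking the top-n of the full product directly.

```python
import heapq

def matmul(A, B):
    # Dense row-by-column product; enough for a small illustration.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def topn_rows(C, top_n, lower_bound, col_offset=0):
    # Per row: keep up to top_n (column, value) pairs with value > lower_bound,
    # sorted by value in descending order. col_offset maps chunk-local columns
    # back to columns of the full product.
    return [
        heapq.nlargest(
            top_n,
            [(col_offset + j, v) for j, v in enumerate(row) if v > lower_bound],
            key=lambda cv: cv[1],
        )
        for row in C
    ]

def zip_topn(chunks, top_n):
    # Hypothetical stand-in for the zip step: merge the per-chunk top-n rows
    # and keep only the overall top_n entries per row, sorted by value.
    n_rows = len(chunks[0])
    return [
        heapq.nlargest(
            top_n,
            [cv for chunk in chunks for cv in chunk[i]],
            key=lambda cv: cv[1],
        )
        for i in range(n_rows)
    ]

A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 1.0]]
B = [[0.5, 0.1, 0.9, 0.0],
     [0.2, 0.8, 0.0, 0.4],
     [0.3, 0.0, 0.6, 0.7]]

top_n, lower_bound, width = 2, 0.5, 2

# Split the right-hand operand into two blocks of `width` columns (assumption).
B_chunks = [[row[start:start + width] for row in B] for start in range(0, 4, width)]

# Top-n of each C_j = A.dot(B_j), with columns shifted to full-product indices.
chunks = [
    topn_rows(matmul(A, B_j), top_n, lower_bound, col_offset=j * width)
    for j, B_j in enumerate(B_chunks)
]

zipped = zip_topn(chunks, top_n)
full = topn_rows(matmul(A, B), top_n, lower_bound)
```

The merge is only guaranteed to match the full result when each chunk keeps at least top_n candidates per row, which is why the same top_n is used in both stages here.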

The function only supports the sorted variant of the sp_matzip function; the unsorted variant (ordered by insertion) cannot be made equal to the unsorted function applied to the full matrices. zip_sp_matmul_topn sorts by value by default.

Also added a Python function to zip split matrices, plus two unit tests covering the new functionality.

NB: the unit test test_stack_zip_sp_matmul_topn is skipped on Python 3.8 because of a bug in scipy's vstack function, which does not support all data types.

Bump version to 1.1.0

RUrlus commented 8 months ago

Otherwise looks good Max, thanks!

RUrlus commented 8 months ago

Hi Max, can you add the change to the changelog as well? Picked this up