zenogantner closed this pull request 5 years ago
PR accepted, thank you Zeno. BTW, computing the hash on only a small portion of the data is anything but a clean solution (I'm criticizing myself... :) )
Yup, and even when looking just at a subset, a checksum like MD5 may make more sense than a simple sum: right now, the part of the function operating on the labels gives the same result for all label vectors with the same label frequencies ...
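For illustration, a minimal sketch (not the project's actual hashing code, just an assumption of how the comparison plays out) of why a plain sum collides for reordered labels while an MD5 digest does not:

```python
# Two label vectors with the same label frequencies: a simple sum cannot
# distinguish them, but an MD5 checksum over their raw bytes can.
import hashlib
import numpy as np

y1 = np.array([0, 0, 1, 1, 2])
y2 = np.array([1, 0, 2, 0, 1])  # same frequencies, different order

print(y1.sum() == y2.sum())  # True -> the sums collide

md5_1 = hashlib.md5(y1.tobytes()).hexdigest()
md5_2 = hashlib.md5(y2.tobytes()).hexdigest()
print(md5_1 == md5_2)  # False -> the checksum also reflects the ordering
```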
When using a sparse matrix, e.g. scipy.sparse.csr_matrix, we otherwise get the error message: "NotImplementedError: adding a nonzero scalar to a sparse matrix is not supported". With the .sum() method, it works for both sparse and dense (numpy array) matrices.
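A minimal reproduction sketch (not the patched library code itself, just the behavior described above), assuming scipy and numpy are installed:

```python
# Adding a nonzero scalar to a scipy sparse matrix raises NotImplementedError,
# while the .sum() method works for both sparse and dense inputs.
import numpy as np
from scipy.sparse import csr_matrix

dense = np.array([[1.0, 0.0], [0.0, 2.0]])
sparse = csr_matrix(dense)

print(dense.sum(), sparse.sum())  # 3.0 3.0 -> .sum() works in both cases

try:
    sparse + 1.0  # this is what triggers the error for sparse inputs
except NotImplementedError as err:
    print(err)
```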
This little change allows us to handle much bigger sparse datasets. The memory saving depends on the dataset; I observed a factor of 7 for a dataset with about 5% density.