practical-recommender-systems / moviegeek

A Django website used in the book Practical Recommender Systems to illustrate how recommender algorithms can be implemented.
MIT License

Explicit zeros get ignored when calculating the overlap matrix. #74

Open ksiar137 opened 6 months ago

ksiar137 commented 6 months ago

The following line builds the overlap matrix by casting the COO matrix to boolean and then to integer: https://github.com/practical-recommender-systems/moviegeek/blob/d02d797f38abdee95eed2918debb1de3bdf35ed1/builder/item_similarity_calculator.py#L52

However, this cast turns every rating that was normalized to exactly zero into a False value, so those ratings are dropped from the overlap count even though they are real, stored ratings. My proposed solution is to build a matrix that holds a one for every stored value of the COO matrix:

Example:

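    from scipy.sparse import coo_matrix

    # coo is the normalized ratings matrix built in item_similarity_calculator.py
    print("Coo matrix:\n", coo)
    print("coo as bool:\n", coo.astype(bool).astype(int))

    # One entry per stored value, including stored zeros
    ones_data = [1] * len(coo.data)
    ones_matrix = coo_matrix((ones_data, (coo.row, coo.col)), shape=coo.shape)
    print("ones matrix:\n", ones_matrix)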

Output:

    Coo matrix:
      (0, 0)    -0.6666666666666667
      (1, 0)    0.33333333333333326
      (2, 0)    0.33333333333333326
      (1, 1)    0.5
      (2, 1)    0.0
      (3, 1)    -0.5
      (1, 2)    0.0
      (2, 2)    0.5
      (3, 2)    -0.5
    coo as bool:
      (0, 0)    1
      (1, 0)    1
      (1, 1)    1
      (1, 2)    0
      (2, 0)    1
      (2, 1)    0
      (2, 2)    1
      (3, 1)    1
      (3, 2)    1
    ones matrix:
      (0, 0)    1
      (1, 0)    1
      (2, 0)    1
      (1, 1)    1
      (2, 1)    1
      (3, 1)    1
      (1, 2)    1
      (2, 2)    1
      (3, 2)    1
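
To show how this changes the overlap counts themselves, here is a minimal, self-contained sketch that rebuilds the matrix from the output above and compares both approaches. It assumes rows are items and columns are users, and that the overlap is computed as the mask multiplied by its own transpose. The variable names (`bool_mask`, `ones_mask`, `overlap_bool`, `overlap_ones`) are mine and do not appear in the repository.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Values copied from the example output above (assumed: rows = items, columns = users).
rows = np.array([0, 1, 2, 1, 2, 3, 1, 2, 3])
cols = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
data = np.array([-2 / 3, 1 / 3, 1 / 3, 0.5, 0.0, -0.5, 0.0, 0.5, -0.5])
coo = coo_matrix((data, (rows, cols)), shape=(4, 3))

# Current approach: stored zeros are cast to False, then to 0,
# so they contribute nothing to the product.
bool_mask = coo.astype(bool).astype(int)
overlap_bool = (bool_mask @ bool_mask.T).toarray()

# Proposed approach: one entry per stored value, whatever its value is.
ones_mask = coo_matrix(
    (np.ones_like(coo.data, dtype=int), (coo.row, coo.col)), shape=coo.shape
)
overlap_ones = (ones_mask @ ones_mask.T).toarray()

print("overlap via bool cast:\n", overlap_bool)
print("overlap via ones matrix:\n", overlap_ones)
```

With this data, rows 1 and 2 both have a stored value in all three columns, so their overlap should be 3. The boolean cast reports 1, because each of the two rows holds a rating that normalizes to zero in a different column.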