ocelma / python-recsys

A python library for implementing a recommender system

KeyError in svd.recommend() #12

Closed igor87z closed 9 years ago

igor87z commented 9 years ago

I have a .dat file with 3705912 lines, giving a 602x6156 matrix. When I call svd.recommend for identifiers that have few values, I get a KeyError.

svd.get_matrix().get_col(50652)

SparseVector (4 of 602 entries): [6840789=14, 6843925=100, 6843926=100, 6843927=16]

svd.recommend(50652, is_row=False)

Traceback (most recent call last):
  File "svd.py", line 251, in <module>
    print svd.recommend(50652, is_row=False)
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/recsys/algorithm/factorize.py", line 352, in recommend
    item = self._get_col_reconstructed(i, zeros)
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/recsys/algorithm/factorize.py", line 300, in _get_col_reconstructed
    return self._matrix_reconstructed.col_named(j)
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/divisi2/labels.py", line 65, in col_named
    return self[:,self.col_index(label)]
  File "/home/igor/sandbox/svd/local/lib/python2.7/site-packages/divisi2/labels.py", line 57, in col_index
    return self.col_labels.index(label)
KeyError: 50652

But for identifiers that have many values, it works:

svd.get_matrix().get_col(10536)

SparseVector (22 of 602 entries): [6840778=96, 6840779=100, 6840780=100, 6840781=100, 6840782=100, 6840783=100, 6840784=100, 6840785=100, 6840786=65, 6840818=83, 6840819=100, 6840820=100, 6840821=100, 6840822=100, 6840823=100, 6840824=100, 6840825=100, 6840826=100, 6840827=100, 6840828=21, ...]

svd.recommend(10536, is_row=False)

[(6900161, 100.00000000232269), (6840819, 100.00000000214945), (6840822, 100.00000000186564), (6840821, 100.00000000178625), (6840820, 100.0000000016603), (6840783, 100.00000000144556), (6840827, 100.00000000137024), (6840826, 100.00000000134551), (6840784, 100.00000000123218), (6840825, 100.00000000112296)]
ocelma commented 9 years ago

I will probably need more info, but for now can you please check what value you set for the min_values param when computing the SVD?

compute(self, k=100, min_values=???
igor87z commented 9 years ago

My mistake. I forgot to change the value of min_values from the example. I set it to 1 and everything works. Now I have one question: for cells that don't have a value in the .dat file, do I need to add lines without values (only the row and column identifiers), or should such lines be left out of the file? Sorry, but I couldn't find an example of a .dat file.
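For illustration, here is a minimal standalone sketch of how a sparse "::"-separated file can be read so that missing cells simply have no line. It is not python-recsys's own loader: the function name parse_dat and the field order (column id, row id, value) are assumptions inferred from the sample lines and from the ids passed to svd.recommend(..., is_row=False) in this thread.

```python
from collections import defaultdict

# Three sample lines in the same shape as the data shown in this thread.
# Assumed field order: column id :: row id :: value.
SAMPLE = """\
5156::6877715::37
5156::6877716::100
5156::6877717::78
"""


def parse_dat(text, sep="::"):
    """Return {col_id: {row_id: value}} with only the pairs present in text.

    Hypothetical helper for illustration; pairs without a line are simply
    absent from the result, which is how a sparse matrix treats them.
    """
    matrix = defaultdict(dict)
    for line in text.splitlines():
        parts = line.strip().split(sep)
        # Skip blank or incomplete lines such as "::::" rather than
        # inventing a value for them.
        if len(parts) != 3 or not all(parts):
            continue
        col_id, row_id, value = int(parts[0]), int(parts[1]), float(parts[2])
        matrix[col_id][row_id] = value
    return matrix


ratings = parse_dat(SAMPLE + "::::\n")
print(sorted(ratings[5156].items()))
```

In this sketch an empty "::::" line contributes nothing, so there is no reason to add placeholder lines for unrated cells.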

igor87z commented 9 years ago

I also get already-rated items in the recommendations. In the .dat file:

5156::6877715::37
5156::6877716::100
5156::6877717::78
5156::6878180::43
5156::6935490::73
5156::6935491::71
5156::6942420::91
5156::6942421::29
5156::6942597::74
5156::6942598::100
5156::6942599::100
5156::6942600::55

6877716, 6877715 and 6877717 appear in the recommendations:

svd.recommend(5156, is_row=False)

[(6877716, 25.352191096345248), (6877715, 22.913767557647038), (6877717, 21.127898465135296), (6877718, 16.982821565501425), (6877714, 16.571863381535454), (6877720, 14.582520866868133), (6877719, 14.267469649710268), (6877713, 11.796344566506022), (6914703, 9.8264285723669271), (6914704, 9.3989017317142203)]

In the .dat file, if an item is not rated by a user, no such line exists. Adding lines like "::::" makes no difference: these items are still present in the recommendations.
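One way to hide already-rated items is to filter the returned list afterwards. The sketch below uses the ids from the .dat excerpt and the recommendation list above (scores rounded for brevity); drop_rated is a hypothetical helper, not part of python-recsys. The library's recommend() is also documented with an only_unknowns flag that may do this directly, but treat that as something to verify against the installed version.

```python
# Item ids that identifier 5156 already has values for (from the .dat excerpt).
rated = {
    6877715, 6877716, 6877717, 6878180, 6935490, 6935491,
    6942420, 6942421, 6942597, 6942598, 6942599, 6942600,
}

# (item_id, score) pairs as returned by svd.recommend(5156, is_row=False);
# scores are rounded here for readability.
recommendations = [
    (6877716, 25.35), (6877715, 22.91), (6877717, 21.13), (6877718, 16.98),
    (6877714, 16.57), (6877720, 14.58), (6877719, 14.27), (6877713, 11.80),
    (6914703, 9.83), (6914704, 9.40),
]


def drop_rated(recs, rated_ids):
    """Keep only recommendations whose item id is not in rated_ids."""
    return [(item, score) for item, score in recs if item not in rated_ids]


unseen = drop_rated(recommendations, rated)
print(unseen)
```

Because filtering shortens the list, it can help to request more than the number of results actually needed.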

igor87z commented 9 years ago

And if an item is not rated by any user, how do I add it, given that unrated entries should not appear in the file?

ocelma commented 9 years ago

I don't quite get your comments above.

My advice here is that if you have items with only one (or zero) users and you want to include them in the recommendations, the results won't be good at all. You need more data to provide decent recommendations. That's why min_values is set to 10 by default. Maybe you could go as low as 5, but lower than that, the results from SVD will be bad.
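To illustrate how such a threshold leads to the KeyError reported at the top of this thread, here is a toy model. It assumes that min_values discards every column with fewer than that many entries, which matches the behaviour seen here but is not the library's actual code. filter_columns is a hypothetical helper, and the row ids for column 10536 are placeholders.

```python
def filter_columns(matrix, min_values):
    """Keep only the columns that have at least min_values entries.

    matrix maps col_id -> {row_id: value}. Hypothetical helper that only
    mimics the thresholding idea, not the library's implementation.
    """
    return {col: rows for col, rows in matrix.items() if len(rows) >= min_values}


matrix = {
    # 4 entries, copied from the SparseVector shown in the thread.
    50652: {6840789: 14, 6843925: 100, 6843926: 100, 6843927: 16},
    # 22 entries; placeholder row ids and values standing in for the real column.
    10536: {row: 100 for row in range(22)},
}

kept = filter_columns(matrix, min_values=10)
print(50652 in kept)  # the 4-entry column was dropped
print(10536 in kept)  # the 22-entry column survived

try:
    kept[50652]
except KeyError as exc:
    print("KeyError:", exc)
```

With min_values=1 both columns would be kept, which is why lowering the value made the error disappear, at the cost of the weaker results described above.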