noisebridge / MediaBridge


Create a sparse matrix (interactions matrix) of user -> movie rating from the Netflix data set #6

Open audiodude opened 2 months ago

audiodude commented 2 months ago

LightFM requires an interactions matrix in which the rows are the users and the columns are the movies. A '1' in a cell means that the user liked that movie.
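
As a rough illustration of that layout (not LightFM-specific code), the sketch below fills a small dense NumPy array from a few made-up (user, movie) pairs. The names `liked_pairs`, `n_users`, and `n_movies` are hypothetical, and the indices are assumed to already be zero-based.

```python
import numpy as np

# Hypothetical toy data: (user_index, movie_index) pairs the user liked.
liked_pairs = [(0, 1), (0, 3), (2, 0)]
n_users, n_movies = 3, 4

# Rows are users, columns are movies; 1 means "liked", 0 means no interaction.
interactions = np.zeros((n_users, n_movies), dtype=np.int8)
for user, movie in liked_pairs:
    interactions[user, movie] = 1

print(interactions)
```

A dense array like this is only practical at toy sizes; with the full data set most cells would be 0, which is why the issue asks for a sparse matrix.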

audiodude commented 2 months ago

We should consider whether we can or need to serialize this matrix, or if we can just recreate it each time.

audiodude commented 2 months ago

Basic algorithm:

Finally: save the matrix to disk (pickle)
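
The pickle step could look roughly like the following sketch. The toy matrix and the file name `interactions.pkl` are placeholders, not names used by the project, and the file is written to a temporary directory so the example leaves nothing behind. Unpickling is only safe for files you created yourself.

```python
import os
import pickle
import tempfile

from scipy.sparse import coo_matrix

# Toy 3x4 matrix standing in for the real interactions matrix.
matrix = coo_matrix(([1, 1, 1], ([0, 0, 2], [1, 3, 0])), shape=(3, 4))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "interactions.pkl")  # hypothetical file name

    # Save the matrix to disk.
    with open(path, "wb") as f:
        pickle.dump(matrix, f)

    # Load it back, e.g. in a later run, instead of rebuilding it.
    with open(path, "rb") as f:
        restored = pickle.load(f)

# True if the round trip preserved every cell.
round_trip_ok = bool((restored.toarray() == matrix.toarray()).all())
```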

Some sample code:

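remap = {}  # original user id -> row index in the matrix
n = 0       # next unused row index

# For each rating (user id `id_`, movie `movie_id`):
if id_ not in remap:
  remap[id_] = n
  n += 1

data[remap[id_]][movie_id] = 5

audiodude commented 1 month ago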

As discussed a couple of weeks ago, the interactions matrix needs to be a coo_matrix: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.coo_matrix.html
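
For reference, here is a minimal sketch of building a `coo_matrix` from (user, movie) pairs, using the same id-remapping idea as the sample code earlier in this thread. The input list `liked`, the helper name `build_interactions`, and the choice to remap movie ids as well as user ids are assumptions for illustration, not the project's actual code.

```python
import numpy as np
from scipy.sparse import coo_matrix


def build_interactions(liked):
    """Build a users x movies coo_matrix of 1s from (user_id, movie_id) pairs.

    Hypothetical helper: ids are remapped to dense 0-based indices in
    first-seen order.
    """
    user_index = {}
    movie_index = {}
    rows, cols = [], []
    for user_id, movie_id in liked:
        # setdefault assigns the next free index the first time an id is seen.
        rows.append(user_index.setdefault(user_id, len(user_index)))
        cols.append(movie_index.setdefault(movie_id, len(movie_index)))
    data = np.ones(len(rows), dtype=np.int8)
    matrix = coo_matrix(
        (data, (rows, cols)),
        shape=(len(user_index), len(movie_index)),
    )
    return matrix, user_index, movie_index


# Made-up ids for illustration only.
liked = [(1488844, 1), (822109, 1), (1488844, 30)]
matrix, user_index, movie_index = build_interactions(liked)
print(matrix.toarray())
```

The sketch assumes each (user, movie) pair appears at most once; `coo_matrix` sums duplicate entries when it is converted to CSR or to a dense array, so repeated pairs would produce values greater than 1.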