lyst / lightfm

A Python implementation of LightFM, a hybrid recommendation algorithm.
Apache License 2.0

very huge matrix cause crash #586

Closed mrtztg closed 3 years ago

mrtztg commented 3 years ago

I'm relatively new to Python and recommendation systems. I'm implementing a recommendation system in Python with LightFM by following this tutorial:

Solving business usecases by recommender system using lightFM

But when I run the project, it crashes because it hits the memory limit:

MemoryError: Unable to allocate 71.5 GiB for an array with shape (162541, 59047) and data type float64

I know this is because of the DataFrame size (100k rows, 25M columns). The code that generates this DataFrame:


def create_interaction_matrix(df, user_col, item_col, rating_col, norm=False, threshold=None):
    '''
    Function to create an interaction matrix dataframe from transactional type interactions
    Required Input -
        - df = Pandas DataFrame containing user-item interactions
        - user_col = column name containing user's identifier
        - item_col = column name containing item's identifier
        - rating_col = column name containing user feedback on interaction with a given item
        - norm (optional) = True if a normalization of ratings is needed
        - threshold (required if norm = True) = value above which the rating is favorable
    Expected output -
        - Pandas dataframe with user-item interactions ready to be fed in a recommendation algorithm
    '''
    # unstack() pivots items into columns, producing a dense users x items table
    interactions = df.groupby([user_col, item_col])[rating_col] \
        .sum().unstack().reset_index(). \
        fillna(0).set_index(user_col)
    if norm:
        # Binarize: 1 if the summed rating exceeds the threshold, else 0
        interactions = interactions.applymap(lambda x: 1 if x > threshold else 0)
    return interactions

But I have no idea how to solve it.
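For reference, the 71.5 GiB in the error message follows directly from the reported shape: `unstack()` produces one float64 cell for every user-item pair, whether or not a rating exists. The short check below reproduces that figure. The sparse estimate in it is only a rough illustration and rests on an assumption: it treats the "25M" mentioned above as the number of observed ratings, which the thread does not confirm.

```python
# Dense float64 storage for the shape reported in the MemoryError.
n_users, n_items = 162_541, 59_047
bytes_per_float64 = 8

dense_bytes = n_users * n_items * bytes_per_float64
dense_gib = dense_bytes / 2**30
print(f"dense matrix: {dense_gib:.1f} GiB")

# ASSUMPTION (not confirmed in the thread): about 25 million observed ratings.
# A COO sparse matrix stores one value plus a row and a column index per rating;
# with float32 values and int32 indices that is 12 bytes per stored entry.
assumed_nnz = 25_000_000
sparse_bytes = assumed_nnz * (4 + 4 + 4)
sparse_gib = sparse_bytes / 2**30
print(f"sparse matrix (assumed nnz): {sparse_gib:.2f} GiB")
```

Under that assumption the sparse representation is a few hundred megabytes rather than tens of gigabytes, which is why the dense pivot is the part to avoid.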

mrtztg commented 3 years ago

I also posted this problem on Stack Overflow. Someone there suggested using a sparse matrix or a sparse DataFrame, but I have no idea how to do that.
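One way to follow the sparse-matrix suggestion is sketched below, assuming the ratings are already in a pandas DataFrame. It builds a `scipy.sparse` matrix directly from the (user, item, rating) rows instead of pivoting with `unstack()`. The function name `create_sparse_interactions`, the toy data, and the column names `userId`, `movieId`, and `rating` are hypothetical and chosen only for illustration; the `norm`/`threshold` option of the original function is left out.

```python
import numpy as np
import pandas as pd
from scipy.sparse import coo_matrix


def create_sparse_interactions(df, user_col, item_col, rating_col):
    """Build a sparse user-item matrix without materializing the dense table."""
    # Categorical codes map each distinct user/item to a row/column index.
    users = df[user_col].astype("category")
    items = df[item_col].astype("category")
    matrix = coo_matrix(
        (
            df[rating_col].to_numpy(dtype=np.float32),
            (users.cat.codes.to_numpy(), items.cat.codes.to_numpy()),
        ),
        shape=(len(users.cat.categories), len(items.cat.categories)),
    )
    # Converting to CSR sums duplicate (user, item) pairs, like groupby(...).sum().
    return matrix.tocsr(), users.cat.categories, items.cat.categories


# Toy data for illustration only (hypothetical column names).
ratings = pd.DataFrame({
    "userId": [10, 10, 20, 30],
    "movieId": [1, 2, 2, 3],
    "rating": [4.0, 5.0, 3.0, 2.0],
})
interactions, user_ids, item_ids = create_sparse_interactions(
    ratings, "userId", "movieId", "rating"
)
print(interactions.shape, interactions.nnz)
```

Only the observed ratings are stored, so memory grows with the number of interactions rather than with users times items. The returned category indexes (`user_ids`, `item_ids`) let you map matrix rows and columns back to the original identifiers.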

SimonCW commented 3 years ago

Hi @mrtztg, I suggest you use the built-in functionality for creating the sparse matrices: https://making.lyst.com/lightfm/docs/lightfm.data.html#lightfm.data.Dataset

This example might be a good starting point: https://making.lyst.com/lightfm/docs/examples/dataset.html

mrtztg commented 3 years ago

Thanks, @SimonCW. I have been trying to use this function since yesterday. Now I have another issue: the majority of the top N recommended items for users are mostly similar.

SimonCW commented 3 years ago

Hi @mrtztg, if this problem is solved, please close the issue.