pola-rs / valves

general functions for your data .pipe()-lines.

user_item and item_item recommender tables #12

Open koaning opened 2 years ago

koaning commented 2 years ago

Given a log of weighted user-item interactions, can we generate an item-item recommendation table and a user-item recommendation table?

Kind of! We can calculate p(item_a | item_b) and p(item_a), which can be reweighted into a table with recommendations. We can also do something similar for users. After all, a user that interacted with items a, b and c will have a score for item x defined via;

p(item_x | user) = p(item_x | item_a, item_b, item_c)
                 \propto p(item_x | item_a) p(item_x | item_b) p(item_x | item_c)
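
As a rough illustration of the scoring rule above, here is a minimal pure-Python sketch. The probability table `p_cond`, the helper `user_scores` and the `floor` fallback for unseen pairs are all assumptions made up for this example, not part of valves. It sums logs rather than multiplying, which gives the same ranking while avoiding underflow for users with long histories.

```python
import math

# Hypothetical conditional probabilities p(candidate | seen item); values are made up.
p_cond = {
    ("x", "a"): 0.5, ("x", "b"): 0.4, ("x", "c"): 0.1,
    ("y", "a"): 0.2, ("y", "b"): 0.2, ("y", "c"): 0.2,
}

def user_scores(seen, candidates, p_cond, floor=1e-9):
    """Unnormalised score per candidate: product of p(candidate | seen item)."""
    scores = {}
    for cand in candidates:
        # Sum of logs; pairs missing from the table fall back to a small floor value.
        log_score = sum(math.log(p_cond.get((cand, s), floor)) for s in seen)
        scores[cand] = math.exp(log_score)
    return scores

# A user who interacted with items a, b and c, scored against candidates x and y.
scores = user_scores(["a", "b", "c"], ["x", "y"], p_cond)
```

The scores are only proportional to a probability, so they are useful for ranking candidates per user rather than as calibrated values.
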
ritchie46 commented 2 years ago

Interesting.. Would every cell in one table need to be computed with all others?

koaning commented 2 years ago

I don't think so unless every user has interacted with every item.

I've started with an item-item count table though.

import polars as pl

def item_item_counts(dataf, user_col="user", item_col="item"):
    """
    Computes item-item overlap counts from user-item interactions, useful for recommendations.

    This function is meant to be used in a `.pipe()`-line.

        - dataf: polars dataframe
        - user_col: name of the column containing the user id
        - item_col: name of the column containing the item id
    """
    item_rec = f"{item_col}_rec"
    # count interactions per item, then self-join on user to pair up items
    counted = dataf.with_columns([pl.col(user_col).count().over(item_col).alias("n_item")])
    return (counted
        .join(counted, on=user_col, suffix="_rec")
        .filter(pl.col(item_col) != pl.col(item_rec))
        .with_columns([
            pl.col(user_col).count().over([item_col, item_rec]).alias("n_both")
        ])
        .select([item_col, item_rec, 'n_item', 'n_item_rec', 'n_both']))

Something is telling me these kinds of queries are gonna benchmark reaaaal well.

koaning commented 2 years ago


It's something like this;

result = (df
  .filter(pl.col("item") != pl.col("item_rec"))
  .with_columns([
    pl.col('user').count().over(['item', 'item_rec']).alias("n_both")
  ])
  .filter(pl.col('n_both') > 10)
  .sort(['item', 'rating'], descending=True))  # `reverse=True` on older polars releases
koaning commented 2 years ago

@ritchie46 does polars support log?