sberbank-ai-lab / RePlay

RecSys Library
https://sberbank-ai-lab.github.io/RePlay/
Apache License 2.0

`DataPreparator` bug when using user_features and/or item_features #101

Closed shashist closed 2 years ago

shashist commented 2 years ago

DataPreparator fails to use Indexer.(user|item)_indexer when called with user_features and/or item_features, because Indexer.(user|item)_indexer expects Indexer.fit() to be called before Indexer.transform().

How to reproduce

import pandas as pd

from replay.data_preparator import DataPreparator

df = pd.read_csv(
    "experiments/data/ml1m_ratings.dat", 
    sep="\t", 
    names=["user_id", "item_id", "relevance", "timestamp"]
)
users = pd.read_csv(
    "experiments/data/ml1m_users.dat", 
    sep="\t", 
    names=["user_id", "gender", "age", "occupation", "zip_code"]
)

data_preparator = DataPreparator()
# Passing user features (the second argument) triggers the failure
log, user_features, _ = data_preparator(df, users)

Expected behavior

Indexer.fit() should probably be called earlier in DataPreparator.__call__(), before the indexers are applied to user_features and/or item_features.
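
To illustrate the ordering problem, here is a minimal, self-contained sketch of the fit-before-transform contract. `ToyIndexer` and `prepare` are hypothetical stand-ins written for this example; they are not RePlay's actual `Indexer` or `DataPreparator`, and the real fix may look different.

```python
class ToyIndexer:
    """Hypothetical stand-in for an indexer; not RePlay's actual class."""

    def __init__(self):
        self.mapping = None  # populated by fit()

    def fit(self, ids):
        # Assign consecutive integer indices in order of first appearance.
        self.mapping = {}
        for raw_id in ids:
            self.mapping.setdefault(raw_id, len(self.mapping))
        return self

    def transform(self, ids):
        # Mirrors the reported failure mode: transform() needs a fitted mapping.
        if self.mapping is None:
            raise RuntimeError("call fit() before transform()")
        return [self.mapping[raw_id] for raw_id in ids]


def prepare(log_user_ids, feature_user_ids):
    """Toy analogue of a preparator call: fit once, then transform both inputs."""
    indexer = ToyIndexer()
    # Fit first, on ids from both the log and the feature table (an assumption
    # of this sketch), so that every later transform() sees a fitted mapping.
    indexer.fit(list(log_user_ids) + list(feature_user_ids))
    return indexer.transform(log_user_ids), indexer.transform(feature_user_ids)


log_idx, feat_idx = prepare(["u3", "u1", "u3"], ["u1", "u2", "u3"])
```

Calling `ToyIndexer().transform(...)` without a prior `fit()` raises, which is the same kind of ordering error the issue describes; moving the `fit()` call ahead of every `transform()` removes it.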