ibayer / fastFM

fastFM: A Library for Factorization Machines
http://ibayer.github.io/fastFM

Need help creating the X and y dataset for fastFM rating prediction #128

Open chrisbangun opened 6 years ago

chrisbangun commented 6 years ago

Hi,

Am I building a valid dataset for fastFM correctly here?

Basically, I have a DataFrame containing my user-item interactions, along with the context/features and the labels. I split this DataFrame into two parts: 1) X, which contains the user-item interactions along with the features, and 2) y, which is the rating.

I then convert X into a list of Python dictionaries and use sklearn's DictVectorizer to create the scipy sparse matrix, which I feed to the fastFM model. Here is the code:

X_train = train_interaction[['profile_id_encoded', 'item_id_encoded',
                            'popularity_score', 'is_last_interaction']]

y_train = train_interaction['ratings'].values.squeeze()

X_val = val_interaction[['profile_id_encoded', 'item_id_encoded',
                            'popularity_score', 'is_last_interaction']]
y_val = val_interaction['ratings'].values.squeeze()

# X_train and X_val are DataFrames, while y_train and y_val are now NumPy arrays

X_train_dicts = X_train.to_dict('records')
X_val_dicts = X_val.to_dict('records')

from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import mean_squared_error
from fastFM import als
import scipy.sparse as sp

vec = DictVectorizer()
vectorizer = vec.fit_transform(X_train_dicts)

# Below I convert the CSR matrix into a csc_matrix
fm_X_train = sp.csc_matrix(vectorizer)

fm = als.FMRegression(n_iter=10000, init_stdev=0.1, l2_reg_w=0, l2_reg_V=0, rank=5)

fm.fit(fm_X_train, y_train)

# prepare for prediction
vec = DictVectorizer()
vectorizer = vec.fit_transform(X_val_dicts)
fm_X_val = sp.csc_matrix(vectorizer)

y_pred = fm.predict(fm_X_val)

print(mean_squared_error(y_pred, y_val))

The MSE is bad, though: 93%.

Did I do things correctly here? I'd really appreciate any help. Thank you.
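
Two things in the pipeline above may be worth checking. First, the validation matrix is built with a second DictVectorizer fit on the validation records, so its columns are not guaranteed to line up with the columns the model was trained on. Second, DictVectorizer treats integer values as plain numeric features, so integer-encoded IDs end up as one numeric column each rather than as one-hot indicators. The sketch below is only an illustration under those assumptions: the toy records and the helper `ids_as_strings` are made up for the example, and the fastFM fit itself is left out.

```python
from sklearn.feature_extraction import DictVectorizer
import scipy.sparse as sp

# Toy records standing in for X_train.to_dict('records') and
# X_val.to_dict('records'); the values are invented for illustration.
train_records = [
    {"profile_id_encoded": 0, "item_id_encoded": 10,
     "popularity_score": 0.5, "is_last_interaction": 1},
    {"profile_id_encoded": 1, "item_id_encoded": 11,
     "popularity_score": 0.2, "is_last_interaction": 0},
    {"profile_id_encoded": 0, "item_id_encoded": 11,
     "popularity_score": 0.2, "is_last_interaction": 0},
]
val_records = [
    {"profile_id_encoded": 1, "item_id_encoded": 10,
     "popularity_score": 0.5, "is_last_interaction": 1},
]

ID_COLUMNS = ("profile_id_encoded", "item_id_encoded")


def ids_as_strings(records):
    """Hypothetical helper: cast ID fields to str so that
    DictVectorizer one-hot encodes them instead of treating
    them as numeric values."""
    return [
        {key: (str(value) if key in ID_COLUMNS else value)
         for key, value in record.items()}
        for record in records
    ]


vec = DictVectorizer()

# Fit the vectorizer on the training records only.
fm_X_train = sp.csc_matrix(vec.fit_transform(ids_as_strings(train_records)))

# Reuse the same fitted vectorizer for validation (transform, not
# fit_transform) so both matrices share the same column layout.
fm_X_val = sp.csc_matrix(vec.transform(ids_as_strings(val_records)))

print(vec.feature_names_)
print(fm_X_train.shape, fm_X_val.shape)
```

The resulting `fm_X_train` and `fm_X_val` could then be passed to `fm.fit` and `fm.predict` as in the original code. Note that `transform` silently ignores feature values that were not seen during fitting, so users or items that appear only in the validation set get no indicator column.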