Am I building the dataset correctly for fastFM?
I have a dataframe containing my user-item interactions, along with the context features and the labels. I split this dataframe into two parts: 1) X, which contains the user-item interactions along with the features, and 2) y, which is the rating.
I then convert X into a list of Python dictionaries and use sklearn's DictVectorizer to create a scipy sparse matrix, which I feed to the fastFM model. Here is the code:
import scipy.sparse as sp
from fastFM import als
from sklearn.feature_extraction import DictVectorizer
from sklearn.metrics import mean_squared_error

feature_cols = ['profile_id_encoded', 'item_id_encoded',
                'popularity_score', 'is_last_interaction']

# X_train and X_val are dataframes; y_train and y_val are np.arrays
X_train = train_interaction[feature_cols]
y_train = train_interaction['ratings'].values.squeeze()
X_val = val_interaction[feature_cols]
y_val = val_interaction['ratings'].values.squeeze()

# One dict per row, as expected by DictVectorizer
X_train_dicts = X_train.to_dict('records')
X_val_dicts = X_val.to_dict('records')

vec = DictVectorizer()
vectorizer = vec.fit_transform(X_train_dicts)
# Convert the CSR matrix returned by DictVectorizer into a csc_matrix
fm_X_train = sp.csc_matrix(vectorizer)

fm = als.FMRegression(n_iter=10000, init_stdev=0.1, l2_reg_w=0, l2_reg_V=0, rank=5)
fm.fit(fm_X_train, y_train)

# Prepare for prediction (note: a new DictVectorizer is fitted on the validation data here)
vec = DictVectorizer()
vectorizer = vec.fit_transform(X_val_dicts)
fm_X_val = sp.csc_matrix(vectorizer)
y_pred = fm.predict(fm_X_val)
print(mean_squared_error(y_val, y_pred))
The MSE is bad, though: 93%.
Did I do things correctly here? I'd really appreciate any help, thank you.
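Two things in the pipeline above may be worth checking. This is a sketch under assumptions, not a confirmed diagnosis. First, DictVectorizer only one-hot encodes string values, so integer-encoded IDs are passed through as a single numeric column each. Second, fitting a second DictVectorizer on the validation data builds a separate vocabulary, so its column layout is not guaranteed to match the training matrix. The example below uses made-up toy rows in place of the real interactions; `ID_COLUMNS` and `ids_as_strings` are hypothetical names introduced only for illustration.

```python
import scipy.sparse as sp
from sklearn.feature_extraction import DictVectorizer

# Toy rows standing in for the real interactions (all values are made up).
train_rows = [
    {"profile_id_encoded": 0, "item_id_encoded": 10,
     "popularity_score": 0.5, "is_last_interaction": 0},
    {"profile_id_encoded": 1, "item_id_encoded": 11,
     "popularity_score": 0.9, "is_last_interaction": 1},
    {"profile_id_encoded": 0, "item_id_encoded": 11,
     "popularity_score": 0.9, "is_last_interaction": 0},
]
val_rows = [
    {"profile_id_encoded": 1, "item_id_encoded": 10,
     "popularity_score": 0.5, "is_last_interaction": 1},
]

# Hypothetical helper: which fields should be treated as categorical IDs.
ID_COLUMNS = ("profile_id_encoded", "item_id_encoded")


def ids_as_strings(rows):
    """Return copies of the rows with the ID fields cast to str,
    so that DictVectorizer one-hot encodes them."""
    return [
        {key: (str(value) if key in ID_COLUMNS else value)
         for key, value in row.items()}
        for row in rows
    ]


# With integer IDs, every field becomes exactly one numeric column.
raw_width = DictVectorizer().fit_transform(train_rows).shape[1]

# With string IDs, each distinct user and item gets its own indicator column.
vec = DictVectorizer()
fm_X_train = sp.csc_matrix(vec.fit_transform(ids_as_strings(train_rows)))

# Reuse the fitted vectorizer (transform, not fit_transform) so the
# validation matrix has the same columns in the same order.
fm_X_val = sp.csc_matrix(vec.transform(ids_as_strings(val_rows)))

print(raw_width)
print(vec.feature_names_)
print(fm_X_train.shape, fm_X_val.shape)
```

With `transform`, IDs that appear only in the validation rows are silently dropped instead of shifting the columns. Separately, `l2_reg_w=0` and `l2_reg_V=0` together with `n_iter=10000` means no regularization at all, which may also contribute to a poor validation error.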