junkangwu opened this issue 3 years ago
I have the same doubt. I am not sure why the `index` is used instead of the actual `uid`.
Hi,
I agree with you, and I think it's a bug in the code. At first I couldn't get the code to run and assumed it was a data issue, so I went back and changed the code as follows.
In `preprocess.py`:
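```python
def filter_triplets(tp, min_uc=min_uc, min_sc=min_sc):
    # Assumes get_count returns a DataFrame with columns [<id column>, 'size']
    # Keep only items rated by at least min_sc users
    if min_sc > 0:
        itemcount = get_count(tp, 'movieId')
        tp = tp[tp['movieId'].isin(itemcount[itemcount['size'] >= min_sc].movieId)]
        # tp = tp[tp['movieId'].isin(itemcount.index[itemcount >= min_sc])]
    # Keep only users who rated at least min_uc items
    if min_uc > 0:
        usercount = get_count(tp, 'userId')
        tp = tp[tp['userId'].isin(usercount[usercount['size'] >= min_uc].userId)]
        # tp = tp[tp['userId'].isin(usercount.index[usercount >= min_uc])]
    # Recompute counts indexed by the real IDs, so .index yields userId / movieId
    usercount, itemcount = get_count(tp, 'userId').set_index('userId'), get_count(tp, 'movieId').set_index('movieId')
    return tp, usercount, itemcount
```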
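To see what the `set_index` change buys, here is a small sketch on made-up data. It assumes the recomputed count table is a DataFrame with a default `RangeIndex`, emulated here with `groupby(...).size().reset_index(name="size")` because the exact return shape of `get_count` may depend on the pandas version; the toy ratings and the threshold of 2 are illustrative only.

```python
import pandas as pd

# Hypothetical toy ratings: user 0 has a single rating and will be filtered out.
tp = pd.DataFrame({
    "userId": [0, 1, 1, 2, 2, 3, 3],
    "movieId": [10, 10, 11, 10, 12, 11, 12],
})

# Keep users with at least 2 ratings (this Series is indexed by userId).
counts = tp.groupby("userId").size()
tp = tp[tp["userId"].isin(counts[counts >= 2].index)]

# Recompute per-user activity as a DataFrame with a default RangeIndex,
# the shape assumed here for get_count's output.
user_activity = tp.groupby("userId").size().reset_index(name="size")

positional = user_activity.index.tolist()                   # row positions
actual = user_activity.set_index("userId").index.tolist()   # real user IDs

print(positional, actual)  # [0, 1, 2] [1, 2, 3]
```

Without `set_index`, the recomputed table is labelled 0, 1, 2 even though user 0 was filtered out and user 3 survived; after `set_index("userId")`, its index holds the real IDs.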
Thanks for your advice~
Hi, nice work on variational autoencoders for recommendation. However, I am confused about the data-splitting method, which is the same as the one in "Variational Autoencoders for Collaborative Filtering" (WWW 2018). In https://github.com/ilya-shenbin/RecVAE/blob/8b9b2ded3f215f9e30b45a9cc61199b67fc3da42/preprocessing.py#L60, `unique_uid` is the index of the active users rather than the actual `uid` (`unique_uid['userId']`). Because of the earlier filtering step, some `userId`s are removed, so if we use the index of `user_activity` rather than the actual `uid`, some valid `userId`s at the end will never be considered. I guess this might be an error, or is there another reason for it?
Looking forward to your reply. Thanks.
Best.