ilya-shenbin / RecVAE

The official PyTorch implementation of the paper "RecVAE: A New Variational Autoencoder for Top-N Recommendations with Implicit Feedback"
Apache License 2.0

Confused about dataset split #4

Open junkangwu opened 3 years ago

junkangwu commented 3 years ago

Hi, nice work on variational autoencoders for recommendation. However, I am confused about the data split, which follows the same approach as the WWW 2018 paper "Variational Autoencoders for Collaborative Filtering". At https://github.com/ilya-shenbin/RecVAE/blob/8b9b2ded3f215f9e30b45a9cc61199b67fc3da42/preprocessing.py#L60, unique_uid is the index of the active users rather than their actual uid (unique_uid['userId']). Because of the earlier filtering step, some userId values have been removed, so the index no longer matches the ids. If we use the index of user_activity rather than the actual uid, some valid userId values at the end of the range will never be considered. Is this a bug, or is there some other intent behind it?
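
To make the concern above concrete, here is a minimal sketch on toy data. It assumes the count table has a default 0..n-1 index with userId stored as a column, which is what the thread implies; `count_table` is a hypothetical stand-in, not the repository's `get_count`, and the toy ids and threshold are made up for illustration.

```python
import pandas as pd

# Toy interactions; user ids are deliberately non-contiguous.
tp = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 7, 7, 7, 9, 9, 9],
    "movieId": [10, 11, 12, 10, 10, 11, 13, 11, 12, 13],
})

def count_table(df, col):
    """Hypothetical stand-in for get_count: one row per id, default integer index."""
    return df.groupby(col).size().reset_index(name="size")

# Filtering step: drop users with fewer than 3 interactions (user 2 here).
counts = count_table(tp, "userId")
keep = counts.loc[counts["size"] >= 3, "userId"]
tp = tp[tp["userId"].isin(keep)]

user_activity = count_table(tp, "userId")
positions = list(user_activity.index)        # row positions in the count table
actual_ids = list(user_activity["userId"])   # the real user ids

# Selecting interactions by position instead of by id silently loses users.
by_position = tp[tp["userId"].isin(positions)]
by_id = tp[tp["userId"].isin(actual_ids)]
```

In this toy case the positions and the real ids only overlap on user 1, so selecting by position keeps a single user while selecting by id keeps all three.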

Looking forward to your reply. Thanks, and best regards.

shashankg7 commented 3 years ago

I have the same doubt. I am not sure why the index is used instead of the actual uid.

YvetteLi commented 2 years ago

Hi,

I agree with you, and I think it's a bug in the code. Initially I wasn't able to run the code and thought it was probably a data issue, so I went back and changed the code as follows.

In preprocessing.py:

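def filter_triplets(tp, min_uc=min_uc, min_sc=min_sc):
    # Assumption: get_count returns a DataFrame holding the id column plus a 'size' column.
    # Only keep triplets for items that were clicked on by at least min_sc users.
    if min_sc > 0:
        itemcount = get_count(tp, 'movieId')
        # Compare the counts only; comparing the whole DataFrame would also threshold the ids.
        tp = tp[tp['movieId'].isin(itemcount.loc[itemcount['size'] >= min_sc, 'movieId'])]
        # tp = tp[tp['movieId'].isin(itemcount.index[itemcount >= min_sc])]
    # Only keep triplets for users who clicked on at least min_uc items.
    if min_uc > 0:
        usercount = get_count(tp, 'userId')
        tp = tp[tp['userId'].isin(usercount.loc[usercount['size'] >= min_uc, 'userId'])]
        # tp = tp[tp['userId'].isin(usercount.index[usercount >= min_uc])]

    # Recompute both counts after filtering, indexed by the actual ids.
    usercount, itemcount = get_count(tp, 'userId').set_index('userId'), get_count(tp, 'movieId').set_index('movieId')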
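
As a rough illustration of why indexing the counts by the actual ids matters downstream, the sketch below shuffles and splits users after a `set_index('userId')` step like the one above. The toy data, the seed, and `n_heldout` are assumptions for illustration only; the variable names follow the thread, but this is not the repository's exact split code.

```python
import numpy as np
import pandas as pd

# Toy interactions with non-contiguous user ids.
raw_data = pd.DataFrame({
    "userId":  [3, 3, 8, 8, 8, 15, 15, 21, 21, 42, 42, 42],
    "movieId": [1, 2, 1, 3, 4, 2, 3, 1, 4, 2, 3, 4],
})

# Count table keyed by the real user id, mirroring the set_index('userId') step.
user_activity = (
    raw_data.groupby("userId").size().reset_index(name="size").set_index("userId")
)

# The index now holds the actual ids (3, 8, 15, 21, 42), not row positions.
unique_uid = user_activity.index.to_numpy()

rng = np.random.default_rng(0)   # seed chosen arbitrarily for this sketch
unique_uid = rng.permutation(unique_uid)

n_heldout = 1                    # hypothetical number of held-out users per split
n_users = len(unique_uid)
tr_users = unique_uid[: n_users - 2 * n_heldout]
vd_users = unique_uid[n_users - 2 * n_heldout : n_users - n_heldout]
te_users = unique_uid[n_users - n_heldout :]

# Every selected id exists in raw_data, so no user is lost by the split.
train_plays = raw_data[raw_data["userId"].isin(tr_users)]
```

With the ids in the index, every user lands in exactly one of the three splits, and the `isin` lookup matches real rows in the interaction table.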

LiaoYunxi commented 2 years ago

Thanks for your advice~