Does anyone have any idea with regard to the question above?
Thanks a lot in advance!
Hi, @leo1023 . In my opinion, BPR is only suitable for ranking tasks, while TimeSVD is designed for the rating prediction task. So if you want to combine the two models, you can take the idea of how TVBPR models the visual information and use it with TimeSVD.
I haven't read TVBPR's paper yet. Maybe I will give you a more concrete answer after reading it.
Hi @SunYatong . Thanks a lot for writing!
Yes, that was confusing for me too, since we are dealing with algorithms designed for different tasks. I have already implemented the combination of both models, and I use a similar idea for modeling the visual information. When I used the rating error (error = predicted_rating - real_rating) for learning the model, I was getting very large values (going to infinity) because of the visual information involved, so it doesn't make sense to use the error in this context. After that, I stopped using the explicit feedback (i.e. ratings) for learning and used only the information about which items were rated. The model parameters are then learned similarly to TVBPR (where positive and negative items are considered), except that I consider only the positive items. In TVBPR, the parameters are updated with the multiplier deri = 1 / (1 + exp(x_u,i - x_u,j)), where x_u,i is the predicted value for the positive item and x_u,j is the predicted value for the negative item. In my timeSVD++ variant, since only positive items are considered, the parameters are updated with deri = 1 / (1 + exp(-x_u,i)).
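In code, the two update multipliers look roughly like this (a sketch; x_ui and x_uj stand for the predicted scores of the positive and negative item):

    import numpy as np

    def bpr_multiplier(x_ui, x_uj):
        # TVBPR-style pairwise multiplier: 1 / (1 + exp(x_ui - x_uj))
        return 1.0 / (1.0 + np.exp(x_ui - x_uj))

    def positive_only_multiplier(x_ui):
        # my timeSVD++ variant, positive items only: 1 / (1 + exp(-x_ui))
        return 1.0 / (1.0 + np.exp(-x_ui))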
Then, for evaluating the three models, I use the AUC as defined in the paper where TVBPR is presented. The test and validation sets are created by leave-two-out cross-validation, so that for each user there is one item in the test set and one in the validation set. The AUC for a single user is then calculated (when evaluating on the test set) by computing the preference value of the test item and the preference values of all negative items for that user. A prediction counts as correct if the preference value of the test item is greater than the preference value of the negative item (this follows the theory of BPR, which states that positive items should be preferred over negative items). Dividing the number of correct predictions by the total number of negative items gives the probability that the model ranks correctly for that user. Finally, having the AUC of all users, we compute the overall AUC of the model by summing the per-user AUCs and dividing by the number of users.
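As a minimal sketch of this evaluation (names are illustrative; predict(u, i) stands for whatever scoring function the model defines):

    import numpy as np

    def auc_for_user(u, test_item, negative_items, predict):
        # fraction of negative items that the held-out positive item outranks
        pos_score = predict(u, test_item)
        correct = sum(1 for j in negative_items if pos_score > predict(u, j))
        return correct / len(negative_items)

    def overall_auc(users, test_item, negatives, predict):
        # average the per-user AUC over all users
        return float(np.mean([auc_for_user(u, test_item[u], negatives[u], predict)
                              for u in users]))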
For the moment, I get the following results:

- timeSVD++: AUC = 0.5508
- TVBPR: AUC = 0.7010
- timeSVD++ plus the visual component from TVBPR: AUC = 0.43811
But I am not sure why I get these results. For example, what might be the reason for the low AUC of timeSVD++? And why does the TVBPR model achieve a higher AUC?
Do you have any idea what the reasons for these results might be? Or, what results would you expect, and why?
I would appreciate any idea or advice from your side!
Thank you very much in advance!!
Hi, the pairwise loss is used to maximize the difference between the positive item and the negative item, but you said "with regard to timeSVD++, as only positive items are considered, the parameters are learned with respect to deri = 1 / (1 + exp(-x_u,i))", and I don't understand how you build your loss function from that.
Besides, you had better print the loss after each iteration to see whether it decreases.
Hi @SunYatong! Actually, by "with regard to timeSVD++" I meant timeSVD++ with the visual component included. I am a bit confused about how to build the loss function here. What would you suggest? For learning the visual dimensions, an embedding matrix E is introduced, which the paper describes as follows:
"Let fi denote the Deep CNN features of item i and F represent its number of dimensions (F = 4096). We further introduce a K X F embedding matrix E to linearly embed the high-dimensional feature vector fi into a much lower-dimensional (i.e., K, can be set to 20) visual style space. Namely, we take: theta_item = E fi "
Then we learn the values of the embedding matrix E, which gives us the visual style space. Since using the raw rating error gives me very large values, I am instead using deri = 1 / (1 + exp(-x_u,i)) when updating E (this way the update values stay small while still increasing the entries of E).
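Concretely, what I am doing looks roughly like this (a sketch with illustrative names; f_i is the 4096-dimensional CNN feature vector of item i and theta_u the user's visual factors; whether this update corresponds to a proper loss is exactly my question):

    import numpy as np

    K, F = 20, 4096
    E = np.random.normal(scale=0.01, size=(K, F))  # embedding matrix to learn

    def visual_factors(f_i):
        # theta_item = E f_i: project CNN features into the K-dim visual space
        return E.dot(f_i)

    def update_E(theta_u, f_i, x_ui, lr=0.01, reg=0.01):
        # gradient-style step scaled by deri = 1 / (1 + exp(-x_ui)), positive items only
        global E
        deri = 1.0 / (1.0 + np.exp(-x_ui))
        E += lr * (deri * np.outer(theta_u, f_i) - reg * E)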
I am not really sure about how I can build the loss function. Can you please give me an idea?
Thanks a lot!!!
Hi, as you are solving a ranking problem with implicit feedback: if you want to use a pointwise loss, you should first use a sigmoid to bound your predicted rating into (0, 1) and then use the log loss, -y*log(y') - (1-y)*log(1-y'), which means minimizing the difference between your predicted distribution and the real distribution.
If you use a pairwise loss, your goal is to maximize the difference between the positive and negative predictions, so your loss function should be minimizing -log(sigmoid(y_positive - y_negative)).
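In code, the two losses look like this (a sketch; x is the raw predicted score):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def pointwise_log_loss(x, y):
        # y in {0, 1}; bound the score into (0, 1), then apply the log loss
        y_hat = sigmoid(x)
        return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

    def pairwise_bpr_loss(x_pos, x_neg):
        # minimize -log(sigmoid(y_positive - y_negative))
        return -np.log(sigmoid(x_pos - x_neg))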
Hi @SunYatong , thanks a lot for your clarification! I just wanted to make sure:
Is that right?
Thank you!!
Hi, @leo1023
Yes, the real ratings should also be binarized. In LibRec we use the configuration "data.convert.binarize.threshold=?" to convert the explicit feedback into implicit feedback.
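For example, in the configuration file (the threshold value is only illustrative; as far as I know, ratings above the threshold become 1 and the rest become 0):

    # e.g. for a 1-5 rating scale, treat ratings above 3.0 as positive feedback
    data.convert.binarize.threshold=3.0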
Your loss function is right.
Hi @SunYatong! I tried what you proposed, but the predicted value is always very large because of the image features (say 178), so applying the sigmoid gives exactly 1.0 as output, and applying log to 1.0 then gives 0. I am not really sure how to continue from here.
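To illustrate the problem numerically, together with the standard log-space workaround (using the identity -log(sigmoid(x)) = log(1 + exp(-x)), which avoids computing the sigmoid explicitly):

    import numpy as np

    x = 178.0
    # exp(-178) is smaller than machine epsilon, so 1 + exp(-178) rounds to 1.0
    print(1.0 / (1.0 + np.exp(-x)))  # prints 1.0, and log(1.0) = 0
    # stable log loss for a positive item, computed directly in log-space:
    print(np.logaddexp(0.0, -x))     # log(1 + exp(-178)): tiny, but not forced to 0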
I am sharing my code with you; if you have time to take a look, I would really appreciate it! Here is the link to the code of timeSVD++ extended with the visual component: link to the code.
Please also find here the link to the file of the image features.
Thank you very much in advance!!
Hi @SunYatong! I have one more question:
Thank you!!
Hi, @leo1023
For the loss question:
For the performance question:
Hi @SunYatong !
I have implemented timeSVD++ with the visual component included, using the pairwise loss as in BPR. Now, when running it on the fashion data, timeSVD++ extended with the visual component performs better than TVBPR. To find out why, I experimented with both models under different settings, as shown below:
Now I am experimenting with different numbers of non-visual factors (from 10 to 50) to see how this influences the performance of the models. But I am still not sure why the extended version of timeSVD++ performs better than TVBPR. Could it be that it adds more parameters for the user factors, makes the non-visual user factors time-dependent, and also makes use of implicit feedback (the R(u) set, as done in timeSVD++), whereas in TVBPR the non-visual factors stay static (see the paper here)? If this is the case, can you please give me an idea or suggestion for how I could prove it?
P.S. Please find here my implementation of TVBPR. On a dataset of 4315 actions (i.e. ratings), training the model parameters takes 2 to 3 days, and I cannot figure out why it takes so long. Is that normal?
Thank you very much for any help in advance!
Hi, @leo1023 . First of all, congratulations on having implemented your model! Here are some suggestions for your experiments.
@SunYatong Thank you very much for your support!
Regarding the performance of the algorithms: by "epoch" I mean splitting the dataset into a certain number of partitions, where one partition corresponds to a certain period of time, and then learning the time-dependent parameters per partition. I set the number of epochs to different values to see whether splitting the dataset into different numbers of periods affects the performance of the models. Does that make sense?
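In code, the splitting I mean is roughly this (a sketch; equal-width time bins are just the simplest choice of partitioning):

    import numpy as np

    def assign_epochs(timestamps, n_epochs):
        # map each rating's timestamp to an epoch index in [0, n_epochs - 1]
        t = np.asarray(timestamps, dtype=float)
        edges = np.linspace(t.min(), t.max(), n_epochs + 1)
        return np.minimum(np.digitize(t, edges[1:]), n_epochs - 1)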
I am not training a neural network; instead, I use image features already extracted through a Deep CNN, where one image is represented by 4096 values (ranging from 0 to 1). The author of the paper where TVBPR is presented then introduces a K x F embedding matrix E to linearly embed the high-dimensional image feature vector into a much lower-dimensional visual style space (K can be set to 20, and F is the number of image features, 4096). Through training, we learn the values of the embedding matrix E. Do you think I would still need a bigger dataset, or can I get comparable results with the dataset I am using (i.e. 4315 ratings)?
Thank you very much in advance!
Hi, @leo1023
Hello to everyone,
I want to extend the timeSVD++ model with an additional component: the visual component presented here, in the TVBPR model. But while thinking about it, I am not sure how to train the combined model, as the two models use different learning approaches. One (timeSVD++) is trained using explicit information (i.e. ratings), whereas the other (TVBPR) is based on the BPR (Bayesian Personalized Ranking) learning approach (i.e. it makes use of positive and negative items).
Could anyone please suggest whether this would be possible, or offer any idea or help?
Note: the extended version of timeSVD++ will be used in the context of fashion data, and will thus be tested on the data already provided by Amazon (link to the data here).
Thank you very much in advance!