hexiangnan / theano-BPR

Using theano to implement the Matrix Factorization with BPR ranking loss

How's the execution time? #2

Open nijianmo opened 7 years ago

nijianmo commented 7 years ago

Hi, thank you for posting the demo code. I have one question regarding the execution time of the Theano implementation of matrix factorization, and it would be greatly appreciated if you could give me some suggestions.

Given a large dataset, if we use mini-batch training, the transfer of each mini-batch from CPU to GPU seems very time-consuming. In many cases it is much slower than a conventional C++ implementation.

Have you run into this problem before? Do you have any suggestions on how to reduce the execution time? Thanks for your attention :)
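
For concreteness, here is a rough sketch of the two feeding styles I have in mind (made-up sizes, the model itself stubbed out): passing each mini-batch as NumPy arrays copies it from host to device on every call, while the shared-variable + givens pattern from the Theano tutorials keeps the data on the device.

```python
import numpy as np
import theano
import theano.tensor as T

# hypothetical training triples (user, positive item, negative item); sizes are made up
triples = np.random.randint(0, 1000, size=(100000, 3)).astype('int64')
batch_size = 512

u = T.lvector('u')
i = T.lvector('i')
j = T.lvector('j')

# Option 1: feed each mini-batch as NumPy arrays -> one host-to-device copy per call
#   train = theano.function([u, i, j], loss, updates=updates)

# Option 2 (Theano-tutorial pattern): keep the whole training set in a shared
# variable and slice mini-batches by index via givens. On the old GPU backend only
# float32 shared variables are stored on the GPU, hence the floatX + cast trick.
triples_sh = theano.shared(triples.astype(theano.config.floatX), borrow=True)
triples_int = T.cast(triples_sh, 'int64')
idx = T.lscalar('idx')
lo, hi = idx * batch_size, (idx + 1) * batch_size

# in the real model, the BPR loss and updates would replace this toy output
toy_out = T.sum(u) + T.sum(i) + T.sum(j)
fetch_batch = theano.function(
    [idx], toy_out,
    givens={u: triples_int[lo:hi, 0],
            i: triples_int[lo:hi, 1],
            j: triples_int[lo:hi, 2]})
print(fetch_batch(0))
```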

hexiangnan commented 7 years ago

Hi,

Thank you for your interest in my work. I agree with your comments --- I have actually run into the same issue. When I run the code on a GPU, it does not show a significant speed-up compared to the CPU.

The interaction between host memory and the GPU can be one reason. But the main reason, in my opinion, is that MF-BPR does not involve very intensive matrix operations; as a result, using a GPU (which is optimized mostly for large vectorized operations) may not help that much.
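
To illustrate, the core of MF-BPR per mini-batch is roughly the following (a simplified sketch with made-up sizes, not exactly the code in this repository): only embedding-row lookups, element-wise products and a sum over the small latent dimension, with no large dense matrix multiplications for the GPU to exploit.

```python
import numpy as np
import theano
import theano.tensor as T

n_users, n_items, k = 1000, 2000, 32   # made-up sizes
rng = np.random.RandomState(0)
U = theano.shared(0.01 * rng.randn(n_users, k).astype(theano.config.floatX))
V = theano.shared(0.01 * rng.randn(n_items, k).astype(theano.config.floatX))

u = T.lvector('u')   # users in the mini-batch
i = T.lvector('i')   # observed (positive) items
j = T.lvector('j')   # sampled negative items

# x_uij = <U[u], V[i]> - <U[u], V[j]>: just row lookups, an element-wise
# product and a sum over the small latent dimension k
x_uij = T.sum(U[u] * (V[i] - V[j]), axis=1)
loss = -T.mean(T.log(T.nnet.sigmoid(x_uij)))   # BPR loss, regularization omitted

lr = np.asarray(0.05, dtype=theano.config.floatX)
g_U, g_V = T.grad(loss, [U, V])
train = theano.function([u, i, j], loss,
                        updates=[(U, U - lr * g_U), (V, V - lr * g_V)])

users = rng.randint(0, n_users, size=256).astype('int64')
pos = rng.randint(0, n_items, size=256).astype('int64')
neg = rng.randint(0, n_items, size=256).astype('int64')
print(train(users, pos, neg))
```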

My colleague Jingyuan Chen (cc-ed) has further worked on the code, revising it for image/video recommendation, where image and video features have to be processed. According to her experiments, after adapting the BPR code to image/video features, the GPU shows significant speed-ups. You may chat with her for more details if you are interested.

Best, Xiangnan


nijianmo commented 7 years ago

Hi, thanks for the reply and suggestions :) I did some further analysis and found that the indexing operations in Theano (e.g., indexing into a shared variable) took over 90% of the total execution time. It seems to be a critical issue with Theano.
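
If you want to reproduce the measurement, Theano's built-in profiler reports a per-Op time breakdown. A minimal sketch (a toy function, not my actual training code):

```python
import numpy as np
import theano
import theano.tensor as T

# toy function whose only real work is shared-variable indexing,
# just to show where the profiler reports the time
W = theano.shared(np.random.randn(10000, 32).astype(theano.config.floatX))
idx = T.lvector('idx')
f = theano.function([idx], T.sum(W[idx]), profile=True)

for _ in range(100):
    f(np.random.randint(0, 10000, size=512).astype('int64'))

# per-Op breakdown; advanced-indexing Ops (e.g. AdvancedSubtensor1)
# are listed with their share of the total runtime
f.profile.summary()
```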

hexiangnan commented 7 years ago

Oh, I see. I believe I observed the same cause. Do you have a good solution for it?


nijianmo commented 7 years ago

I haven't resolved it yet. I'd like to reimplement the code in TensorFlow or MXNet and figure out whether this is a common issue across these backends.

hexiangnan commented 7 years ago

Ok. Please let me know if you have any progress. Thanks!


eggie5 commented 7 years ago

@nijianmo I'm working on a TF BPR implementation too, and the model trains in about the same time on my CPU and GPU machines. Maybe BPR just isn't well suited to GPU computation. Any insights?
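
For reference, the graph I have in mind looks roughly like this (a simplified TF 1.x sketch with made-up sizes, not my actual code); like the Theano version, it is dominated by embedding lookups rather than large matrix multiplications:

```python
import numpy as np
import tensorflow as tf  # TF 1.x graph-mode style

n_users, n_items, k = 1000, 2000, 32   # made-up sizes
u = tf.placeholder(tf.int32, [None])
i = tf.placeholder(tf.int32, [None])
j = tf.placeholder(tf.int32, [None])

U = tf.Variable(tf.random_normal([n_users, k], stddev=0.01))
V = tf.Variable(tf.random_normal([n_items, k], stddev=0.01))

# embedding_lookup is TF's counterpart of indexing a Theano shared variable
u_e = tf.nn.embedding_lookup(U, u)
i_e = tf.nn.embedding_lookup(V, i)
j_e = tf.nn.embedding_lookup(V, j)

x_uij = tf.reduce_sum(u_e * (i_e - j_e), axis=1)
loss = -tf.reduce_mean(tf.log(tf.sigmoid(x_uij)))
train_op = tf.train.GradientDescentOptimizer(0.05).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    feed = {u: np.random.randint(0, n_users, 256),
            i: np.random.randint(0, n_items, 256),
            j: np.random.randint(0, n_items, 256)}
    print(sess.run([train_op, loss], feed_dict=feed)[1])
```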

nijianmo commented 7 years ago

Hi @eggie5, as I mentioned earlier in this thread, I think the indexing operation is the main cause of the long execution time. As an alternative, I tried the embedding layer in Keras and found it to be really fast. I haven't studied the source code yet, but I think their implementation can give us the answer.

hexiangnan commented 7 years ago

Hi Blacksoil,

What do you mean by "the indexing operation"? Can you provide more details?

On Sun, Apr 30, 2017 at 11:52 AM, Blacksoil notifications@github.com wrote:

Hi @eggie5 https://github.com/eggie5, as I mentioned in this post, I think the indexing operation is the main cause of long execution time. Alternatively, I tried the embedding layer in keras and found it was really fast. Haven't studied the source code yet but I think their implementation can give us the answer.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/hexiangnan/theano-BPR/issues/2#issuecomment-298209436, or mute the thread https://github.com/notifications/unsubscribe-auth/ABGxjq89Se4MkTIKV2-YHS-rvxFu8iO5ks5r1AV5gaJpZM4MpQvg .

-- Best Regards, Xiangnan He

nijianmo commented 7 years ago

Hi, by indexing I mean using u = T.lvector('u') and then looking up self.U[u]. This indexing corresponds to the 'subIncrease' operation in Theano, which is time-consuming based on my observations.
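
To make that concrete, the pattern looks roughly like this (a sketch with hypothetical names and a stand-in loss, not the repository's code); the lookup compiles to an advanced-indexing Op, and the usual way to write the row-wise update is inc_subtensor, so that only the touched rows are written back:

```python
import numpy as np
import theano
import theano.tensor as T

U = theano.shared(0.01 * np.random.randn(1000, 32).astype(theano.config.floatX))

u = T.lvector('u')      # user indices of a mini-batch
u_emb = U[u]            # advanced indexing into the shared matrix

loss = T.sum(u_emb ** 2)             # stand-in for the real BPR loss
g_u = T.grad(loss, u_emb)            # gradient w.r.t. the looked-up rows only
lr = np.asarray(0.05, dtype=theano.config.floatX)

# inc_subtensor writes back only the indexed rows instead of the whole matrix;
# this is the usual way to express a "sparse" embedding update in Theano
train = theano.function([u], loss,
                        updates=[(U, T.inc_subtensor(u_emb, -lr * g_u))])

print(train(np.random.randint(0, 1000, size=256).astype('int64')))
# theano.printing.debugprint(train) shows the indexing / IncSubtensor Ops in the graph
```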

BTW, congrats on your recent paper accepted to SIGIR :-) I'm reading your paper on neural factorization machines for sparse predictive analytics. I actually did some experiments with an MLP before, but did not get satisfying results. It would be great if you could give me some suggestions.

What I did was use an embedding layer and a dense layer in Keras (with either a linear or a non-linear activation) to formulate a vanilla latent factor model: only user/item/rating, without the side information used in a factorization machine. But the model overfits easily and does not generalize well on the validation set. I've tried a bunch of hyperparameters, but none of them works well.

Have you tried a vanilla latent factor model? I'm not sure whether my poor performance is due to my implementation or because the FM has more parameters. Thanks!
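
For concreteness, the kind of model I mean is roughly the following (a Keras sketch with made-up sizes and hyperparameters, not my exact code):

```python
import numpy as np
from keras.layers import Input, Embedding, Flatten, Dot
from keras.models import Model
from keras.regularizers import l2

n_users, n_items, k = 1000, 2000, 16   # made-up sizes

user_in = Input(shape=(1,), dtype='int32')
item_in = Input(shape=(1,), dtype='int32')

u_emb = Flatten()(Embedding(n_users, k, embeddings_regularizer=l2(1e-6))(user_in))
i_emb = Flatten()(Embedding(n_items, k, embeddings_regularizer=l2(1e-6))(item_in))

# plain latent factor model: rating ~ dot(p_u, q_i); a Dense layer on top of the
# concatenated embeddings would give the MLP variant instead
rating = Dot(axes=1)([u_emb, i_emb])

model = Model([user_in, item_in], rating)
model.compile(optimizer='adam', loss='mse')

# toy data just to show the call signature
users = np.random.randint(0, n_users, size=(5000, 1))
items = np.random.randint(0, n_items, size=(5000, 1))
ratings = np.random.rand(5000, 1).astype('float32')
model.fit([users, items], ratings, epochs=1, batch_size=256, validation_split=0.1)
```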

hexiangnan commented 7 years ago

The MLP model you mentioned should be Wide&Deep, right? That model is indeed easy to overfit, and a good initialization of the embedding layer is very important. I mentioned this in my SIGIR 2017 NFM paper.

For a latent factor model without side information, you can check my WWW 2017 paper "Neural Collaborative Filtering". In that paper, I find the best practice is to ensemble the MF and MLP models in the latent space; the MLP then essentially learns the residual of MF, which gives better generalization.
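
Roughly, the idea looks like this (a simplified Keras sketch of the structure, not the exact architecture or hyperparameters from the paper):

```python
from keras.layers import Input, Embedding, Flatten, Dense, Multiply, Concatenate
from keras.models import Model

n_users, n_items, k = 1000, 2000, 16   # made-up sizes

user_in = Input(shape=(1,), dtype='int32')
item_in = Input(shape=(1,), dtype='int32')

# MF (GMF) branch: element-wise product of its own user/item embeddings
u_mf = Flatten()(Embedding(n_users, k)(user_in))
i_mf = Flatten()(Embedding(n_items, k)(item_in))
mf_vec = Multiply()([u_mf, i_mf])

# MLP branch: separate embeddings, concatenated and fed through dense layers
u_mlp = Flatten()(Embedding(n_users, k)(user_in))
i_mlp = Flatten()(Embedding(n_items, k)(item_in))
mlp_vec = Dense(32, activation='relu')(Concatenate()([u_mlp, i_mlp]))
mlp_vec = Dense(16, activation='relu')(mlp_vec)

# fuse the two branches in the latent space and predict
pred = Dense(1, activation='sigmoid')(Concatenate()([mf_vec, mlp_vec]))

model = Model([user_in, item_in], pred)
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()
```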
