Consider masking out the padding embedding at the tail of each session and adding positional embedding.

johnny12150 / NISER

Using SR-GNN to implement "NISER: Normalized Item and Session Representations to Handle Popularity Bias"

https://arxiv.org/pdf/1909.04276.pdf


Consider masking out the padding embedding at the tail of each session and adding positional embedding. #2

Closed: SpaceLearner closed this issue 3 years ago

SpaceLearner commented 3 years ago

In the original paper, the normalization is applied to the items within a single session. In the implementation, however, we have to pad sessions so that every session in a batch has the same length. So when calculating the L2 norm, the items at the padded positions should be ignored. The authors also found that adding positional embeddings helps a little. By the way, I'm looking forward to your numbers on Star GNN. I have now reproduced the numbers for yoochoose1/64, but I can't get the same numbers on diginetica. Thank you a lot!
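To make the suggestion concrete, here is a minimal NumPy sketch of masking tail padding while L2-normalizing item embeddings. It is not the code from this repo or from the paper: the function names, the `lengths` argument, and the `pos_table` positional lookup are all hypothetical, and adding the positional embedding before normalizing is an assumption about the order of operations.

```python
import numpy as np

def normalize_session_items(item_emb, lengths, pos_table=None, eps=1e-12):
    """L2-normalize each real item embedding and zero out tail padding.

    item_emb:  (batch, max_len, dim) embeddings, padded at the tail.
    lengths:   (batch,) true number of items in each session.
    pos_table: optional (max_positions, dim) positional embeddings
               (hypothetical; assumed to be added before normalizing).
    """
    item_emb = np.asarray(item_emb, dtype=np.float64)
    _, max_len, _ = item_emb.shape

    # mask[b, t] is 1.0 for real items and 0.0 for padded positions.
    positions = np.arange(max_len)[None, :]
    mask = (positions < np.asarray(lengths)[:, None]).astype(np.float64)

    if pos_table is not None:
        item_emb = item_emb + pos_table[:max_len][None, :, :]

    # Unit-normalize every position; eps avoids dividing by zero when the
    # padding embedding is an all-zero vector.
    norms = np.linalg.norm(item_emb, axis=-1, keepdims=True)
    unit = item_emb / np.maximum(norms, eps)

    # Padded positions are zeroed so they cannot leak into later pooling.
    return unit * mask[:, :, None], mask

def masked_mean(unit_items, mask):
    """Average over real items only, ignoring padded positions."""
    counts = np.maximum(mask.sum(axis=1, keepdims=True), 1.0)
    return unit_items.sum(axis=1) / counts

rng = np.random.default_rng(0)
emb = rng.normal(size=(2, 4, 8))   # 2 sessions padded to length 4
pos = rng.normal(size=(10, 8))     # hypothetical positional table
items, mask = normalize_session_items(emb, lengths=[2, 4], pos_table=pos)
session_repr = masked_mean(items, mask)
```

The mask is returned alongside the embeddings so that any later attention or pooling step can reuse it instead of recomputing which positions are padding.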

johnny12150 commented 3 years ago

Thanks for the advice. I will update the code to see if the performance improves.

By the way, could you share the Star GNN code with me? I am still checking my code since I can reach the performance claimed in the paper with both datasets.

SpaceLearner commented 3 years ago

Do you mean you have reached the performance claimed in the paper on both datasets? If so, could you please release the code? Thanks! I will share my code with you once I have organized it. I just found that layer norm is useless, and I am trying to improve the numbers on diginetica.

johnny12150 commented 3 years ago

Oops! I mean I haven't reached the performance yet. It should be "can't". Sorry for the confusion.

SpaceLearner commented 3 years ago

I found that fixing the order of the training samples and using a hidden size of 256 (as used in the original paper) together with L2 normalization increases the numbers greatly. Maybe you can try these in your Star GNN code.

johnny12150 commented 3 years ago

So does it seem that the performance gain comes from the L2 norm rather than the star graph topology?

SpaceLearner commented 3 years ago

Mainly from the larger hidden size and the L2 norm (which was introduced in NISER); the star only brings a small gain. If you set the hidden size to 100 as in SR-GNN and NISER, the performance will drop a lot.
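For readers unfamiliar with the L2 norm trick discussed here, the following is a rough NumPy sketch of NISER-style scoring, where both the session representation and the item embeddings are unit-normalized so that the logits become scaled cosine similarities. The function name and arguments are hypothetical, and the scale value `sigma=16.0` is only an illustrative assumption rather than a setting confirmed in this thread.

```python
import numpy as np

def normalized_scores(session_repr, item_table, sigma=16.0, eps=1e-12):
    """Softmax over scaled cosine similarity between sessions and items.

    session_repr: (batch, dim) session representations.
    item_table:   (num_items, dim) candidate item embeddings.
    sigma:        scale applied to the cosine similarity (assumed value).
    """
    s_norm = np.linalg.norm(session_repr, axis=-1, keepdims=True)
    v_norm = np.linalg.norm(item_table, axis=-1, keepdims=True)
    s = session_repr / np.maximum(s_norm, eps)
    v = item_table / np.maximum(v_norm, eps)

    # Cosine similarity lies in [-1, 1]; sigma stretches it so the softmax
    # can become sharp enough to separate items.
    logits = sigma * (s @ v.T)

    # Numerically stable softmax over the item axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
sessions = rng.normal(size=(2, 256))   # hidden size 256, as in the thread
items = rng.normal(size=(5, 256))
probs = normalized_scores(sessions, items)
```

Because every item vector has unit length, an item's score no longer depends on the magnitude of its embedding, only on its direction relative to the session vector.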

johnny12150 commented 3 years ago

A larger hidden size and the L2 norm really boost the performance for many models!