wuliwei9278 / SSE-PT

Code and datasets for the RecSys'20 paper "SSE-PT: Sequential Recommendation Via Personalized Transformer" and the NeurIPS'19 paper "Stochastic Shared Embeddings: Data-driven Regularization of Embedding Layers"

Question on SSE #2

Open Chunpai opened 2 years ago

Chunpai commented 2 years ago

Thank you for your great work. I am not sure if I understand SSE-SE correctly. Based on your code, it seems you randomly replace items in a sequence with random items, or replace the user with another random user, during SGD. Am I right? Also, can I view SSE-SE as a kind of data augmentation technique? Thanks.

wuliwei9278 commented 2 years ago

Hi, thanks for asking. In the code, I implemented it at the SGD stage because that is easiest, but it should work for any embeddings no matter where the embedding layer sits in the architecture. If the embedding layer is at the bottom of the architecture (often the case) or at the top (for the label part), then SSE-SE is equivalent to data/label augmentation (as in BERT pre-training, or as in label smoothing). But if you read my other NeurIPS paper on Stochastic Shared Embeddings, you can see that we are actually solving a different loss function with a smoother loss landscape, which is therefore easier to optimize. The replacement does not even have to be uniformly random; it can follow a knowledge graph. Our theoretical analysis shows it improves generalization error.
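For readers who want the gist of the SGD-stage trick described above, here is a minimal sketch (not the repository's actual code; all names and the default probabilities are illustrative): with a small probability, each user/item ID is swapped for a uniformly random ID of the same type before the embedding lookup.

```python
import numpy as np

def sse_se_replace(user_id, item_seq, num_users, num_items,
                   p_user=0.08, p_item=0.08, rng=None):
    """SSE-SE sketch: with small probability, replace each ID with a
    uniformly random ID of the same type before the embedding lookup,
    so embeddings of different users/items share gradient signal."""
    rng = rng or np.random.default_rng()
    if rng.random() < p_user:
        user_id = int(rng.integers(num_users))  # swap in a random user
    item_seq = [int(rng.integers(num_items)) if rng.random() < p_item else i
                for i in item_seq]               # swap items independently
    return user_id, item_seq
```

With `p_user = p_item = 0` this is a no-op, which makes it easy to toggle the regularizer on and off during experiments.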


wuliwei9278 commented 2 years ago

I also demonstrated how SSE can be used in computer vision by treating feature maps as embeddings in this CVPR paper: https://openaccess.thecvf.com/content_CVPR_2020/papers/Abavisani_Multimodal_Categorization_of_Crisis_Events_in_Social_Media_CVPR_2020_paper.pdf

"We treat feature maps of images as embeddings and use class labels to construct knowledge graphs. The feature maps of two images are connected by an edge in the graph, if and only if they belong to the same class (e.g. they are both labeled “affected individuals”). We follow the same procedure for text embeddings and construct a knowledge graph for text embeddings as well. Finally, we connect the nodes associated with the knowledge graph of image fea- ture maps with an edge to nodes in text’s knowledge graph if and only if they belong to the same class."

Chunpai commented 2 years ago

Thank you so much for your response. This is very helpful to me.