yaochenzhu / LLM4Rec

(WWW'24) The first RS that tightly combines LLM with ID-based RS.
MIT License
99 stars 5 forks

ContentGPT #2

Closed: mtoltoxl closed this issue 9 months ago

mtoltoxl commented 9 months ago

Hi, thanks for sharing the code! I've been inspecting your code and came up with a few questions.

1. Do you expect your ContentGPT to reconstruct the input (i.e., biography, item content, or review) when the item_id token is given?
2. Is it normal that the loss for ContentGPT does not fall in the first 10 epochs of training?
3. How can you be sure that the natural-language semantics and the collaborative semantics are mingled well in the item_id tokens?

Hope to hear from you soon! Thanks.

yaochenzhu commented 9 months ago

Hi Minseok,

Thanks for reaching out! I hope all is well with you. Below is an itemized response to your questions :)

  1. The purpose of the ContentGPT.

Basically, for the pretraining of the ContentGPT, you can refer to the example "(b) User/Item Textual Features" in the grey box of Section 3.2.2, Soft+Hard Prompting, for an intuitive understanding.

In a nutshell, for the textual features associated with a user, an item, or a user/item pair, we design a prompt (composed of heterogeneous user/item/vocab tokens) that describes the nature of the features (e.g., whether it is a user profile, an item description, or a review) and use it to predict the features themselves (composed of homogeneous vocab tokens), such that the features are encoded into the user/item content token embeddings.
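For illustration only, here is a minimal PyTorch sketch of this soft+hard prompting objective. The toy vocabulary size, the fake token ids, and the single embedding lookup standing in for a transformer pass are all assumptions for the sketch, not the repository's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = 1000                 # toy vocabulary size (assumption)
N_ITEMS = 50                 # toy number of items (assumption)
DIM = 32

# One embedding table holding vocab tokens followed by item soft tokens.
embed = nn.Embedding(VOCAB + N_ITEMS, DIM)
lm_head = nn.Linear(DIM, VOCAB)          # predicts ordinary vocab tokens

def item_tok(i: int) -> int:             # soft-token id for item i
    return VOCAB + i

# Soft + hard prompt: "<item_7> has the following description:" (fake ids),
# followed by the textual feature to be predicted.
prompt = torch.tensor([item_tok(7), 5, 17, 90, 3])
feature = torch.tensor([42, 8, 311, 29])

seq = torch.cat([prompt, feature])
hidden = embed(seq)                      # stand-in for a transformer pass
logits = lm_head(hidden[:-1])            # predict token t+1 from token t

# The loss is computed only on the feature tokens, never on the prompt itself.
targets = seq[1:].clone()
targets[: len(prompt) - 1] = -100        # ignore_index masks the prompt part
loss = F.cross_entropy(logits, targets, ignore_index=-100)
loss.backward()                          # gradients flow into item_tok(7)'s embedding
```

The key point is the loss mask: the prompt tokens are only conditioned on, so the gradients flow into the item's soft-token embedding, which is how the textual feature gets encoded into it.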

  2. The loss of the ContentGPT.

Could you provide more details about the dataset used for experiments and the value of lambda_V? Thanks.

  3. The mingling of the collaborative and content token embeddings.

Actually, we use a mutual regularization strategy to constrain the user/item collaborative embeddings with the user/item content embeddings: the collaborative embeddings help the ContentGPT find information relevant to recommendations in the user/item content features, whereas the content embeddings introduce side information to support CollaborativeGPT and RecGPT. This has its theoretical root in MAP estimation given the generative process defined in Sections 3.2 and 3.3. I guess this mutual-regularization process is the "mingling process" mentioned in your question.

In our optimization, there's actually a hyperparameter, lambda_V, that controls the strength of the mutual regularization, whose value is selected by grid search to ensure a good mingling of the collaborative and content information. The sensitivity analysis can be found in Figure 4.
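As a rough illustration, under the MAP view this mutual regularization amounts to a quadratic penalty between the two sets of user/item token embeddings; the penalty form and the variable names below are assumptions for the sketch, not the exact objective in the repository:

```python
import torch

lambda_V = 0.1   # regularization strength, selected by grid search (see Figure 4)

def mutual_reg(collab_emb: torch.Tensor, content_emb: torch.Tensor) -> torch.Tensor:
    """Quadratic penalty pulling the two embedding tables toward each other."""
    return lambda_V * (collab_emb - content_emb).pow(2).sum()

# Toy tables standing in for the user/item token embeddings of the two GPTs.
collab = torch.randn(10, 32, requires_grad=True)
content = torch.randn(10, 32, requires_grad=True)

# When training ContentGPT the collaborative side is held fixed (detached),
# and vice versa, so each model regularizes the other.
content_loss = mutual_reg(collab.detach(), content)   # + ContentGPT's LM loss
content_loss.backward()
```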

Best, Yaochen

mtoltoxl commented 9 months ago

Hi, thanks for your prompt reply.

For the ContentGPT, due to limited resources, I've partitioned one of your datasets, Amazon Beauty (~10%), and set the batch size to 4. For the initial training of ContentGPT, I take it that regularize=False applies; the loss for reconstructing the hard prompt (e.g., the review) does not decrease.

As you mentioned, "such that the features can be encoded into the user/item content token embeddings," I believe that after the initial training (w/o regularization from CollaborativeGPT), ContentGPT should be able to reconstruct the hard prompt (e.g., the review) it learned when provided with the soft+hard prompt. However, I couldn't observe this (in my case this is evident, since the loss didn't fall at all). Were you able to reconstruct something semantically similar?
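For reference, one way to run this reconstruction check is a greedy decode from the soft+hard prompt; the helper below reuses the toy embed/lm_head setup sketched earlier and is an assumption for illustration, not the repository's API:

```python
import torch

@torch.no_grad()
def greedy_decode(embed, lm_head, prompt_ids, max_new_tokens=20):
    """Greedily extend the soft+hard prompt and return the generated tokens."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        hidden = embed(torch.tensor(ids))            # stand-in for a forward pass
        ids.append(int(lm_head(hidden[-1]).argmax()))
    return ids[len(prompt_ids):]

# e.g., compare greedy_decode(embed, lm_head, [item_tok(7), 5, 17, 90, 3])
# against the original review's token ids.
```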

Thanks!

yaochenzhu commented 9 months ago

Hi Minseok,

Thanks for providing the detailed information. I didn't save the training details when I ran the experiments, as the code automatically selects the best model based on the validation set. I'm now rerunning the full-scale experiments to retrieve the training log, which may take some time :)

Here, for the ContentGPT, we only want to encode the content information into the user/item token embeddings via language modeling. Generating a full review, i.e., "reconstructing something semantically similar" as mentioned in your question, from the user/item content embeddings alone is a more difficult task. Nevertheless, for the loss issue, I'll get back to you as soon as I have the results :)

For now, the effectiveness of the ContentGPT can be verified by the comparison with LLM-CF in Tables 1 & 2, where we removed the ContentGPT and the mutual regularization and used only CollaborativeGPT and RecGPT for recommendations.

Thank you for your patience.

Best, Yaochen

yaochenzhu commented 9 months ago

Hi Minseok,

I've rerun the experiments on the company server, and the training loss drops monotonically for the ContentGPT in the first ten epochs. Below is a screenshot of the training log.

[Screenshot of the training log, 2023-12-02]

I wonder whether your situation is because of the smaller batch size.
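If the small batch is indeed the culprit, gradient accumulation is one generic way to emulate a larger effective batch under limited memory. This is a standard PyTorch sketch with toy stand-ins, not code from this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins (assumptions): any model and data loader would do.
model = nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
batches = [(torch.randn(4, 10), torch.randn(4, 1)) for _ in range(16)]

accum_steps = 8                     # effective batch = 4 * 8 = 32
optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = F.mse_loss(model(x), y) / accum_steps   # scale so gradients average
    loss.backward()                 # gradients accumulate across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```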

Best, Yaochen

mtoltoxl commented 9 months ago

Hi, Yaochen.

Thanks for sharing the training log. The loss drop tendency is almost the same when I ran your code with my dataset. I guess the batch size or other settings may have caused the difference from your benchmark dataset. I really appreciate your help and will get back to you if I encounter new questions! Wish you good luck with your future research as well.

Best, Minseok

yaochenzhu commented 9 months ago

Hi Minseok,

I'm glad the code works on your own dataset. Thanks for the kind words, and good luck with your research too!!

I'll mark this issue as closed. If you encounter new questions, feel free to reopen it or start a new issue :)

Best, Yaochen