Question about the training of LCL

njucckevin commented 1 year ago

Hi, thanks for the good work. After reading the paper, I'm confused about the LCL training process, i.e. Sections 3.1 and 3.2. What are the inputs and outputs given to the MLLM during LCL training? And what does y_l in Eq. (1) refer to? I'm very much looking forward to your answer, as I'd like to present this paper at our group meeting tomorrow. QAQ

WeichenFan commented 1 year ago

Hi njucckevin,

1. During training, taking 2-shot training as an example, the input would be "image 1, QA-pair 1, image 2, QA-pair 2, [Final Question]", and the target would be "[Answers to the Final Question]".
2. Eq. (1) is a standard autoregressive language modeling objective. For more information, you can refer to this blog.
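The two points above can be sketched roughly as follows. This is only an illustration, not code from the LCL repository: the function names, the `<img1>`-style placeholders, the whitespace "tokenizer", and the `IGNORE_INDEX` value are all assumptions. It assumes the loss is computed only on the answer to the final question, so that, reading y_l as the l-th target token, the objective takes the usual form $-\sum_l \log p(y_l \mid \text{context}, y_{<l})$.

```python
# Hypothetical sketch: names and placeholders are illustrative, not from the LCL codebase.
IGNORE_INDEX = -100  # assumed marker for positions that do not contribute to the loss


def build_two_shot_example(support, final_question, final_answer):
    """Lay out context and target as (text, is_target) segments.

    support: list of (image_placeholder, question, answer) tuples.
    """
    segments = []
    for image, question, answer in support:
        # Each context sample is "image i, QA-pair i"; it is input only.
        segments.append((f"{image} {question} {answer}", False))
    segments.append((final_question, False))  # the final question is input only
    segments.append((final_answer, True))     # only the final answer is the target
    return segments


def to_tokens_and_labels(segments):
    """Tokenize by whitespace (a stand-in for a real tokenizer) and mask non-target tokens."""
    tokens, labels = [], []
    for text, is_target in segments:
        for tok in text.split():
            tokens.append(tok)
            labels.append(tok if is_target else IGNORE_INDEX)
    return tokens, labels


segments = build_two_shot_example(
    [("<img1>", "Q1?", "A1"), ("<img2>", "Q2?", "A2")],
    final_question="Q3?",
    final_answer="A3",
)
tokens, labels = to_tokens_and_labels(segments)
```

With this layout, every context token is masked out and only the tokens of the final answer are predicted autoregressively.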

By the way, good luck with your meeting. 💯

njucckevin commented 1 year ago

Thanks for the quick reply! Sorry, question 2 was a silly one: I mistakenly thought that y was related to the label to be predicted. 😂 For question 1, I'm somewhat familiar with (M)LLMs and ICL. What I was wondering is how the context samples and the final sample are selected so that the model learns the LCL capability. For example, in MetaICL [1], we just need to randomly select samples from the same task as the context and the final prediction. So in LCL training, what relation between the context samples and the final prediction sample enables the MLLM to learn the LCL capability? In the paper, the S, C, T, N confused me a lot. [1] MetaICL: Learning to Learn In Context

WeichenFan commented 1 year ago

Intuitively, we choose samples that are hard to distinguish as the support set for training, in order to force the model to 'take a look' at the support set during prediction. Take 2-shot as an example: given two very similar classes, c1 and c2, we choose "img_c1_1" and "img_c2_1" as the support set, and "img_c1_2" and "img_c2_2" as the query set. Actually, the samples of the support set and the query set in MetaICL are from different tasks.
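The selection described above could look something like the sketch below. It is a minimal illustration under stated assumptions: `sample_two_shot_episode` and `images_by_class` are hypothetical names, and how the pair of similar classes is found (the `confusable_pair` argument) is not specified in this thread, so it is simply passed in.

```python
import random


def sample_two_shot_episode(images_by_class, confusable_pair, rng=None):
    """Draw one support image and one query image from each of two similar classes.

    images_by_class: dict mapping class name -> list of image ids (hypothetical layout).
    confusable_pair: (c1, c2), two classes assumed to be hard to tell apart.
    """
    rng = rng or random.Random()
    support, query = [], []
    for cls in confusable_pair:
        # rng.sample returns two distinct images, so support and query never share an image.
        support_img, query_img = rng.sample(images_by_class[cls], 2)
        support.append((support_img, cls))
        query.append((query_img, cls))
    return support, query


images_by_class = {
    "c1": ["img_c1_1", "img_c1_2"],
    "c2": ["img_c2_1", "img_c2_2"],
}
support, query = sample_two_shot_episode(images_by_class, ("c1", "c2"), random.Random(0))
```

Because both classes appear in the support set and look alike, a query image can only be answered reliably by comparing it against the support images, which is the behavior the training is meant to encourage.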

njucckevin commented 1 year ago

That's an intuitive choice. Thanks for the explanation~

WeichenFan commented 1 year ago

:)

