[UPRISE] After rereading the paper UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation, I have some questions.

microsoft / LMOps

General technology for enabling AI capabilities w/ LLMs and MLLMs

https://aka.ms/GeneralAI

MIT License


Issue #262 (Open), opened by zhouchang123 2 months ago

zhouchang123 commented 2 months ago

1. How are the scores obtained with GPT-Neo-2.7B?
2. At which stage is a prompt labeled positive or negative: after the scores are computed, or after encoding but before scoring?

cdxeve commented 2 months ago

Q1: How are the scores obtained with GPT-Neo-2.7B?
A1: By computing the task metric score for each input formed by concatenating a prompt with the task input; see Section 3.2.

Q2: At which stage is a prompt labeled positive or negative: after the scores are computed, or after encoding but before scoring?
A2: After the scores are computed. Among all the scored prompts for a training example, we label the one with the highest score as the positive. For negatives, we randomly sample B training demonstrations from the prompt pool; in addition, we label the B sampled prompts with the lowest scores as hard negatives. Details are in Section 3.2.
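The labeling step described in A2 can be sketched as below. This is a minimal illustration, not the repository's code: `score_fn` is a hypothetical stand-in for "frozen LLM + task metric" on the concatenated prompt and input, `label_prompts` and its parameters are made-up names, and excluding the positive from the random negatives is an assumption of this sketch.

```python
import random
from typing import Callable, Dict, List, Sequence


def label_prompts(
    example: str,
    scored_candidates: Sequence[str],
    prompt_pool: Sequence[str],
    score_fn: Callable[[str, str], float],
    num_negatives: int,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Hypothetical sketch of positive / negative labeling for one training example.

    num_negatives plays the role of B and must not exceed the pool size.
    """
    rng = random.Random(seed)
    # Score each sampled candidate once; score_fn stands in for the frozen LLM
    # scoring prompt + input with the task metric (an assumption here).
    ranked = sorted(scored_candidates, key=lambda p: score_fn(p, example), reverse=True)
    # Highest-scoring candidate becomes the single positive.
    positive = ranked[0]
    # The B lowest-scoring candidates become hard negatives.
    hard_negatives = ranked[-num_negatives:] if num_negatives > 0 else []
    # B random demonstrations from the pool serve as ordinary negatives
    # (this sketch skips the positive itself).
    others = [p for p in prompt_pool if p != positive]
    random_negatives = rng.sample(others, num_negatives)
    return {
        "positive": [positive],
        "hard_negatives": hard_negatives,
        "random_negatives": random_negatives,
    }
```

With a toy word-overlap `score_fn`, the candidate sharing the most words with the input becomes the positive and the least-overlapping ones become hard negatives. In the real pipeline, each call to the scorer is an LLM forward pass, which is why every candidate is scored only once before ranking.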

zhouchang123 commented 2 months ago

What about the score from the prompt retriever? Is it the similarity between the two vectors produced by the encoders? Thanks very much.

cdxeve commented 2 months ago

You may refer to Section 3.4 to see how we get the score after tuning the prompt retriever.

zhouchang123 commented 2 months ago

Doesn't Section 3.4 describe the inference part? Is the score computed the same way in the training pipeline?

cdxeve commented 2 months ago

Training is described in Section 3.3; you may also refer to the provided code.

zhouchang123 commented 2 months ago

Section 3.3 only introduces sim(x, p). Do you mean sim(x, p) is the score?

cdxeve commented 2 months ago

Yes, sim(x, p) is the score.
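As a rough illustration of how sim(x, p) acts as the score, the sketch below assumes a bi-encoder where the input encoder and the prompt encoder each produce a vector and sim is their dot product (a DPR-style assumption, not taken from the thread). The vectors here are toy lists rather than real encoder outputs, and `retrieve_top_k` is a hypothetical helper showing how the same score ranks the pool at inference time.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sim(query_vec: Vector, prompt_vec: Vector) -> float:
    """Dot-product similarity between an encoded input x and an encoded prompt p."""
    if len(query_vec) != len(prompt_vec):
        raise ValueError("embedding dimensions must match")
    return sum(q * p for q, p in zip(query_vec, prompt_vec))


def retrieve_top_k(
    query_vec: Vector, prompt_vecs: Sequence[Vector], k: int
) -> List[Tuple[int, float]]:
    """Rank every pool prompt by sim(x, p) and keep the k best (index, score) pairs."""
    scores = [(i, sim(query_vec, p)) for i, p in enumerate(prompt_vecs)]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]
```

Because the same sim(x, p) is used both inside the training loss and for ranking at inference, the pool's prompt vectors can be encoded once in advance and reused for every query.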

zhouchang123 commented 2 months ago

In the paper, the number of positive prompts is 1 and the number of negative prompts is 20, but the total number of prompts used in one training epoch is not stated. What happens to prompts that are neither positive nor negative? InfoNCE does not seem to include them.

cdxeve commented 2 months ago

Yes, InfoNCE would not consider the prompts that are neither positive nor negative.
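To make this concrete, here is a minimal InfoNCE sketch for one training example, written as the negative log-softmax of the positive's score over the positive plus its negatives. It assumes the scores are sim(x, p) values and uses no temperature term; both are simplifications for illustration. Prompts that are neither positive nor negative never enter `logits`, so they contribute nothing to the loss or its gradient.

```python
import math
from typing import Sequence


def info_nce(pos_score: float, neg_scores: Sequence[float]) -> float:
    """-log( exp(s_pos) / (exp(s_pos) + sum(exp(s_neg))) ) for one example.

    Only the positive and the listed negatives appear here; every other
    prompt in the pool is ignored by construction.
    """
    logits = [pos_score, *neg_scores]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(s - m) for s in logits))
    return log_denominator - pos_score
```

For example, with 1 positive and 20 negatives, `neg_scores` would hold 20 values. Raising the positive's score relative to the negatives drives the loss toward 0.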

zhouchang123 commented 1 month ago

I am confused about the training and inference pipelines. In the training pipeline, the input includes the task name and the query, and the metric depends on the task. At inference, however, the input is only the query, without a task name. Could a module be added that infers the task name from the query, so that prompts are first filtered by task name and then retrieved? @cdxeve

cdxeve commented 1 month ago

We do not input the task name during training; the task name in the figure is only there for ease of understanding. You may refer to the formula in Section 3.2 for details.

zhouchang123 commented 4 weeks ago

I looked at the file prompt_pool.json, and each dict is annotated with a task name. So is the task name only used to determine the metric for scoring? Intuitively, retrieval would search among prompts from similar tasks rather than among all prompts.

cdxeve commented 4 weeks ago

Q1: Is the task name only used to determine the metric for scoring?
A1: We keep the task name in the metadata to support many potential uses, but we don’t include it as input during training.

Q2: Intuitively, retrieval should search among prompts from similar tasks rather than among all prompts.
A2: You could try this for a quick test, but I think diversity would be too constrained, since the number of tasks is much smaller than the number of demonstrations in the prompt pool.
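For a quick test like the one suggested in A2, a task-filtered variant can be sketched as a pre-filter on the pool before ranking. This is a hypothetical sketch: the `task_name` key, the `retrieve` function, and the `score` callable (standing in for sim(x, p) against a fixed query) are assumptions, not the actual schema of prompt_pool.json or the repository's API. Where `allowed_tasks` would come from at inference (for example, a separate task classifier) is left open.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence

Entry = Dict[str, str]


def retrieve(
    pool: Sequence[Entry],
    score: Callable[[Entry], float],
    k: int,
    allowed_tasks: Optional[Iterable[str]] = None,
) -> List[Entry]:
    """Return the k highest-scoring entries, optionally restricted to some tasks.

    With allowed_tasks=None every entry competes (the default UPRISE setting
    described in the thread); otherwise only entries whose hypothetical
    "task_name" field is in allowed_tasks are ranked.
    """
    if allowed_tasks is not None:
        allowed = set(allowed_tasks)
        candidates = [e for e in pool if e["task_name"] in allowed]
    else:
        candidates = list(pool)
    return sorted(candidates, key=score, reverse=True)[:k]
```

The filtered call can return fewer than k prompts when a task has few demonstrations, which illustrates the diversity concern in A2: the candidate set shrinks from the whole pool to a single task's slice.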