Open bpucla opened 10 months ago
"Correct" means verified correct answers. Besides, "GPT4" and "Human" were also used, indicating data with unknown correctness.
@imoneoi thank you for the prompt response! I have two follow-up questions.
Are only the following prompts used in training?
a) GPT4 Correct User and GPT4 Correct Assistant
b) GPT4 User and GPT4 Assistant
c) Code User and Code Assistant
If the answer to the previous question is yes, does it mean you only used the GPT-4 data and did not use the GPT-3.5 data?
Really appreciate your answers!
@bpucla
- Yes, and Human User and Human Assistant.
- Yes. GPT-3.5 data is discarded in the 3.5 version.
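For readers following along, here is a minimal sketch of how the condition prefixes listed in this thread might be applied to a conversation. The prefix names come from the thread; the `: ` separator, the `<|end_of_turn|>` token, and the `build_prompt` helper are assumptions for illustration and are not confirmed by the replies here.

```python
# Condition prefixes named in this thread.
CONDITIONS = ("GPT4 Correct", "GPT4", "Code", "Human")

# Assumed turn separator; the thread itself does not state the exact template.
END_OF_TURN = "<|end_of_turn|>"


def build_prompt(condition, turns):
    """Format alternating user/assistant messages under one condition prefix.

    `turns` starts with a user message. If it ends on a user message,
    an open assistant prefix is appended so the model can continue from it.
    """
    if condition not in CONDITIONS:
        raise ValueError(f"unknown condition prefix: {condition!r}")
    parts = []
    for i, text in enumerate(turns):
        role = "User" if i % 2 == 0 else "Assistant"
        parts.append(f"{condition} {role}: {text}{END_OF_TURN}")
    if len(turns) % 2 == 1:
        # Last message came from the user: leave the assistant turn open.
        parts.append(f"{condition} Assistant:")
    return "".join(parts)


print(build_prompt("GPT4 Correct", ["Hello"]))
```

Swapping the first argument for "Code" or "Human" changes only the prefix, which is the sense in which the prompt is "conditional".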
Could you share more details? In that case, V3.5 surpassed ChatGPT by using some secret datasets, and it becomes hard for us to know what actually made the biggest difference...
Are you using a weight of 0.1 for the data with unknown correctness and 1.0 for the correct data? If not, could you please reveal more details?
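To make the weighting question concrete, here is a rough sketch of what per-source loss weighting could look like. The 0.1 and 1.0 values are the ones asked about above and are not confirmed by the authors in this thread; `SOURCE_WEIGHTS` and `weighted_nll` are hypothetical names, and normalizing by the sum of weights is one possible choice among several.

```python
# Hypothetical weights taken from the question above; not confirmed by the authors.
SOURCE_WEIGHTS = {"correct": 1.0, "unknown": 0.1}


def weighted_nll(examples):
    """Weight-normalized average of per-example negative log-likelihoods.

    `examples` is a list of (source_label, nll) pairs, where source_label
    is "correct" (verified answers) or "unknown" (unverified answers).
    """
    total = 0.0
    weight_sum = 0.0
    for label, nll in examples:
        w = SOURCE_WEIGHTS[label]
        total += w * nll
        weight_sum += w
    return total / weight_sum


batch = [("correct", 2.0), ("unknown", 4.0)]
print(weighted_nll(batch))
```

With these weights, an unverified example contributes a tenth as much to the loss as a verified one, so it nudges the model without dominating training.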
@imoneoi You said: "GPT-3.5 data is discarded in the 3.5 version."
Isn't the GPT-3.5 data what gives the rewards and the class-conditioned policy their meaning? Table 2 in your paper shows a value for GPT-3.5; isn't that what allows for the rewards and the class-conditioned policy?
Congrats to the authors on the great achievement!
Trying to understand your great work a bit more. In the inference examples, there are prompts like "GPT4 Correct User" and "Code User". What other conditional prompts are used in training? What does "Correct" mean here? Thanks!