Did WizardCoder generate data from GPT-4 or GPT-3.5?

swtheing / WizardCoder_Instruct_Generator

Generate the WizardCoder Instruct from the CodeAlpaca

Did WizardCoder generate data from GPT-4 or GPT-3.5? #1

Open Symbolk opened 1 year ago

Symbolk commented 1 year ago

Thanks for this repo! I have also been reading the paper recently, but I did not notice which LLM WizardCoder used to generate its Evol-Instruct data. Your implementation uses gpt4_azure. Is that the same as what WizardCoder used (considering that Microsoft insiders have been able to use the API for free since early this year), or are you just guessing that they used GPT-4?

swtheing commented 1 year ago

Yeah, the paper shows that the instructions are generated by GPT-4 or GPT-3.5. However, it does not explicitly state how the responses to the newly generated instructions are produced. My guess is that the responses were generated by GPT-4.

Symbolk commented 1 year ago

> Yeah, the paper shows that the instructions are generated by GPT-4 or GPT-3.5. However, it does not explicitly state how the responses to the newly generated instructions are produced. My guess is that the responses were generated by GPT-4.

Do you mean that only the ###Response is generated by GPT-4? A third option is to generate the programming tasks with GPT-3.5 and then pass each task to GPT-4 to generate the solution.
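To make that third option concrete, here is a minimal sketch in Python. It is not WizardCoder's actual pipeline, which the paper does not spell out. The function names, the rewrite prompt wording, and the record layout are all hypothetical, and the two models are passed in as plain callables so that no particular API is assumed.

```python
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt string to a completion string.
# In practice this would wrap a call to GPT-3.5 or GPT-4.
Complete = Callable[[str], str]


def evolve_and_solve(
    seed_instructions: List[str],
    evolve_model: Complete,   # assumed to be the cheaper model (e.g. GPT-3.5)
    solve_model: Complete,    # assumed to be the stronger model (e.g. GPT-4)
) -> List[Dict[str, str]]:
    """Evolve each seed task with one model, then solve it with another."""
    records = []
    for seed in seed_instructions:
        # Illustrative prompt only; the real Evol-Instruct prompts differ.
        evolve_prompt = (
            "Rewrite the following programming task to make it "
            "slightly more difficult.\n\n"
            f"{seed}"
        )
        new_instruction = evolve_model(evolve_prompt).strip()
        # The evolved task is handed to the second model for the solution.
        response = solve_model(new_instruction).strip()
        records.append({"instruction": new_instruction, "response": response})
    return records


def render(record: Dict[str, str]) -> str:
    """Format one record with Alpaca-style Instruction/Response fields."""
    return (
        f"### Instruction:\n{record['instruction']}\n\n"
        f"### Response:\n{record['response']}"
    )
```

Keeping the two models as separate arguments means the same loop also covers the other options discussed here, for example passing the same GPT-4 wrapper for both roles.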

FYI: According to the recently leaked GPT-4 details (https://threadreaderapp.com/thread/1678545170508267522.html), GPT-4 was trained for 2 epochs on text-based data and 4 epochs on code-based data!

jaideep11061982 commented 1 year ago

@Symbolk My objective is just to do batch inference. In that case, what should my prompt template be? Could you explain with an example? Thanks in advance.

haorannlp commented 1 year ago

Hi, did you successfully reproduce the training data?