openai / gpt-2-output-dataset

Dataset of GPT-2 outputs for research in detection, biases, and more
MIT License

What prompt is used to generate the GPT2 datasets? #50

Open veenapaddy opened 1 year ago

veenapaddy commented 1 year ago

I see that GPT-2 is trained on WebText, but I'm not sure how the datasets here were generated. Specifically, what prompt was used with GPT-2 to generate the "fake" datasets?

MilerCt commented 1 year ago

The datasets are generated by asking a GPT model (GPT-1, GPT-2, or later models like GPT-3 or GPT-4) to produce text, and those outputs become the dataset. In my experience you have to query the model many times, on the order of 600 runs, to perfect a single question; you can then reuse that question later to check whether it is close to another one, which is nice because you avoid spending roughly 6 hours per question. It's a bit like the video where someone tried to train an AI to play Mario Kart 64 perfectly: it took an enormous amount of training even with a GPU.
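For reference, a minimal sketch of what sampling text from GPT-2 could look like, using the Hugging Face transformers library. This is only an illustration and not the script used to build this dataset; the model checkpoint name, the top-k value, the sample length, and the number of samples below are all assumptions, and the thread does not confirm which prompt or decoding settings were actually used.

```python
# Hypothetical sketch: sampling from GPT-2 with no natural-language prompt,
# conditioning only on the end-of-text token. Settings are illustrative
# assumptions, not the dataset's documented generation parameters.
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# "<|endoftext|>" encodes to the single end-of-text token, so the model
# starts generating with no real prompt text.
input_ids = tokenizer("<|endoftext|>", return_tensors="pt").input_ids

samples = model.generate(
    input_ids,
    do_sample=True,            # sample rather than decode greedily
    top_k=40,                  # assumed truncation setting, for illustration
    max_length=256,            # assumed sample length
    num_return_sequences=3,    # assumed number of samples
    pad_token_id=tokenizer.eos_token_id,
)

for s in samples:
    print(tokenizer.decode(s, skip_special_tokens=True))
    print("-" * 40)
```

Swapping the `"<|endoftext|>"` input for an actual text prompt would give prompted ("conditional") samples instead, which is the distinction the original question is asking about.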