veenapaddy opened this issue 1 year ago
The datasets are generated by prompting a model such as GPT-1, GPT-2, GPT-3, or GPT-4; the model's outputs themselves count as the dataset. You may have to query the model many times (on the order of 600) to perfect a single question, which you can then reuse to check whether a new question is close to an existing one — much better than spending 6 hours per question. It's a bit like the video where someone trained an AI to play Mario Kart 64 perfectly: it still took a huge amount of training, even with a GPU.
I see that GPT-2 was trained on WebText, but I'm not sure how the datasets here were generated. Specifically, what prompt was used with GPT-2 to generate the "fake" datasets?
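For context on what "generating a fake dataset" can look like in practice, here is a minimal sketch of unconditional sampling from GPT-2 using the Hugging Face `transformers` library. This is an illustration, not the repository's actual generation script: the prompt choice (GPT-2's `<|endoftext|>` token as the only context, approximating "no prompt") and the `top_k=40` setting are assumptions based on commonly reported GPT-2 sampling configurations.

```python
from transformers import pipeline, set_seed

# Sketch: unconditional text generation with GPT-2.
# Using the <|endoftext|> token as the sole context approximates
# prompt-free sampling; top_k=40 is an assumed sampling setting,
# not confirmed to be what this repository used.
set_seed(0)
generator = pipeline("text-generation", model="gpt2")

samples = generator(
    "<|endoftext|>",
    max_new_tokens=40,
    do_sample=True,
    top_k=40,
    num_return_sequences=2,
)

for s in samples:
    print(s["generated_text"])
```

Each call returns a list of dicts with a `generated_text` field; collecting many such samples yields a corpus of machine-generated ("fake") text that can be paired against human-written WebText for detection experiments.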