Hi, thanks for making this great dataset public. I have downloaded webtext.train.jsonl, a file with 250k lines. I'm not sure whether it is just a sample or a slice of the WebText training set on which the GPT-2 models were trained. May I have access to the full WebText training set?
Looking forward to your reply or any advice.