openai / gpt-2-output-dataset

Dataset of GPT-2 outputs for research in detection, biases, and more
MIT License

How is the output text generated? #22

Open venkatasg opened 4 years ago

venkatasg commented 4 years ago

Please correct me if my understanding is wrong. GPT-2 was trained on WebText, and the small, medium, and large model outputs you provide are completions conditioned on part of each WebText post, correct? When I load the datasets and look at the sentences, I don't understand how the posts generated by the large model correspond to the WebText training set. How does the model generate text (that is, what context does it condition on)? Right now, the xth post in WebText doesn't seem to correspond to the xth post in the generated outputs.
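
For context, here is a minimal sketch of the kind of positional comparison described above. It assumes the released files are JSONL (one JSON object per line) with a `"text"` field, and uses tiny stand-in files so it runs without downloading the dataset; the filenames below are placeholders, not the actual release names.

```python
import json
import os
import tempfile

def load_texts(path, limit=None):
    """Load the 'text' field from each line of a JSONL file.

    The 'text' field name is an assumption about the dataset layout.
    """
    texts = []
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if limit is not None and i >= limit:
                break
            texts.append(json.loads(line)["text"])
    return texts

# Tiny stand-in files so the sketch runs without the real dataset.
with tempfile.TemporaryDirectory() as d:
    webtext_path = os.path.join(d, "webtext.test.jsonl")   # placeholder name
    model_path = os.path.join(d, "large-model.test.jsonl") # placeholder name
    with open(webtext_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"text": "a human-written post"}) + "\n")
    with open(model_path, "w", encoding="utf-8") as f:
        f.write(json.dumps({"text": "a model-generated sample"}) + "\n")

    webtext = load_texts(webtext_path)
    generated = load_texts(model_path)

    # Comparing the xth entries directly, as in the question: if the
    # samples were not generated from aligned prompts, no positional
    # correspondence between the two files is expected.
    for x in range(min(len(webtext), len(generated))):
        print(x, webtext[x][:40], "|", generated[x][:40])
```

If the samples were generated unconditionally rather than as completions of specific WebText posts, this would explain why no row-by-row correspondence shows up.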