Please do correct me if my understanding of this is wrong. GPT-2 was trained on WebText, and the small, medium, and large model outputs you provide are completions generated after seeing part of the context from each WebText post, correct? When I load the datasets and look at the sentences, I don't understand how the posts generated by the large model correspond to the WebText training set. How does the model generate text — that is, what context does it use? Right now the xth generated post doesn't seem to correspond to the xth post in WebText.