Hi, thanks for making this great dataset public. I have downloaded webtext.train.jsonl, a file with 250k lines. I'm not sure whether it is just a sample or a slice of the WebText training set on which the GPT-2 models were trained. May I have access to the full WebText training set?
Looking forward to your reply or any advice.