Open guotong1988 opened 1 year ago
Hi @guotong1988, you can find the full list of dataset files in the following code from _download_dataset.py_:
    import requests

    # Each dataset name is combined with each split to form a file name.
    for ds in [
        'webtext',
        'small-117M', 'small-117M-k40',
        'medium-345M', 'medium-345M-k40',
        'large-762M', 'large-762M-k40',
        'xl-1542M', 'xl-1542M-k40',
    ]:
        for split in ['train', 'valid', 'test']:
            filename = ds + "." + split + '.jsonl'
            r = requests.get("https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/" + filename, stream=True)
From the code above, you can build the URL of any specific file and download it directly, for example: https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/small-117M.train.jsonl
Alternatively, you can run _download_dataset.py_ to download all the dataset files.
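If you only want one or two files rather than the whole set, something like the following might work. This is only a minimal sketch, not the repository's script: `dataset_filenames` and `download_file` are hypothetical helper names, the `data` output directory is an assumption, and it uses the standard library's `urllib` instead of `requests` so it runs without extra dependencies.

```python
import shutil
import urllib.request
from pathlib import Path

# Base URL and names taken from the snippet quoted in this thread.
BASE_URL = "https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/"

DATASETS = [
    "webtext",
    "small-117M", "small-117M-k40",
    "medium-345M", "medium-345M-k40",
    "large-762M", "large-762M-k40",
    "xl-1542M", "xl-1542M-k40",
]
SPLITS = ["train", "valid", "test"]


def dataset_filenames():
    """Return every file name of the form <dataset>.<split>.jsonl."""
    return [f"{ds}.{split}.jsonl" for ds in DATASETS for split in SPLITS]


def download_file(filename, out_dir="data"):
    """Download one file into out_dir (hypothetical helper, assumed layout)."""
    out_path = Path(out_dir) / filename
    out_path.parent.mkdir(parents=True, exist_ok=True)
    # Copy the response to disk in chunks instead of loading it into memory.
    with urllib.request.urlopen(BASE_URL + filename) as response, open(out_path, "wb") as f:
        shutil.copyfileobj(response, f)
    return out_path
```

For example, `download_file("small-117M.train.jsonl")` should fetch just the file linked above, and `dataset_filenames()` lists all the names you can pass to it.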
I hope this helps.
thank you
Thanks
Thanks
Hi, how are you? I have some questions. How can I contact you? Thanks.
Thank you very much!