openai / gpt-2-output-dataset

Dataset of GPT-2 outputs for research in detection, biases, and more
MIT License
1.93k stars 548 forks

What is the full link for gs://gpt-2/output-dataset/v1 #45

Open guotong1988 opened 1 year ago

guotong1988 commented 1 year ago

Thank you very much!

allosharma commented 1 year ago

Hi @guotong1988, you can find the full list of data files using the following code, which is taken from _download_dataset.py_:

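import requests

# Base URL under which every dataset file is published.
BASE_URL = "https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/"

for ds in [
    'webtext',
    'small-117M',  'small-117M-k40',
    'medium-345M', 'medium-345M-k40',
    'large-762M',  'large-762M-k40',
    'xl-1542M',    'xl-1542M-k40',
]:
    for split in ['train', 'valid', 'test']:
        # File names follow the pattern <dataset>.<split>.jsonl
        filename = ds + "." + split + '.jsonl'
        # stream=True defers fetching the body until it is read
        r = requests.get(BASE_URL + filename, stream=True)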

Based on the code above, you can download any specific file directly by appending its name to the base URL, for example: https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/small-117M.train.jsonl
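If it helps, here is a minimal sketch (not the repository's script) that prints the full link for every file and can fetch a single one. It uses the standard library's `urllib` instead of `requests` so it has no third-party dependency. The helper names `dataset_urls` and `download_file` and the destination path are assumptions made for illustration; the base URL and file names are the ones given in this comment.

```python
import shutil
import urllib.request

BASE_URL = "https://openaipublic.azureedge.net/gpt-2/output-dataset/v1/"

DATASETS = [
    "webtext",
    "small-117M", "small-117M-k40",
    "medium-345M", "medium-345M-k40",
    "large-762M", "large-762M-k40",
    "xl-1542M", "xl-1542M-k40",
]
SPLITS = ["train", "valid", "test"]


def dataset_urls():
    """Return the full URL of every dataset file (each dataset in each split)."""
    return [f"{BASE_URL}{ds}.{split}.jsonl" for ds in DATASETS for split in SPLITS]


def download_file(url, dest_path):
    """Stream one file to dest_path without holding it all in memory.

    Hypothetical helper; urlopen raises an HTTPError on a 4xx/5xx response.
    """
    with urllib.request.urlopen(url, timeout=60) as resp, open(dest_path, "wb") as f:
        shutil.copyfileobj(resp, f)


if __name__ == "__main__":
    # Print every full link; uncomment the last line to fetch one file.
    for url in dataset_urls():
        print(url)
    # download_file(BASE_URL + "small-117M.train.jsonl", "small-117M.train.jsonl")
```

Running it as a script only prints the links; the download call is left commented out because the files can be large.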

Alternatively, you can run _download_dataset.py_ to download all of the dataset files.

I hope this helps.

guotong1988 commented 1 year ago

thank you

HarmonyMurombo commented 1 year ago

Thanks

PeterYang03110 commented 7 months ago

Thanks

PeterYang03110 commented 7 months ago

Hi, how are you? I have a question. How can I contact you? Thanks.