openai / gpt-2-output-dataset

Dataset of GPT-2 outputs for research in detection, biases, and more
MIT License

Filenames for the finetuned Amazon review samples? #9

Closed drfraser closed 3 years ago

drfraser commented 4 years ago

Perhaps I overlooked something, but Google Cloud Storage does not serve a directory index, so the files in gs://gpt-2/output-dataset/v1-amazonfinetune/ are hard to download: their names are not listed anywhere.

re: "Additionally, we encourage research on detection of finetuned models. We have released data under gs://gpt-2/output-dataset/v1-amazonfinetune/ with samples from a GPT-2 full model finetuned to output Amazon reviews."

For anyone wanting the files, the full list is / seems to be:

gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-k40.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-k40.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-k40.valid.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-nucleus.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-nucleus.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-nucleus.valid.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M.valid.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon.valid.jsonl
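
The list above follows a regular pattern (four name prefixes, three splits), so it can be regenerated rather than copied by hand. Here is a minimal Python sketch of that. The helper names `amazon_finetune_uris` and `gs_to_https` are made up for illustration. The HTTPS form assumes the objects are publicly readable through the usual `https://storage.googleapis.com/<bucket>/<object>` endpoint, and nothing here checks that the files still exist at that location.

```python
from itertools import product

# Prefix and name parts taken from the file list in this issue.
BASE = "gs://gpt-2/output-dataset/v1-amazonfinetune/"
PREFIXES = [
    "amazon-xl-1542M-k40",
    "amazon-xl-1542M-nucleus",
    "amazon-xl-1542M",
    "amazon",
]
SPLITS = ["test", "train", "valid"]


def amazon_finetune_uris():
    """Return the 12 gs:// URIs (hypothetical helper, not part of the repo)."""
    return [f"{BASE}{prefix}.{split}.jsonl" for prefix, split in product(PREFIXES, SPLITS)]


def gs_to_https(uri):
    """Map gs://bucket/object to the public HTTPS form (assumes a public object)."""
    scheme = "gs://"
    if not uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {uri}")
    return "https://storage.googleapis.com/" + uri[len(scheme):]


if __name__ == "__main__":
    # Print one HTTPS URL per file; feed these to any downloader.
    for uri in amazon_finetune_uris():
        print(gs_to_https(uri))
```

The printed URLs can be passed to `curl` or `wget`. Whether they resolve depends on the bucket still hosting the data.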

kaixi-wang commented 4 years ago

If you want to browse the list of files for the entire project in a web browser, append the bucket path to https://console.cloud.google.com/storage/browser/. So for the amazon data, you would go to: https://console.cloud.google.com/storage/browser/gpt-2/output-dataset/v1-amazonfinetune

Also, I noticed that the entire bucket is public, so you can view the complete project.
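
As a small illustration of the URL rule described in this comment, the sketch below turns a `gs://` path into the matching console browser link. The function name `console_browser_url` is hypothetical. It only builds the string and does not verify that the bucket is still browsable.

```python
CONSOLE_PREFIX = "https://console.cloud.google.com/storage/browser/"


def console_browser_url(gs_uri):
    """Build the Cloud Console browser URL for a gs:// bucket path (illustrative helper)."""
    scheme = "gs://"
    if not gs_uri.startswith(scheme):
        raise ValueError(f"not a gs:// URI: {gs_uri}")
    # Drop the scheme and any trailing slash, then append to the console prefix.
    bucket_path = gs_uri[len(scheme):].rstrip("/")
    return CONSOLE_PREFIX + bucket_path


if __name__ == "__main__":
    print(console_browser_url("gs://gpt-2/output-dataset/v1-amazonfinetune/"))
```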

mrkaane commented 2 years ago

I'm not able to locate the files now. Does anybody have updated locations?