Closed by drfraser 3 years ago
To browse the list of files for the entire project in a web browser, append the bucket path to the console URL: https://console.cloud.google.com/storage/browser/[bucketpath]. For the Amazon data, you would go to: https://console.cloud.google.com/storage/browser/gpt-2/output-dataset/v1-amazonfinetune
Also, I noticed that the entire bucket is public, so you can view the complete project.
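The URL pattern above can be sketched as a small Python helper. The function name `console_url` is hypothetical, and the sketch only assumes the browser URL prefix quoted in this comment; it does not check that the bucket is still reachable.

```python
def console_url(gs_path: str) -> str:
    """Map a gs:// path to the Cloud Console browser URL pattern quoted above."""
    prefix = "gs://"
    if not gs_path.startswith(prefix):
        raise ValueError(f"expected a gs:// path, got {gs_path!r}")
    # Drop the scheme and any trailing slash, keeping bucket name plus object prefix.
    bucket_path = gs_path[len(prefix):].rstrip("/")
    return "https://console.cloud.google.com/storage/browser/" + bucket_path


print(console_url("gs://gpt-2/output-dataset/v1-amazonfinetune/"))
```

For the Amazon data path this yields the same URL as the one given above.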
I'm not able to locate the files now. Does anybody have updated locations?
Perhaps I overlooked something, but Google Cloud Storage does not provide directory indexes, so the files in gs://gpt-2/output-dataset/v1-amazonfinetune/ are not easy to download, given that their names are not specified anywhere.
re: "Additionally, we encourage research on detection of finetuned models. We have released data under gs://gpt-2/output-dataset/v1-amazonfinetune/ with samples from a GPT-2 full model finetuned to output Amazon reviews."
For anyone wanting the files, the full list seems to be:
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-k40.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-k40.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-k40.valid.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-nucleus.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-nucleus.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M-nucleus.valid.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon-xl-1542M.valid.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon.test.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon.train.jsonl
gs://gpt-2/output-dataset/v1-amazonfinetune/amazon.valid.jsonl
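Since the names follow a regular variant/split pattern, the list can be regenerated rather than copied by hand. The sketch below is only an illustration: the helper names `gs_paths` and `https_url` are made up here, and the HTTPS form assumes the objects are still publicly readable at `storage.googleapis.com/<bucket>/<object>`, which may no longer hold given the report above that the files could not be located.

```python
from itertools import product

BASE = "gs://gpt-2/output-dataset/v1-amazonfinetune/"
# Variant and split names are taken from the file list in this thread.
VARIANTS = ["amazon-xl-1542M-k40", "amazon-xl-1542M-nucleus", "amazon-xl-1542M", "amazon"]
SPLITS = ["test", "train", "valid"]


def gs_paths() -> list[str]:
    """Rebuild the twelve gs:// paths listed above."""
    return [f"{BASE}{variant}.{split}.jsonl" for variant, split in product(VARIANTS, SPLITS)]


def https_url(gs_path: str) -> str:
    """Assumed public HTTPS form of a gs:// path (works only for publicly readable objects)."""
    return "https://storage.googleapis.com/" + gs_path[len("gs://"):]


for path in gs_paths():
    print(https_url(path))
```

Each printed URL could then be passed to a downloader such as `curl` or `wget`; whether the downloads succeed depends on the bucket still being available.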