Open Zeyu-ZEYU opened 1 year ago
Hi, are there any alternative links or other ways to download data now?
The entire pile_neox is gone now.
I also encountered this problem. Can someone provide the dataset files?
I also encountered this issue. Is there any way to fix it?
I also encountered this issue. Is there any way to fix it?
Sorry, I also encountered this issue. Could a developer kindly tell us how to fix it?
The original bookcorpus dataset is no longer available, but there are equivalents and steps to reproduce the data:
https://towardsdatascience.com/replicating-the-toronto-bookcorpus-dataset-a-write-up-44ea7b87d091
https://huggingface.co/datasets/bookcorpus
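If you do get hold of an equivalent corpus, here is a rough sketch of turning raw text into a JSON-lines file. It assumes the downstream preprocessing step wants one JSON object per line with a `text` key; that layout, the helper name `write_jsonl`, and the output file name are my own assumptions, not something confirmed in this thread, so check them against the preprocessing script you actually use.

```python
import json
import os
import tempfile
from typing import Iterable


def write_jsonl(texts: Iterable[str], path: str, key: str = "text") -> int:
    """Write one JSON object per line, skipping blank records.

    `write_jsonl` and the default key "text" are hypothetical choices;
    adjust them to whatever your preprocessing tool expects.
    Returns the number of records written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for text in texts:
            text = text.strip()
            if not text:
                continue  # drop empty or whitespace-only records
            out.write(json.dumps({key: text}, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real corpus.
    sample = ["First passage of a book.", "   ", "Second passage."]
    target = os.path.join(tempfile.mkdtemp(), "corpus.jsonl")
    written = write_jsonl(sample, target)
    print(f"wrote {written} records to {target}")
```

With the Hugging Face `datasets` library installed, and assuming the `bookcorpus` dataset still loads in your environment, something like `write_jsonl((row["text"] for row in load_dataset("bookcorpus", split="train")), "bookcorpus.jsonl")` should feed the same helper. I have not verified that against the current state of the Hub.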
The links above are no longer accessible. They are also referenced in examples_deepspeed/sequence_parallel/ds_pretrain_gpt_1.3B_seq_parallel_32k.sh