huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0
1.14k stars, 107 forks

`nanotron/the-pile-for-doremi` is empty #127

Closed: Qinghao-Hu closed this issue 5 months ago

Qinghao-Hu commented 5 months ago

https://huggingface.co/datasets/nanotron/the-pile-for-doremi/tree/main

I cannot download the dataset. Could you please check it?

xrsrke commented 5 months ago

@Tonyhao96 Hey. There was a bug in our DoReMi implementation. We have fixed it [link] (not merged yet) and reran the experiment on Fineweb2 (an upcoming dataset release) and a few other domains from The Pile, and The Stack... So the previous experiment results are not valid. Stay tuned for the new release!!

For the new experiment results, see the last images in this tweet: https://twitter.com/xariusrke/status/1774089131351584852