Closed Qinghao-Hu closed 5 months ago
@Tonyhao96 Hey. There was a bug in our DoReMi implementation. We have fixed it [link] (not merged yet) and reran the experiment on Fineweb2 (an upcoming dataset release) and a few other domains from The Pile and The Stack... So the previous experiment results are not legit. Stay tuned for the new release!!
For the new experiment results, see the last images in this tweet: https://twitter.com/xariusrke/status/1774089131351584852
https://huggingface.co/datasets/nanotron/the-pile-for-doremi/tree/main
I cannot download the dataset. Could you please check it?