huggingface / nanotron

Minimalistic large language model 3D-parallelism training

Request for detailed FineWeb-ablation-models training strategy & hyperparams #201

Closed JefferyChen453 closed 1 month ago

JefferyChen453 commented 2 months ago

I'm trying to reproduce the evaluations of your FineWeb ablation models, but my results do not match yours under the same model settings. May I ask for the training config file of your ablation models?

koalazf99 commented 1 month ago

Hi @JefferyChen453, I also want to know the training config of the FineWeb ablation models. What hyperparameters are you using? And may I ask what computing resources you are currently using?

guipenedo commented 1 month ago

Hi, for the evaluations, see the comments here: https://huggingface.co/datasets/HuggingFaceFW/fineweb/blob/main/lighteval_tasks.py

For training, see: https://huggingface.co/datasets/HuggingFaceFW/fineweb/discussions/39
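For anyone diffing their own hyperparameters against those in the linked discussion, here is a minimal sketch. It assumes the run is driven by a standard nanotron YAML config; the file path is hypothetical, and the section names follow nanotron's config schema (verify them against the config posted in the discussion):

```python
# Minimal sketch: dump the hyperparameter sections of a nanotron YAML config
# so they can be compared against the FineWeb ablation setup.
# The config path is hypothetical; section names follow nanotron's schema.
import yaml  # requires: pip install pyyaml

CONFIG_PATH = "config_fineweb_ablation.yaml"  # hypothetical path

with open(CONFIG_PATH) as f:
    config = yaml.safe_load(f)

# nanotron configs group hyperparameters into top-level sections; print the
# ones most likely to explain an evaluation gap.
for section in ("model", "optimizer", "tokens", "parallelism"):
    print(f"== {section} ==")
    print(yaml.dump(config.get(section, {}), default_flow_style=False))
```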

koalazf99 commented 1 month ago

Thanks!