Firstly, thanks for providing this framework. I am following the FineWeb pipeline, using Nanotron for training and Lighteval for evaluation, but ran into some problems.
1. Following the tiny_llama demo from Nanotron, I got one of the checkpoints like this:
2. Following run_evals_nanotron.py, I launch the task with:
torchrun --nproc-per-node 1 lighteval/run_evals_nanotron.py --checkpoint-config-path nanotron/examples/config_tiny_llama.yaml --lighteval-override lighteval/examples/nanotron/lighteval_config_override_template.yaml
Then I got the error:
It seems the problem is related to the YAML format of both the Lighteval override file and the Nanotron model config. Could you provide an example like the one you provide for accelerate? Thank you.
I solved this error by modifying lighteval_config_override_template.yaml: deleting the first line and commenting out `recompute_granularity: null`. But I wonder why this works. Is this a bug?
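For reference, the edit I describe above looks roughly like this. This is only a sketch: apart from `recompute_granularity`, the keys shown are placeholders standing in for the template's real fields, not its actual contents.

```yaml
# lighteval/examples/nanotron/lighteval_config_override_template.yaml
# First line of the original template removed.
# ("other_key" is a placeholder for the template's real fields.)
other_key: some_value
# recompute_granularity: null   # commented out; leaving this in triggered the config parsing error
```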