Closed — codingchild2424 closed this issue 6 months ago
Hello. Could you try again with the current main branch? I think we've just fixed it.
Ah yes, we have a new fix, not yet merged: https://github.com/huggingface/nanotron/pull/120
Check this commit. The assertion can't be `all`, since you are expected to have one data stage starting at step 1.
Thanks:)
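In case it helps others hitting the same assertion: a minimal `data_stages` fragment that satisfies the check might look like the sketch below. Field names follow the repo's example configs; treat the exact schema (and the dataset name) as assumptions, not a verified nanotron config.

```yaml
# Hypothetical minimal data_stages entry. The assertion in
# config.py requires at least one stage starting at training step 1.
data_stages:
  - name: "stable_training_stage"
    start_training_step: 1   # must be 1 for the first stage
    data:
      dataset:
        hf_dataset_or_datasets: "stas/openwebtext-10k"  # example dataset, assumption
```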
Thank you for sharing this amazing repo with the community.
When I tried to run the examples, the error below occurred. How should I set things up for the tutorial code?
[My script]
```shell
torchrun --nproc_per_node=2 run_train.py --config-file examples/config_tiny_llama.yaml
```
[Error Logs]
```
Traceback (most recent call last):
  File "/root/nanotron/run_train.py", line 156, in <module>
    trainer = DistributedTrainer(config_file)
  File "/root/nanotron/src/nanotron/trainer.py", line 127, in __init__
    self.config = get_config_from_file(
  File "/root/nanotron/src/nanotron/config/config.py", line 432, in get_config_from_file
    config = get_config_from_dict(
  File "/root/nanotron/src/nanotron/config/config.py", line 393, in get_config_from_dict
    return from_dict(
  File "/usr/local/lib/python3.10/dist-packages/dacite/core.py", line 81, in from_dict
    instance = data_class(**init_values)
  File "<string>", line 14, in __init__
  File "/root/nanotron/src/nanotron/config/config.py", line 336, in __post_init__
    assert all(
AssertionError: You must have a training stage starting at 1 in the config's data_stages
```
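For reference, the failing check can be reproduced in isolation. This is a minimal sketch of the validation pattern suggested by the traceback and the maintainer's comment (an `any`-style check, since exactly one stage must start at step 1), not nanotron's actual code; the class and field names are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetStageArgs:
    # Hypothetical stand-in for a nanotron data-stage entry.
    name: str
    start_training_step: int

@dataclass
class DataArgs:
    data_stages: list = field(default_factory=list)

    def __post_init__(self):
        # Mirrors the assertion at config.py:336: at least one
        # stage must begin at training step 1.
        assert any(
            stage.start_training_step == 1 for stage in self.data_stages
        ), "You must have a training stage starting at 1 in the config's data_stages"

# A config whose first stage starts at step 1 passes validation.
ok = DataArgs(data_stages=[DatasetStageArgs(name="stable", start_training_step=1)])
```

A config whose stages all start later than step 1 raises the same `AssertionError` shown in the logs above.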