Closed: XinDongol closed this issue 5 months ago.
Got this error when trying to reproduce the example:

```
sh examples/train_tiny_llama.sh
```
```
[default1]:Traceback (most recent call last):
[default1]:  File "/raid/xind/nanotron/run_train.py", line 157, in <module>
[default1]:    trainer = DistributedTrainer(config_file)
[default1]:  File "/raid/xind/nanotron/src/nanotron/trainer.py", line 181, in __init__
[default1]:    self.optimizer, self.grad_accumulator = init_optimizer_and_grad_accumulator(
[default1]:  File "/raid/xind/nanotron/src/nanotron/helpers.py", line 311, in init_optimizer_and_grad_accumulator
[default1]:    named_param_groups_with_weight_decay = get_custom_weight_decay_for_named_parameters(
[default1]:  File "/raid/xind/nanotron/src/nanotron/helpers.py", line 192, in get_custom_weight_decay_for_named_parameters
[default1]:04/22/2024 16:42:44 [INFO|DP=1|PP=0|TP=0]: No checkpoint path provided.
[default1]:    exclude_named_params = model.model.get_named_params_without_weight_decay()
[default1]:  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1687, in __getattr__
[default1]:    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[default1]:AttributeError: 'LlamaModel' object has no attribute 'get_named_params_without_weight_decay'
```
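For context, the traceback shows `helpers.py` calling `model.model.get_named_params_without_weight_decay()`, so the trainer expects the model class to expose that method, and this `LlamaModel` build predates it. Below is a minimal sketch of what such a method typically looks like; the class name and the exclusion rule (norm weights and biases) are assumptions for illustration, not nanotron's actual implementation:

```python
# Hypothetical sketch, not nanotron's actual code: illustrates the method
# signature that helpers.py appears to call on the model.
from typing import List

import torch.nn as nn


class MyLlamaModel(nn.Module):
    def get_named_params_without_weight_decay(self) -> List[str]:
        # Common convention (assumed here): exclude normalization weights
        # and biases from weight decay; check the repo for the real rule.
        return [
            name
            for name, _ in self.named_parameters()
            if "norm" in name or name.endswith("bias")
        ]
```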
Thank you for reporting the issue. I think this problem has been resolved.
Hello. We have fixed it. Could you git pull the main branch again?
Thanks!