huggingface / nanotron

Minimalistic large language model 3D-parallelism training
Apache License 2.0

'LlamaModel' object has no attribute 'get_named_params_without_weight_decay' in the beginner example #146

Closed (XinDongol closed this issue 5 months ago)

XinDongol commented 5 months ago

I got this error when trying to reproduce the beginner example with `sh examples/train_tiny_llama.sh`:

```
[default1]:Traceback (most recent call last):
[default1]:  File "/raid/xind/nanotron/run_train.py", line 157, in <module>
[default1]:    trainer = DistributedTrainer(config_file)
[default1]:  File "/raid/xind/nanotron/src/nanotron/trainer.py", line 181, in __init__
[default1]:    self.optimizer, self.grad_accumulator = init_optimizer_and_grad_accumulator(
[default1]:  File "/raid/xind/nanotron/src/nanotron/helpers.py", line 311, in init_optimizer_and_grad_accumulator
[default1]:    named_param_groups_with_weight_decay = get_custom_weight_decay_for_named_parameters(
[default1]:  File "/raid/xind/nanotron/src/nanotron/helpers.py", line 192, in get_custom_weight_decay_for_named_parameters
[default1]:    exclude_named_params = model.model.get_named_params_without_weight_decay()
[default1]:  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1687, in __getattr__
[default1]:    raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
[default1]:AttributeError: 'LlamaModel' object has no attribute 'get_named_params_without_weight_decay'
[default1]:04/22/2024 16:42:44 [INFO|DP=1|PP=0|TP=0]: No checkpoint path provided.
```
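For context, the call site in `helpers.py` (`get_custom_weight_decay_for_named_parameters`) expects the inner model to expose a `get_named_params_without_weight_decay()` method returning the names of parameters that should be excluded from weight decay. Below is a minimal, hypothetical sketch of that interface, assuming the usual convention of excluding normalization weights and biases; it is not nanotron's actual implementation, and `LlamaModelSketch` is an invented name:

```python
# Hypothetical sketch of the interface the traceback above expects.
# This is NOT nanotron's actual implementation.
from typing import List

import torch.nn as nn


class LlamaModelSketch(nn.Module):
    """Illustrative model exposing get_named_params_without_weight_decay()."""

    def __init__(self) -> None:
        super().__init__()
        self.proj = nn.Linear(16, 16)
        self.norm = nn.LayerNorm(16)

    def get_named_params_without_weight_decay(self) -> List[str]:
        # Common convention: exclude normalization parameters and all biases
        # from weight decay.
        names = set()
        for module_name, module in self.named_modules():
            if isinstance(module, nn.LayerNorm):
                for param_name, _ in module.named_parameters():
                    names.add(f"{module_name}.{param_name}" if module_name else param_name)
        for param_name, _ in self.named_parameters():
            if param_name.endswith(".bias"):
                names.add(param_name)
        return sorted(names)
```

In this issue the AttributeError simply means the checked-out code predates the change that added such a method to `LlamaModel`, which matches the maintainers' replies below.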
zzhhjjj commented 5 months ago

Thank you for reporting the issue. I think this problem has been resolved.

xrsrke commented 5 months ago

Hello. We have fixed it. Could you git pull the main branch again?
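For anyone hitting the same error, updating a local clone that tracks the upstream main branch and re-running the example should pick up the fix (remote and branch names below are the common defaults and may differ in your setup):

```
git pull origin main
sh examples/train_tiny_llama.sh
```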

XinDongol commented 5 months ago

thanks!