huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Mismatched keyword argument names of llama make GA fix invalid #34577

Open · hiyouga opened this issue 3 weeks ago

hiyouga commented 3 weeks ago

System Info

Who can help?

@ArthurZucker @muellerzr

Information

Tasks

Reproduction

https://github.com/huggingface/transformers/pull/33932 may break the trainer's model_accepts_loss_kwargs logic. The llama model no longer receives the num_items_in_batch argument, which makes the fix from https://github.com/huggingface/transformers/pull/34283 invalid again.

https://github.com/huggingface/transformers/blob/33868a057c02f0368ba63bd1edb746be38fe3d90/src/transformers/trainer.py#L605
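For context, the linked trainer line decides whether to forward num_items_in_batch with a name-based signature check, roughly along these lines (a paraphrase, not the exact trainer code; the helper name is mine):

```python
import inspect

def accepts_loss_kwargs(model) -> bool:
    # Rough paraphrase of the name-based check: it looks for a parameter
    # literally called "loss_kwargs" in the model's forward signature.
    # A model that collects extras as **kwargs instead (as llama does
    # after #33932) fails the check, so the trainer never passes
    # num_items_in_batch and the gradient-accumulation loss fix is skipped.
    return "loss_kwargs" in inspect.signature(model.forward).parameters
```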

Moreover, the keyword argument names differ between llama and other models; we would expect different models to accept the same keyword arguments (the two signatures are paraphrased below).

https://github.com/huggingface/transformers/blob/33868a057c02f0368ba63bd1edb746be38fe3d90/src/transformers/models/llama/modeling_llama.py#L1146-L1161

https://github.com/huggingface/transformers/blob/33868a057c02f0368ba63bd1edb746be38fe3d90/src/transformers/models/gemma/modeling_gemma.py#L1015-L1030
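Paraphrasing the two linked signatures (most parameters elided; see the permalinks for the exact definitions), llama now collects the extra keyword arguments under **kwargs while gemma still uses **loss_kwargs:

```python
# modeling_llama.py (paraphrased): extras arrive as **kwargs
def llama_forward(self, input_ids=None, labels=None, num_logits_to_keep=0, **kwargs):
    ...

# modeling_gemma.py (paraphrased): extras arrive as **loss_kwargs
def gemma_forward(self, input_ids=None, labels=None, num_logits_to_keep=0, **loss_kwargs):
    ...
```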

Expected behavior

The models' forward functions should have a consistent keyword argument list.

ArthurZucker commented 1 week ago

cc @muellerzr I think when reviewing I mentioned that checking the type rather than the names (i.e. the annotation of the star kwargs) would easily fix this!
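For illustration, a check keyed on the annotation of the var-keyword parameter rather than its name could look roughly like this (a sketch, not existing trainer code; it assumes the shared loss-kwargs TypedDict, e.g. KwargsForCausalLM, declares num_items_in_batch):

```python
import inspect
import typing

def accepts_loss_kwargs_by_type(model) -> bool:
    # Look for a **kwargs-style (VAR_KEYWORD) parameter whose annotated
    # TypedDict declares num_items_in_batch, regardless of what the
    # parameter itself is called.
    for param in inspect.signature(model.forward).parameters.values():
        if param.kind is not inspect.Parameter.VAR_KEYWORD:
            continue
        annotation = param.annotation
        # Unpack[SomeTypedDict] exposes the inner TypedDict via get_args();
        # a bare annotation is used as-is.
        candidates = typing.get_args(annotation) or (annotation,)
        return any(
            "num_items_in_batch" in getattr(t, "__annotations__", {})
            for t in candidates
        )
    return False
```

With a check like this, the trainer could pass num_items_in_batch whenever it returns True, independent of whether the model names the parameter loss_kwargs or kwargs.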