huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Implementation Issue of Phi3SuScaledRotaryEmbedding #31339

Open ryan-minato opened 2 months ago

ryan-minato commented 2 months ago

Feature request

Improving the Implementation of Phi3SuScaledRotaryEmbedding to Reduce Unnecessary Computation

I'm not entirely sure if there is a deeper meaning to the implementation here. It seems that `inv_freq` could be fully initialized in the `__init__` method as `long_inv_freq` and `short_inv_freq`. The only thing the forward arguments are used for here is obtaining the device.

https://github.com/huggingface/transformers/blob/25245ec26dc29bcf6102e1b4ddd0dfd02e720cf5/src/transformers/models/phi3/modeling_phi3.py#L131-L137
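To illustrate the proposed refactor, here is a minimal, framework-free sketch (hypothetical class and parameter names, not the actual transformers code): both scaled inverse-frequency tables are built once in `__init__`, so the forward path only has to select one of them instead of recomputing `inv_freq` on every call.

```python
import math


class SuScaledRotaryEmbeddingSketch:
    """Hypothetical sketch of the refactor proposed in this issue:
    precompute both scaled inverse-frequency tables up front instead
    of rebuilding one of them on every forward pass."""

    def __init__(self, dim, base, short_factor, long_factor,
                 original_max_position_embeddings):
        self.original_max_position_embeddings = original_max_position_embeddings
        # Base inverse frequencies: 1 / base^(i/dim) for i = 0, 2, 4, ...
        inv_freq = [1.0 / base ** (i / dim) for i in range(0, dim, 2)]
        # Apply the per-frequency extension factors once, in __init__.
        self.short_inv_freq = [f / s for f, s in zip(inv_freq, short_factor)]
        self.long_inv_freq = [f / l for f, l in zip(inv_freq, long_factor)]

    def inv_freq_for(self, seq_len):
        # The forward pass now only picks the right precomputed table.
        if seq_len > self.original_max_position_embeddings:
            return self.long_inv_freq
        return self.short_inv_freq
```

Whether the tables should live on a specific device from the start (rather than being moved lazily when the forward device is known) is the remaining design question this issue raises.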

Motivation

Fixing this code eliminates a redundant computation. Moreover, `self.inv_freq` as used here doesn't seem appropriate as an attribute.

I tested the modified version locally. Although I couldn't test with long contexts due to my device's limited performance, the new implementation doesn't show any issues with short sequences.

Your contribution

I can fix it if this isn't intentional.

amyeroberts commented 2 months ago

cc @ArthurZucker @gante

ArthurZucker commented 2 months ago

PR reviewed! 🤗