Improving the Implementation of Phi3SuScaledRotaryEmbedding to Reduce Unnecessary Computation
I'm not entirely sure if there is a deeper meaning to the implementation here. It seems that the inv_freq could be completely initialized in the __init__method as long_inv_freqand short_inv_freq.
The only parameter used here is the part that gets the device.
Fixing this code can eliminate a redundant computation. Moreover, self.inv_freq used here doesn't seem appropriate to be a attribute.
I tested the modified version locally. Although I couldn't test with long contexts due to my device's performance, the new implementation doesn't seem to show any issues with short sentences.
Feature request
Improving the Implementation of Phi3SuScaledRotaryEmbedding to Reduce Unnecessary Computation
I'm not entirely sure if there is a deeper meaning to the implementation here. It seems that the inv_freq could be completely initialized in the
__init__
method aslong_inv_freq
andshort_inv_freq
. The only parameter used here is the part that gets the device.https://github.com/huggingface/transformers/blob/25245ec26dc29bcf6102e1b4ddd0dfd02e720cf5/src/transformers/models/phi3/modeling_phi3.py#L131-L137
Motivation
Fixing this code can eliminate a redundant computation. Moreover, self.inv_freq used here doesn't seem appropriate to be a attribute.
I tested the modified version locally. Although I couldn't test with long contexts due to my device's performance, the new implementation doesn't seem to show any issues with short sentences.
Your contribution
I can fix it if this isn't intentional.