fix the following error when running _test_rotary():
"""
...
x_rope = (x_rope * self.cos_cached[:x.shape[0]]) + (neg_half_x * self.sin_cached[:x.shape[0]])
RuntimeError: The size of tensor a (3) must match the size of tensor b (4) at non-singleton dimension 3
"""