huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Plans to Integrate LongRoPE into LLaMA? #31992

Open ryan-minato opened 3 months ago

ryan-minato commented 3 months ago

Feature request

Microsoft has released their microsoft/LongRoPE implementation. Unlike plug-and-play scaling methods, LongRoPE requires hyperparameter tuning via a genetic algorithm. This implementation is likely the same method used in Phi-3. Are there any plans to incorporate LongRoPE into LLaMA?
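For context, the core idea LongRoPE applies is non-uniform positional interpolation: each RoPE frequency dimension gets its own rescale factor, and the genetic search finds the factor vector that best preserves quality at the extended context length. Below is a minimal, hypothetical sketch of that per-dimension rescaling in plain Python; the `factors` list is a stand-in for the values the search would produce, not the actual tuned Phi-3 factors.

```python
import math

def longrope_inv_freq(dim, base=10000.0, factors=None):
    """Per-dimension rescaled RoPE inverse frequencies (LongRoPE-style sketch).

    `factors` is a hypothetical list of dim // 2 rescale factors, standing in
    for the values LongRoPE's genetic search would find. A factor of 1.0
    leaves a dimension unchanged; larger factors stretch that dimension's
    rotation wavelength to accommodate longer contexts.
    """
    if factors is None:
        factors = [1.0] * (dim // 2)
    assert len(factors) == dim // 2
    return [
        1.0 / (f * base ** (2 * i / dim))
        for i, f in enumerate(factors)
    ]

# With all factors at 1.0 this reduces to vanilla RoPE frequencies.
vanilla = longrope_inv_freq(8)
# Stretching higher-frequency dimensions more aggressively (illustrative values).
stretched = longrope_inv_freq(8, factors=[1.0, 2.0, 4.0, 8.0])
```

The "minor code adjustments" mentioned below would then amount to swapping these rescaled inverse frequencies into LLaMA's rotary embedding in place of the uniform ones.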

Motivation

In my research on long-context modeling, I have managed to integrate LongRoPE into LLaMA with some minor code adjustments. I am curious whether Hugging Face is also working on integrating this feature.

Your contribution

I can contribute my integration if needed.

amyeroberts commented 3 months ago

cc @ArthurZucker @gante

gante commented 3 months ago

Hey @ryan-minato 👋

Thank you for opening this issue! We're pausing new rope scaling contributions for a week or so, while we refactor the code. Past that, we'd love to get a contribution 🤗

ArthurZucker commented 3 months ago

Will be fixed by #31999

gante commented 3 months ago

@ryan-minato #31999 will include longrope 🤗