Closed: issue opened by Sunt-ing, closed 5 months ago
Thanks for your attention to our EE project. However, Hugging Face currently cannot run the EE models, because their architecture and inference method differ from the standard GPT models that Hugging Face supports.
Adding the EE architecture and inference method to Hugging Face would be a complex project and take a lot of time. If you only have a single GPU, you can try converting our provided checkpoints to TP=1 PP=1 and running inference with EE-LLM.
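For reference, EE-LLM builds on Megatron-LM, so the conversion can likely be scripted along the lines of Megatron-LM's checkpoint converter. The sketch below is a hypothetical invocation: the script path, checkpoint directories, and exact flags are assumptions, so check the EE-LLM repository for the actual tool.

```python
# Hypothetical sketch: merging an EE-LLM checkpoint down to TP=1 PP=1 with a
# Megatron-LM-style converter so it fits on a single GPU. The script path and
# the checkpoint directory names below are assumptions, not the repo's exact layout.
import subprocess

subprocess.run(
    [
        "python", "tools/checkpoint/util.py",  # upstream Megatron-LM converter; EE-LLM's entry point may differ
        "--model-type", "GPT",
        "--load-dir", "checkpoints/ee-llm-source",     # hypothetical source checkpoint (TP>1 and/or PP>1)
        "--save-dir", "checkpoints/ee-llm-tp1-pp1",    # merged TP=1 PP=1 output
        "--target-tensor-parallel-size", "1",
        "--target-pipeline-parallel-size", "1",
    ],
    check=True,
)
```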
Thanks for your advice, I will try that. BTW, do you know of any other open EE checkpoints?
Hi! Thanks for your questions.
There is indeed an open multi-exit checkpoint on Hugging Face (https://huggingface.co/ibm/gpt2-medium-multiexit), although the model scale is much smaller than ours, and that work introduced multiple exits for purposes other than accelerating inference.
There is also another early-exit work that shares fine-tuned checkpoints.
Thank you for the information.
Your question: Hi, thanks for providing the model checkpoint; it's really useful for me and others. As far as I know, it's the only open decoder-only checkpoint optimized for early exit.
For some reason I cannot use Megatron-LM, only Hugging Face Transformers. How can I run early-exit inference on this checkpoint in Hugging Face Transformers with a single GPU? It doesn't need the pipeline or recomputation optimizations; even a fixed exit depth for the whole prompt would be fine.
Thanks for your help!
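For what it's worth, a fixed-depth variant can be approximated in plain Hugging Face Transformers by reusing the top LM head on an intermediate layer's hidden state (a logit-lens-style trick). The sketch below uses the stock gpt2 checkpoint and a hard-coded exit layer; it is only an approximation, since real EE models attach dedicated heads at each exit, which Transformers cannot load out of the box.

```python
# Minimal sketch of fixed-depth "early exit" greedy decoding with Hugging Face
# Transformers. Assumption: we reuse the final LM head on an intermediate
# layer's hidden state (logit-lens style), rather than a trained exit head.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

EXIT_LAYER = 6  # exit after block 6 of 12, fixed for the whole generation

@torch.no_grad()
def greedy_generate_early_exit(prompt: str, max_new_tokens: int = 20) -> str:
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        out = model(ids, output_hidden_states=True)
        # hidden_states[0] is the embedding output, so hidden_states[EXIT_LAYER]
        # is the activation after the EXIT_LAYER-th transformer block
        h = model.transformer.ln_f(out.hidden_states[EXIT_LAYER])
        logits = model.lm_head(h[:, -1, :])  # reuse the top LM head as the exit head
        next_id = logits.argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    return tokenizer.decode(ids[0])

print(greedy_generate_early_exit("Early exit lets a language model"))
```

Note that `output_hidden_states=True` still runs all layers, so this saves no compute; to actually stop at a fixed depth, one could truncate the block list (e.g. `model.transformer.h = model.transformer.h[:EXIT_LAYER]`) before generating.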