unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Feature request: discussions around new features within HF ecosystem with unsloth #34

Closed · younesbelkada closed this 1 month ago

younesbelkada commented 11 months ago

Hi @danielhanchen

Thank you very much for this great project and for pushing this forward for the community!

With the TRL / PEFT team we've seen that your example scripts rely heavily on the PEFT / TRL libraries, and we wanted to ask whether you need any help or have any feature requests around the HF ecosystem. We would be happy to collaborate and see what we can do together.

Note also that SDPA was recently integrated into transformers core (https://github.com/huggingface/transformers/pull/26572). We were also wondering whether you have compared unsloth against transformers 4.36.0.
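
For reference, a minimal sketch of opting into SDPA with transformers 4.36.0 via `attn_implementation` (the model id below is only a placeholder):

```python
# Minimal sketch: opting into PyTorch SDPA in transformers 4.36.0+
# via the attn_implementation flag (the model id is only a placeholder).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",   # placeholder model id
    torch_dtype=torch.float16,
    attn_implementation="sdpa",   # routes attention through F.scaled_dot_product_attention
)
```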

cc @pacman100 @lvwerra

danielhanchen commented 11 months ago

@younesbelkada Hey there! I've seen many of your PRs for HF, so great work again! I did actually see the SDPA support, and I think I wrote a note about it in my benchmarks.

For example, Alpaca with SDPA on a Tesla T4:

```python
%%capture
# scaled_dot_product_attention added on 9th December 2023
# Supports Xformers, FA on old GPUs now (e.g. T4)
# But only for PyTorch 2.1.1+; we shall patch it ourselves for now
!pip install transformers bitsandbytes datasets sentencepiece accelerate trl peft
```

I manually patched the models for SDPA, so on Tesla T4s I did in fact benchmark SDPA (not transformers' native integration, just SDPA itself).
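
Roughly, the patch just routes the attention computation through torch's SDPA kernel. A minimal sketch (not the exact patch) looks like:

```python
# Rough sketch only (not the exact patch): route attention through
# torch.nn.functional.scaled_dot_product_attention, which picks a fused
# kernel (Flash / memory-efficient / math) depending on GPU and dtype.
import torch
import torch.nn.functional as F

def sdpa_attention(q, k, v, causal=True):
    # q, k, v: (batch, n_heads, seq_len, head_dim)
    return F.scaled_dot_product_attention(q, k, v, is_causal=causal)

device = "cuda" if torch.cuda.is_available() else "cpu"
q, k, v = (torch.randn(1, 8, 128, 64, device=device) for _ in range(3))
out = sdpa_attention(q, k, v)  # same shape as q
```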

E.g. (the Flash Attention column is in fact SDPA):

| 1 T4 16GB | Hugging Face | Flash Attention | Unsloth Open | Unsloth Pro Equal | Unsloth Pro | Unsloth Max |
| --- | --- | --- | --- | --- | --- | --- |
| Alpaca | 1x | 1.09x | 1.69x | 1.79x | 2.93x | 8.3x |
| seconds | 1599 | 1468 | 942 | 894 | 545 | 193 |
| memory MB | 7199 | 7059 | 6459 | 5443 | | |
| memory saved % | | 1.94 | 10.28 | 24.39 | | |

So vs SDPA, Unsloth is 1.56x faster on a Tesla T4. I actually wanted to use the latest transformers branch, but Colab's PyTorch is 2.1.0 and upgrading it to 2.1.1 would be quite slow. I started benchmarking around Dec 8, and HF released SDPA support on, I think, Dec 9.
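
If it helps anyone on older Colab images, a tiny illustrative guard (not from my benchmark code) is just:

```python
# Illustrative version guard (not from the benchmark code): transformers'
# native SDPA path needs PyTorch 2.1.1+, so fall back to eager attention otherwise.
import torch
from packaging import version

use_sdpa = version.parse(torch.__version__.split("+")[0]) >= version.parse("2.1.1")
attn_implementation = "sdpa" if use_sdpa else "eager"
```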

danielhanchen commented 11 months ago

But I'm more than happy to collaborate on anything!! Again, great work with TRL and PEFT! I'm actively following https://github.com/huggingface/transformers/pull/26037 :) so that'll be massive for the next HF release!

I'm also investigating LoftQ via PEFT as suggested by someone I was chatting with - I haven't tried it yet, but hopefully VRAM doesn't explode!
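
For anyone following along, a minimal sketch of what trying LoftQ through PEFT looks like (the model id and LoRA settings here are placeholders, not something I've validated):

```python
# Minimal sketch of LoftQ initialization via PEFT (placeholder values, untested here).
from peft import LoftQConfig, LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder
loftq_config = LoftQConfig(loftq_bits=4)      # quantize the backbone to 4 bits for the init
lora_config = LoraConfig(
    init_lora_weights="loftq",                # use LoftQ to initialize the LoRA weights
    loftq_config=loftq_config,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)
```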

younesbelkada commented 11 months ago

Thanks very much for your positive reply @danielhanchen! We can collaborate on many things. One thing I had in mind is to integrate an API into PEFT that can leverage unsloth as a backend. It would be easier for us to discuss this on Slack. Can you send me an email address where I can reach you?

danielhanchen commented 11 months ago

@younesbelkada Email is on my profile! :)