Closed younesbelkada closed 1 month ago
@younesbelkada Hey there! I've seen many of your PRs for HF - great work again! I actually did see the SDPA support, and I think I wrote a note about it in my benchmarks.
For example, Alpaca with SDPA on a Tesla T4, i.e.:

```python
%%capture
# scaled_dot_product_attention added on 9th December 2023
# Supports Xformers, FA on old GPUs now (e.g. T4)
# But only for PyTorch 2.1.1+, so we shall patch it ourselves for now
!pip install transformers bitsandbytes datasets sentencepiece accelerate trl peft
```
I manually patched them for SDPA, so on Tesla T4s I did in fact benchmark SDPA (not transformers' native integration, just SDPA itself).
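Roughly, the patch boils down to something like this (a minimal sketch, not the exact code I used) - swapping the naive softmax attention for `torch.nn.functional.scaled_dot_product_attention`, which dispatches to memory-efficient / flash kernels where available:

```python
import math
import torch
import torch.nn.functional as F

def naive_attention(q, k, v, causal=True):
    # Reference implementation: softmax(Q K^T / sqrt(d)) V with a causal mask
    d = q.shape[-1]
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)
    if causal:
        mask = torch.triu(torch.ones(scores.shape[-2:], dtype=torch.bool, device=q.device), 1)
        scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# (batch, heads, seq_len, head_dim) - toy shapes just for illustration
q = torch.randn(1, 8, 128, 64)
k = torch.randn(1, 8, 128, 64)
v = torch.randn(1, 8, 128, 64)

out_sdpa = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # fused SDPA path
out_ref  = naive_attention(q, k, v, causal=True)                    # unfused reference
print((out_sdpa - out_ref).abs().max())  # should be ~0 (numerical noise only)
```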
E.g. (the Flash Attention column is in fact SDPA):

| 1 T4 16GB | Hugging Face | Flash Attention | Unsloth Open | Unsloth Pro Equal | Unsloth Pro | Unsloth Max |
|---|---|---|---|---|---|---|
| Alpaca | 1x | 1.09x | 1.69x | 1.79x | 2.93x | 8.3x |
| code | Code | Code | Code | Code | | |
| seconds | 1599 | 1468 | 942 | 894 | 545 | 193 |
| memory MB | 7199 | 7059 | 6459 | 5443 | | |
| memory saved % | | 1.94 | 10.28 | 24.39 | | |
So vs SDPA, Unsloth is 1.56x faster on a Tesla T4 (1468s / 942s). I wanted to actually use the latest transformers branch, but Colab's PyTorch is 2.1.0, and upgrading it to 2.1.1 would be quite slow - I started benchmarking around Dec 8, then HF released SDPA support I think Dec 9?
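For reference, the version gate I'm working around looks roughly like this (a sketch; `attn_implementation="sdpa"` is the native transformers path, and `"eager"` here stands in for my manual patch):

```python
import torch
from packaging import version

# transformers' native SDPA path wants PyTorch >= 2.1.1; Colab ships 2.1.0
if version.parse(torch.__version__) >= version.parse("2.1.1"):
    attn_implementation = "sdpa"   # let transformers use its built-in SDPA path
else:
    attn_implementation = "eager"  # fall back and apply a manual SDPA patch instead
print(attn_implementation)
```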
But more than happy to collaborate on anything!! Again great work with TRL and PEFT! I'm actively following https://github.com/huggingface/transformers/pull/26037 :) so that'll be massive for the next HF release!
I'm also investigating LoftQ via PEFT as suggested by someone I was chatting with - I haven't tried it yet, but hopefully VRAM doesn't explode!
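If I do try it, the PEFT usage should look roughly like this (a sketch following PEFT's LoftQ docs; the base model and target modules below are just illustrative, not a tested recipe):

```python
from transformers import AutoModelForCausalLM
from peft import LoftQConfig, LoraConfig, get_peft_model

# Load the base model in full precision; LoftQ handles the quantized initialization
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

loftq_config = LoftQConfig(loftq_bits=4)          # 4-bit LoftQ initialization
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    init_lora_weights="loftq",                    # initialize LoRA weights via LoftQ
    loftq_config=loftq_config,
)
model = get_peft_model(base, lora_config)
```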
Thanks very much for your positive reply @danielhanchen ! We can collaborate on many things - one thing I had in mind is to integrate an API that can leverage unsloth as a backend in PEFT. It would be easier for us to discuss this on Slack - could you send me an email address I can reach you at?
@younesbelkada Email is on my profile! :)
Hi @danielhanchen
Thank you very much for this great project and for pushing this forward for the community!
With the TRL / PEFT team we've seen that your example scripts rely heavily on the PEFT / TRL libraries, and we wanted to see if you need any help or have any feature requests around the HF ecosystem - we would be happy to collaborate and see what we can do together.
Note also that SDPA has recently been integrated into transformers core (https://github.com/huggingface/transformers/pull/26572) - we were also wondering if you did some comparisons of unsloth against transformers 4.36.0.
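For reference, a quick way to try the native SDPA path on 4.36.0 would be something like this (model name purely illustrative):

```python
import torch
from transformers import AutoModelForCausalLM

# transformers 4.36 accepts attn_implementation="sdpa" in from_pretrained,
# which routes attention through torch.nn.functional.scaled_dot_product_attention
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    torch_dtype=torch.float16,
    attn_implementation="sdpa",
)
```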
cc @pacman100 @lvwerra