Closed — FlatMapIO closed this issue 10 months ago
@FlatMapIO Sadly this is a known limitation - I forgot to document it as one, and I'm working on it.
The issue is that the maximum CUDA block size is 2^16 (65536), while Deepseek's vocab_size is a bit larger (102400). Rounding 102400 up to the next power of 2 gives 2^17 (131072), which does not work with the current Unsloth implementation. I'll have to rewrite the cross_entropy_loss code to split the calculation across multiple grids, then do a final reduction step for the logsumexp.
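The split-then-reduce idea can be sketched in plain PyTorch (the real kernel would be Triton; `chunked_logsumexp` and the chunk size here are illustrative, not Unsloth's actual code):

```python
import torch

def chunked_logsumexp(logits: torch.Tensor, chunk_size: int = 65536) -> torch.Tensor:
    """Compute logsumexp over the vocab dimension in chunks of at most
    `chunk_size` columns (the per-grid limit), then combine the partial
    results with a final logsumexp reduction."""
    partials = []
    for chunk in logits.split(chunk_size, dim=-1):
        # Each chunk fits within the block-size limit.
        partials.append(torch.logsumexp(chunk, dim=-1, keepdim=True))
    # Final reduction step: logsumexp of the per-chunk logsumexps
    # equals logsumexp over the full vocab.
    return torch.logsumexp(torch.cat(partials, dim=-1), dim=-1)
```

This works because logsumexp is associative when recombined through another logsumexp, so a 131072-wide vocab can be handled as two 65536-wide passes plus a tiny reduction.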
If this is a more popular request - I will implement it!! So more likes and stars would be helpful!!
@danielhanchen 100% worth the effort! Deepseek-Coder-33B is an amazing coder. And then there is Deepseek-LLM-67B, which is even better (although twice as big). NO OTHER open model comes close to these two at coding tasks. In fact maybe only GPT-4 is better at such tasks, but its knowledge cutoff is from some time ago, so it's less useful with some newer APIs. So fine-tuning those two Deepseek models on one's code base would be a game changer. Another game changer would be even faster inference than what exllamaV2 can achieve, but that's another topic :))
@FlatMapIO Ohh ok ok I'll move this up the priority stack!!
I hit this too after converting Qwen to Llama format, on a 3090. It still happens on version 2024.1.
@AIlaowong I haven't gotten around to supporting larger vocab sizes - 2024.1 still only supports a max of 2^16 (65536). I'll probably work on it after I get DPO and other stuff resolved first.
So what I can do temporarily, since I can see Qwen specifically (and that means Deepseek) is affected, is to fall back to PyTorch's CrossEntropyLoss whenever the vocab size exceeds 2^16, and implement larger vocab sizes in a future release. What do you all think?
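A minimal sketch of that temporary fallback (the function name, the `fused_kernel_fn` hook, and the threshold constant are illustrative assumptions, not Unsloth's actual API):

```python
import torch
import torch.nn.functional as F

# Hypothetical limit of the fused Triton kernel: 2^16 vocab entries.
MAX_FUSED_VOCAB = 2 ** 16

def cross_entropy(logits: torch.Tensor, labels: torch.Tensor, fused_kernel_fn=None):
    """Use the fast fused kernel when the vocab fits under the limit;
    otherwise fall back to PyTorch's reference CrossEntropyLoss."""
    vocab_size = logits.shape[-1]
    if fused_kernel_fn is not None and vocab_size <= MAX_FUSED_VOCAB:
        return fused_kernel_fn(logits, labels)
    # Fallback path for Qwen / Deepseek-sized vocabs (e.g. 102400 > 65536).
    return F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1))
```

The fallback is slower and uses more memory than a fused kernel, but it is numerically identical and removes the hard vocab-size ceiling.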
That's up to you! I'm genuinely excited to see the results of your updates. It's truly uplifting to know that the training speed of LLMs is being improved.
Added prelim support for ALL vocab sizes! Ie Qwen (llamified), Deepseek etc are all supported now!
Closing since it's supported now!
Env:
Unsloth - Alpaca.ipynb
Traceback: