unslothai / unsloth

Finetune Llama 3, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Adding support 8bit quantization. #132

Open Darren80 opened 5 months ago

Darren80 commented 5 months ago

Adding support for 8-bit quantization would be a good idea: it would fill the gap for people with less GPU VRAM to work with. So I think it would be worth adding if possible. Thank you.

danielhanchen commented 5 months ago

Great idea! If this gets more upvotes, as usual that signals I definitely have to add it to my roadmap :)) Since we're still just 2 brothers, I'll see what I can do when I have bandwidth :)

Darren80 commented 5 months ago

No worries mate.

JhonDan1999 commented 3 months ago

@danielhanchen When I set `load_in_8bit=True` in my code,

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = model_name, # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    load_in_8bit = True,
    load_in_4bit = False,
)

I encountered the following error:

RuntimeError: expected mat1 and mat2 to have the same dtype, but got: c10::Half != signed char

If this is because 8-bit is not supported, I would kindly request that you consider adding 8-bit quantization.