unslothai / unsloth

Finetune Llama 3.2, Mistral, Phi, Qwen 2.5 & Gemma LLMs 2-5x faster with 80% less memory
https://unsloth.ai
Apache License 2.0

Support for FP8 quantization #776

Open rwl4 opened 4 months ago

rwl4 commented 4 months ago

With the release of the new Mistral NeMo 12B model, we now have weights that were pre-trained with FP8. It would be great if Unsloth could support 8-bit training in addition to the existing 4-bit training, so we could fine-tune without any quantization-related loss.
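For context, a minimal sketch of what 8-bit loading looks like today with plain `transformers` and `bitsandbytes` (not an Unsloth API, and note that bitsandbytes' 8-bit path is LLM.int8 rather than true FP8); the model id is the upstream Mistral checkpoint and is only illustrative:

```python
# Sketch: 8-bit loading via transformers + bitsandbytes (LLM.int8, not FP8).
# Assumes bitsandbytes and a recent transformers release are installed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "mistralai/Mistral-Nemo-Base-2407"  # illustrative model id

bnb_config = BitsAndBytesConfig(load_in_8bit=True)  # 8-bit quantization

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,  # compute dtype for non-quantized layers
    device_map="auto",
)
```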

RaccoonOnion commented 4 months ago

Seconded. It would also be great if Unsloth could release a 4-bit quantized NeMo model.

danielhanchen commented 4 months ago

Apologies for the delay, just did! https://huggingface.co/unsloth/Mistral-Nemo-Base-2407-bnb-4bit
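For anyone landing here later, a minimal sketch of loading that pre-quantized 4-bit checkpoint with Unsloth's `FastLanguageModel`; the `max_seq_length` and LoRA hyperparameters are illustrative choices, not recommendations from this thread:

```python
# Sketch: loading the 4-bit Mistral NeMo checkpoint with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Mistral-Nemo-Base-2407-bnb-4bit",
    max_seq_length=2048,  # illustrative; the model supports longer contexts
    load_in_4bit=True,    # use the pre-quantized bnb 4-bit weights
)

# Attach LoRA adapters for QLoRA-style fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank (illustrative)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)
```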