intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0

nf4 still unsupported? #12427

Open epage480 opened 4 days ago

epage480 commented 4 days ago

In the example example/CPU/QLoRA-FineTuning/qlora_finetuning_cpu.py, a comment says that nf4 is not supported on CPU yet, but when I change the example from int4 to nf4, it still runs without any errors or warnings related to nf4.

Is nf4 now supported? If it is instead silently falling back to int4, I think it's worth printing an error or warning; a rough sketch of such a check follows the snippet below.

import torch
from transformers import BitsAndBytesConfig  # imports needed by this snippet

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="int4",  # nf4 not supported on cpu yet
    bnb_4bit_compute_dtype=torch.bfloat16,
)
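For illustration, here is the kind of guard I have in mind; SUPPORTED_CPU_QUANT_TYPES and the fallback behavior are hypothetical, not ipex-llm's actual internals:

import warnings

SUPPORTED_CPU_QUANT_TYPES = {"int4", "nf4"}

def check_quant_type(quant_type: str) -> None:
    # Warn loudly instead of silently falling back to another format.
    if quant_type not in SUPPORTED_CPU_QUANT_TYPES:
        warnings.warn(
            f"bnb_4bit_quant_type='{quant_type}' is not supported on CPU; "
            "falling back to int4.",
            stacklevel=2,
        )

check_quant_type("nf4")  # stays silent only if nf4 is genuinely supported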
Uxito-Ada commented 4 days ago

Hi @epage480 ,

Thanks for verifying this. Yes, NF4 is now supported.

The CPU QLoRA example uses bitsandbytes as its quantization backend, which has already enabled NF4 on the Intel 4th Gen Xeon (SPR) platform, as shown here.
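
For anyone who wants to confirm which 4-bit format was actually applied, below is a minimal sketch assuming the standard transformers/bitsandbytes API; the model id is a placeholder:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=False,
    bnb_4bit_quant_type="nf4",  # NormalFloat4 rather than plain int4
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=nf4_config,
)

# The quantization config recorded on the loaded model reflects the
# format that was actually used.
print(model.config.quantization_config)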