intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.

[integration]: merging bfloat16 model failed #11135

Open raj-ritu17 opened 1 month ago

raj-ritu17 commented 1 month ago

base-model: Weyaxi/Dolphin2.1-OpenOrca-7B

Scenario:

(ft_Qlora) intel@imu-nex-sprx92-max1-sut:~/ritu/ipex-llm/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora$ python ./export_merged_model.py --repo-id-or-model-path Weyaxi/Dolphin2.1-OpenOrca-7B --adapter_path ./out-dir-FT/tmp-checkpoint-1400/ --output_path ./out-dir-FT/tmp-checkpoint-1400-merged
/home/intel/miniconda3/envs/ft_Qlora/lib/python3.11/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: ''If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
2024-05-24 14:37:05,078 - INFO - intel_extension_for_pytorch auto imported
2024-05-24 14:37:05,084 - WARNING - The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
/home/intel/miniconda3/envs/ft_Qlora/lib/python3.11/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
2024-05-24 14:37:05,713 - ERROR -

****************************Usage Error************************
Please use torch_dtype=torch.bfloat16 when setting load_in_low_bit='bf16'.
2024-05-24 14:37:05,713 - ERROR -

****************************Call Stack*************************
Failed to merge the adapter, error: Please use torch_dtype=torch.bfloat16 when setting load_in_low_bit='bf16'..

What else I tried: I added 'torch_dtype=torch.bfloat16' in the utils code (in the merge_adapter function), e.g. common/utils/util.py, line 183:

    try:
        base_model = AutoModelForCausalLM.from_pretrained(
            base_model,
            # load_in_low_bit="nf4",  # should load the original model
            # torch_dtype=torch.float16,
            # ritu: added for DolphinOrca-7b
            torch_dtype=torch.bfloat16,
            # end
            device_map={"": "cpu"},
        )

        lora_model = PeftModel.from_pretrained(
            base_model,
            adapter_path,
            device_map={"": "cpu"},
            # torch_dtype=torch.float16,
            torch_dtype=torch.bfloat16,
        )

This doesn't solve the issue; it just fails with an empty error:

Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 11.78it/s]
2024-05-24 10:35:22,564 - INFO - Converting the current model to bf16 format......
[2024-05-24 10:35:22,912] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to xpu (auto detect)
Failed to merge the adapter, error: .
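
For reference, the usage error above describes a constraint in ipex-llm's model loading: when load_in_low_bit='bf16' is set, torch_dtype must be torch.bfloat16 as well. Below is a minimal sketch of a loading call that satisfies that check, assuming ipex-llm's transformers-style AutoModelForCausalLM wrapper and reusing the base model path from this report; it is illustrative only, not the fix the maintainers later shipped.

    import torch
    from ipex_llm.transformers import AutoModelForCausalLM

    # Load the base model in bf16 low-bit format; the dtype must match the
    # low-bit setting, which is exactly what the "Usage Error" above checks.
    base_model = AutoModelForCausalLM.from_pretrained(
        "Weyaxi/Dolphin2.1-OpenOrca-7B",
        load_in_low_bit="bf16",
        torch_dtype=torch.bfloat16,   # must pair with load_in_low_bit="bf16"
        device_map={"": "cpu"},
    )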
plusbang commented 1 month ago

Hi @raj-ritu17, I have reproduced the error while merging the model. We will try to fix it and will update here once it is solved.

plusbang commented 1 month ago

Hi @raj-ritu17, we have fixed this bug. Please install the latest ipex-llm (2.1.0b20240527); there is no need to modify the utils code, just run this script to merge the model.
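
For example, the sequence would look roughly like this (a sketch; the merge command and paths are the ones from the report above, and the exact extra wheel index for the XPU build comes from the ipex-llm installation guide):

    # upgrade to a build that contains the fix (>= 2.1.0b20240527); add the
    # extra wheel index from the ipex-llm XPU installation guide if needed
    pip install --pre --upgrade ipex-llm[xpu]

    # re-run the unmodified merge script with the same arguments as before
    python ./export_merged_model.py \
        --repo-id-or-model-path Weyaxi/Dolphin2.1-OpenOrca-7B \
        --adapter_path ./out-dir-FT/tmp-checkpoint-1400/ \
        --output_path ./out-dir-FT/tmp-checkpoint-1400-merged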

According to my local experiment, the merging process works, and you can use the merged model for inference by following https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mistral
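
For reference, a minimal inference sketch along the lines of that linked example (the merged-model path comes from the report above; load_in_4bit=True and the "xpu" device follow the linked example rather than anything specific to this issue):

    import torch
    from transformers import AutoTokenizer
    from ipex_llm.transformers import AutoModelForCausalLM

    merged_path = "./out-dir-FT/tmp-checkpoint-1400-merged"  # output of export_merged_model.py

    # Load the merged model with ipex-llm low-bit optimization and move it to the Intel GPU
    model = AutoModelForCausalLM.from_pretrained(merged_path,
                                                 load_in_4bit=True,
                                                 trust_remote_code=True)
    model = model.to("xpu")
    tokenizer = AutoTokenizer.from_pretrained(merged_path, trust_remote_code=True)

    with torch.inference_mode():
        input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu")
        output = model.generate(input_ids, max_new_tokens=32)
        print(tokenizer.decode(output[0], skip_special_tokens=True))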