intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

When converting the MiniCPM-V model from ModelScope to low-bit, got error: AttributeError: 'NoneType' object has no attribute 'add_bos_token' #11390

Closed: lei-sun-intel closed this issue 1 week ago

lei-sun-intel commented 3 weeks ago
  1. Download the MiniCPM-V model from ModelScope (a minimal download sketch is shown after this list).
  2. Convert the model to low-bit with the command from GPU/ModelScope-Models/Save-Load: python ./generate.py --repo-id-or-model-path ./models/OpenBMB/MiniCPM-V --save-path ./models/OpenBMB/MiniCPM-V-int4
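
For reference, a minimal sketch of the download in step 1, assuming the modelscope Python package is installed and ./models is the target directory (the exact download method used originally is not shown in this thread):

# Hypothetical download step for illustration; adjust cache_dir so the result
# matches the --repo-id-or-model-path passed to generate.py.
from modelscope import snapshot_download

model_dir = snapshot_download('OpenBMB/MiniCPM-V', cache_dir='./models')
print(model_dir)  # e.g. ./models/OpenBMB/MiniCPM-V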

2024-06-21 16:26:41,934 - INFO - intel_extension_for_pytorch auto imported
2024-06-21 16:26:42,721 - INFO - Note: NumExpr detected 22 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2024-06-21 16:26:42,721 - INFO - NumExpr defaulting to 8 threads.
2024-06-21 16:26:43,577 - modelscope - INFO - PyTorch version 2.1.0.post0+cxx11.abi Found.
2024-06-21 16:26:43,577 - modelscope - INFO - Loading ast index from /home/test10/.cache/modelscope/ast_indexer
2024-06-21 16:26:43,649 - modelscope - INFO - Loading done! Current index file version is 1.11.0, with md5 a8fc9e89d1b75747da2b763359929bfe and a total number of 953 components indexed
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 8.98it/s]
2024-06-21 16:26:44,791 - INFO - Converting the current model to sym_int4 format......
WARNING: Ignoring invalid distribution -orch (/home/test10/.miniconda_dev_zone/envs/notebook-zone/lib/python3.9/site-packages)
Model and tokenizer are saved to ./models/OpenBMB/MiniCPM-V-int4
Traceback (most recent call last):
  File "/home/test10/ipex-llm/python/llm/example/GPU/ModelScope-Models/Save-Load/./generate.py", line 66, in <module>
    output = model.generate(input_ids,
  File "/home/test10/.cache/huggingface/modules/transformers_modules/MiniCPM-V/modeling_minicpmv.py", line 211, in generate
    model_inputs = self._process_list(tokenizer, data_list, max_inp_length)
  File "/home/test10/.cache/huggingface/modules/transformers_modules/MiniCPM-V/modeling_minicpmv.py", line 167, in _process_list
    input_tensors.append(self._convert_to_tensors(tokenizer, data, max_inp_length))
  File "/home/test10/.cache/huggingface/modules/transformers_modules/MiniCPM-V/modeling_minicpmv.py", line 138, in _convert_to_tensors
    if tokenizer.add_bos_token:
AttributeError: 'NoneType' object has no attribute 'add_bos_token'

Python 3.9.19
Model: MiniCPM-V https://modelscope.cn/models/OpenBMB/MiniCPM-V

qiuxin2012 commented 3 weeks ago

It looks like the transformers version does not match; this model requires 4.36. Which version are you using? Please change to the right version.
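
A quick way to confirm which transformers version the interpreter actually imports (a generic check, not specific to ipex-llm):

# Print the version and install location of transformers to rule out a stale
# or mixed environment; 4.36.x is expected for this MiniCPM-V example.
import transformers
print(transformers.__version__)
print(transformers.__file__)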

lei-sun-intel commented 3 weeks ago

pip list | grep transformers shows we are on transformers 4.37.0; I will try 4.36. Thanks a lot for your quick reply.

lei-sun-intel commented 3 weeks ago

After pip install transformers==4.36.0 (and 4.36.2), I got the same error, no change at all.

qiuxin2012 commented 3 weeks ago

I just tried this model and got a different error:

2024-06-24 14:13:06,027 - INFO - Converting the current model to sym_int4 format......
torch.Size([3, 448, 448])
Traceback (most recent call last):
  File "C:\Users\arda\xin\test.py", line 21, in <module>
    res, context, _ = model.chat(
                      ^^^^^^^^^^^
  File "C:\Users\arda\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 279, in chat
    res, vision_hidden_states = self.generate(
                                ^^^^^^^^^^^^^^
  File "C:\Users\arda\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 230, in generate
    model_inputs['inputs_embeds'], vision_hidden_states = self.get_vllm_embedding(model_inputs)
                                                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 87, in get_vllm_embedding
    vision_hidden_states.append(self.get_vision_embedding(pixel_values))
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\.cache\huggingface\modules\transformers_modules\MiniCPM-V\modeling_minicpmv.py", line 75, in get_vision_embedding
    vision_embedding = self.vpm.forward_features(pixel_value.unsqueeze(0).type(dtype))
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\xin-llm\Lib\site-packages\timm\models\vision_transformer.py", line 663, in forward_features
    x = self._pos_embed(x)
        ^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\xin-llm\Lib\site-packages\timm\models\vision_transformer.py", line 582, in _pos_embed
    pos_embed = resample_abs_pos_embed(
                ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\xin-llm\Lib\site-packages\timm\layers\pos_embed.py", line 46, in resample_abs_pos_embed
    posemb = F.interpolate(posemb, size=new_size, mode=interpolation, antialias=antialias)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\arda\miniforge3\envs\xin-llm\Lib\site-packages\torch\nn\functional.py", line 4027, in interpolate
    return torch._C._nn._upsample_bicubic2d_aa(input, output_size, align_corners, scale_factors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
NotImplementedError: Could not run 'aten::_upsample_bicubic2d_aa.out' with arguments from the 'XPU' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_upsample_bicubic2d_aa.out' is only available for these backends: [CPU, Meta, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradMeta, AutogradNestedTensor, Tracer, AutocastCPU, AutocastXPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PreDispatch, PythonDispatcher].

It shows timm couldn't run on XPU.

lei-sun-intel commented 3 weeks ago

Patching the file as follows can fix your problem:

posemb = F.interpolate(posemb, size=new_size, mode=interpolation, antialias=antialias)

->

posemb = F.interpolate(posemb.to("cpu"), size=new_size, mode=interpolation, antialias=antialias).to(posemb.device)
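
For clarity, the same workaround written out with comments (this edits timm/layers/pos_embed.py inside resample_abs_pos_embed; posemb, new_size, interpolation and antialias are that function's local variables):

# aten::_upsample_bicubic2d_aa has no XPU kernel, so run the anti-aliased
# interpolation on CPU and move the result back to the original device.
orig_device = posemb.device
posemb = F.interpolate(
    posemb.to("cpu"), size=new_size, mode=interpolation, antialias=antialias
).to(orig_device)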

qiuxin2012 commented 3 weeks ago

Yes, you can follow https://github.com/intel-analytics/ipex-llm/issues/10470. I just ran the generation successfully.

lei-sun-intel commented 3 weeks ago

Would you please help check the version of bigdl-llm? I just tried bigdl-llm==2.4.0, and it does NOT work either. In my env, pip list | grep bigdl shows the following:

bigdl-core-xe-21           2.5.0b20240610
bigdl-core-xe-addons-21    2.5.0b20240610
bigdl-core-xe-batch-21     2.5.0b20240610
bigdl-core-xe-esimd-21     2.5.0b20240423
bigdl-llm                  2.4.0

lei-sun-intel commented 3 weeks ago

I checked https://github.com/intel-analytics/ipex-llm/issues/10470; it uses bigdl-llm==2.5.0b20240318. I will give it a try.

lei-sun-intel commented 3 weeks ago

Can I do it with ipex-llm instead of bigdl-llm? I find that I have NOT installed bigdl-llm.

qiuxin2012 commented 2 weeks ago

Please don't use bigdl-llm; bigdl-llm has now become ipex-llm (see the migration guide here). My Python script:

import torch
from PIL import Image
from ipex_llm.transformers import AutoModel
from transformers import AutoTokenizer
path = "D:\\llm-models\\MiniCPM-V"
print(path)
model = AutoModel.from_pretrained(path, 
                                  load_in_4bit=True,
                                  optimize_model=False,
                                  trust_remote_code=True,
                                  modules_to_not_convert=["vpm", "resampler"],
                                  use_cache=True)
model = model.float().to(device='xpu')
tokenizer = AutoTokenizer.from_pretrained(path,
                                          trust_remote_code=True)
model.eval()

image = Image.open("C:\\Users\\arda\\Desktop\\tiger.jpeg").convert('RGB')
question = 'What is in the image?'
msgs = [{'role': 'user', 'content': question}]

res, context, _ = model.chat(
    image=image,
    msgs=msgs,
    context=None,
    tokenizer=tokenizer,
    sampling=True,
    temperature=0.7
)
print(res)

lei-sun-intel commented 2 weeks ago

I want to convert the model to int4. Any update?

qiuxin2012 commented 2 weeks ago

model = AutoModel.from_pretrained(path, 
                                  load_in_4bit=True,
                                  optimize_model=False,
                                  trust_remote_code=True,
                                  modules_to_not_convert=["vpm", "resampler"],
                                  use_cache=True)

With the above code, the model is already loaded in int4.
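
If the goal is to also persist those int4 weights so the conversion does not have to be repeated, a minimal sketch (assuming model and tokenizer were created by the code above; save_path is an example location):

# Save the already-converted int4 weights plus the tokenizer; they can be
# reloaded later without redoing the 4-bit conversion.
save_path = "./models/OpenBMB/MiniCPM-V-int4"
model.save_low_bit(save_path)
tokenizer.save_pretrained(save_path)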

qiuxin2012 commented 1 week ago

@lei-sun-intel Your error message is thrown during generation; the model should already be saved.

Traceback (most recent call last):
  File "/home/test10/ipex-llm/python/llm/example/GPU/ModelScope-Models/Save-Load/./generate.py", line 66, in <module>
    output = model.generate(input_ids,
  File "/home/test10/.cache/huggingface/modules/transformers_modules/MiniCPM-V/modeling_minicpmv.py", line 211, in generate
    model_inputs = self._process_list(tokenizer, data_list, max_inp_length)
  File "/home/test10/.cache/huggingface/modules/transformers_modules/MiniCPM-V/modeling_minicpmv.py", line 167, in _process_list
    input_tensors.append(self._convert_to_tensors(tokenizer, data, max_inp_length))
  File "/home/test10/.cache/huggingface/modules/transformers_modules/MiniCPM-V/modeling_minicpmv.py", line 138, in _convert_to_tensors
    if tokenizer.add_bos_token:
AttributeError: 'NoneType' object has no attribute 'add_bos_token'

The tokenizer has no add_bos_token attribute, which is an error caused by a transformers version mismatch. But you said you have installed the right version, so I think something is wrong in your environment. My suggestion: create a new conda environment and try my code above.
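
A possible clean-environment sequence, assuming a Linux machine with an Intel GPU (the exact index URL and package versions should be taken from the ipex-llm installation guide):

# Create a fresh environment and install ipex-llm with XPU support, plus the
# transformers/timm versions this MiniCPM-V example expects.
conda create -n ipex-llm-test python=3.9 -y
conda activate ipex-llm-test
pip install --pre --upgrade ipex-llm[xpu]  # add the extra index URL from the installation guide
pip install transformers==4.36.2 timm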

lei-sun-intel commented 1 week ago

path = "./models/OpenBMB/MiniCPM-V"
save_path = "./models/OpenBMB/MiniCPM-V-int4"

model = AutoModel.from_pretrained(path,
                                  load_in_4bit=True,
                                  optimize_model=False,
                                  trust_remote_code=True,
                                  modules_to_not_convert=["vpm", "resampler"],
                                  use_cache=True)
model = model.float().to(device='xpu')
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model.eval()

model.save_low_bit(save_path)
tokenizer.save_pretrained(save_path)
print(f"Model and tokenizer are saved to {save_path}")

Finally, the above code fixed the problem. Thanks a lot!
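
As a follow-up, a hedged sketch of reloading that saved int4 checkpoint later, mirroring the Save-Load example (AutoModel.load_low_bit is assumed to behave like the other ipex-llm save/load examples):

# Reload the saved int4 model and tokenizer directly, skipping the original
# checkpoint and the 4-bit conversion step.
from ipex_llm.transformers import AutoModel
from transformers import AutoTokenizer

save_path = "./models/OpenBMB/MiniCPM-V-int4"
model = AutoModel.load_low_bit(save_path, trust_remote_code=True)
model = model.float().to('xpu')
model.eval()
tokenizer = AutoTokenizer.from_pretrained(save_path, trust_remote_code=True)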