intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

error with ipex-llm langchain integration for LLAVA model #11341

Open tsantra opened 3 months ago

tsantra commented 3 months ago

Hi, I saved the LLaVA model in 4-bit using generate.py from: https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models/Model/llava

model = optimize_model(model)

I added these lines to generate.py:

if SAVE_PATH:
    model.save_low_bit(save_path_model)
    tokenizer.save_pretrained(save_path_model)
    print(f"Model and tokenizer are saved to {save_path_model}")

Then I used the saved low-bit model in LangChain.

from ipex_llm.langchain.llms import TransformersLLM

model_path = <>

print(" loading IPEX_LLM_Model")
ipex_llm_model = TransformersLLM.from_model_id(
    model_id=model_path,
    model_kwargs={"temperature": 0, "max_length": 1024, "trust_remote_code": True},
)

I am getting this error:

miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
2024-06-17 18:40:44,748 - WARNING - /home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/langchain/_api/module_import.py:92: LangChainDeprecationWarning: Importing enforce_stop_tokens from langchain.llms is deprecated. Please replace deprecated imports:

from langchain.llms import enforce_stop_tokens

with new imports of:

from langchain_community.llms.utils import enforce_stop_tokens
You can use the langchain cli to automatically upgrade many imports. Please see documentation here https://python.langchain.com/v0.2/docs/versions/v0_2/
  warn_deprecated(

2024-06-17 18:40:44,750 - WARNING - BigdlNativeLLM has been deprecated, please switch to the new LLM API for sepcific models.
 loading IPEX_LLM_Model
Traceback (most recent call last):
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/ipex_llm/langchain/llms/transformersllm.py", line 138, in from_model_id
    model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/unittest/mock.py", line 1378, in patched
    return func(*newargs, **newkeywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/ipex_llm/transformers/model.py", line 351, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/ipex_llm/transformers/model.py", line 494, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1098, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
                   ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 795, in __getitem__
    raise KeyError(key)
KeyError: 'llava_llama'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rag-bigdl-main/rag_multimodal_no_summary.py", line 39, in <module>
    ipex_llm_model = TransformersLLM.from_model_id(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/ipex_llm/langchain/llms/transformersllm.py", line 141, in from_model_id
    model = AutoModel.from_pretrained(model_id, load_in_4bit=True, **_model_kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/unittest/mock.py", line 1378, in patched
    return func(*newargs, **newkeywargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/ipex_llm/transformers/model.py", line 351, in from_pretrained
    model = cls.load_convert(q_k, optimize_model, *args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/ipex_llm/transformers/model.py", line 494, in load_convert
    model = cls.HF_Model.from_pretrained(*args, **kwargs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 1098, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
                   ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ceed-user/miniforge3/envs/ipex-llm-cpu/lib/python3.11/site-packages/transformers/models/auto/configuration_auto.py", line 795, in __getitem__
    raise KeyError(key)
KeyError: 'llava_llama'

**_What is the right way to use the saved 4-bit LLaVA model in LangChain?_**

ivy-lv11 commented 3 months ago

@tsantra We are trying to reproduce the error. Would you please share the code you used above? :)

ivy-lv11 commented 3 months ago

As ipex_llm.langchain.llms.TransformersLLM does not support loading LLaVA directly, you could follow these steps instead.

  1. Follow the LLaVA repo guide to load the model.
  2. Optimize the model with ipex_llm.optimize_model() to transform it to low-bit.
  3. Integrate with TransformersLLM by passing the optimized model as the model parameter so that you can use it in LangChain. Example code:
    
from ipex_llm import optimize_model

# Load the pretrained model.
# Adapted from LLaVA.llava.model.builder.load_pretrained_model.
def load_pretrained_model(model_path, model_base, model_name, load_8bit=False,
                          load_4bit=False, device_map="auto", device="cpu"):
    # Refer LLaVA repo to load model
    ...

model = optimize_model(model)

from ipex_llm.langchain.llms import TransformersLLM
llm = TransformersLLM(model_id="", model=model, tokenizer=tokenizer)
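
If the optimized model also needs to be saved and reloaded later (as in the original report), here is a rough sketch of the PyTorch-style save/load path, which avoids the Auto classes that fail on the unregistered llava_llama model type. The low_memory_init / load_low_bit helpers are taken from the ipex-llm PyTorch API docs, and build_llava_model is a hypothetical placeholder for the LLaVA-repo loading code, so treat the exact calls as assumptions and adapt them to your installed version:

from ipex_llm import optimize_model
from ipex_llm.optimize import low_memory_init, load_low_bit  # assumed PyTorch-style save/load API
from ipex_llm.langchain.llms import TransformersLLM

# One-time: optimize the LLaVA model loaded via the LLaVA repo and save the low-bit weights
model = optimize_model(model)
model.save_low_bit(save_path_model)
tokenizer.save_pretrained(save_path_model)

# Later: rebuild the same architecture with empty weights, then load the low-bit checkpoint
with low_memory_init():
    model = build_llava_model(model_path)  # hypothetical helper wrapping the LLaVA repo loader
model = load_low_bit(model, save_path_model)

llm = TransformersLLM(model_id="", model=model, tokenizer=tokenizer)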

tsantra commented 3 months ago

@ivy-lv11 thank you ! This works!

Finally, I want to use the llm in a RAG pipeline. Using llm = TransformersLLM(model_id="", model=model, tokenizer=tokenizer) is not generating any output for me. My code uses base64-encoded images as input, since that is how Ollama works, and my solution works with the LLaVA model from Ollama. Now I am trying to use the ipex-llm LLaVA model instead of the one from Ollama. Here is my code:

# Create chroma
vectorstore = Chroma(
    collection_name="mm_rag_clip_photos",
    embedding_function=OpenCLIPEmbeddings(),
    persist_directory="./chroma_test_ipex_llm/",
)
vectorstore.persist()

# Get image URIs with .jpg extension only
image_uris = sorted(
    [
        os.path.join(output_folder, image_name)
        for image_name in os.listdir(output_folder)
        if image_name.endswith(".png")
    ]
)

with open(text_path, 'r') as textfile:
    texts = textfile.readlines()

# Add images
vectorstore.add_images(uris=image_uris)

# Add documents
vectorstore.add_texts(texts=texts)

# Make retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})

def resize_base64_image(base64_string, size=(128, 128)):
    """
    Resize an image encoded as a Base64 string.

    Args:
        base64_string (str): Base64 string of the original image.
        size (tuple): Desired size of the image as (width, height).

    Returns:
        str: Base64 string of the resized image.
    """
    # Decode the Base64 string
    img_data = base64.b64decode(base64_string)
    img = Image.open(io.BytesIO(img_data))

    # Resize the image
    resized_img = img.resize(size, Image.LANCZOS)

    # Save the resized image to a bytes buffer
    buffered = io.BytesIO()
    resized_img.save(buffered, format=img.format)

    # Encode the resized image to Base64
    return base64.b64encode(buffered.getvalue()).decode("utf-8")

def is_base64(s):
    """Check if a string is Base64 encoded"""
    try:
        return base64.b64encode(base64.b64decode(s)) == s.encode()
    except Exception:
        return False

def split_image_text_types(docs):
    """Split numpy array images and texts"""
    images = []
    text = []
    for doc in docs:
        doc = doc.page_content  # Extract Document contents
        if is_base64(doc):
            print(" found image doc ")
            # Resize image to avoid OAI server error
            images.append(
                resize_base64_image(doc, size=(250, 250))
            )  # base64 encoded str
        else:
            text.append(doc)
    return {"images": images, "texts": text}

def prompt_func(data_dict):
    # Joining the context texts into a single string
    formatted_texts = "\n".join(data_dict["context"]["texts"])
    messages = []

    # Adding image(s) to the messages if present
    if data_dict["context"]["images"]:
        for x in data_dict["context"]["images"]:
            image_message = {
                "type": "image_url",
                "image_url": f"data:image/jpeg;base64,{x}",
            }
            messages.append(image_message)

    # Adding the text message for analysis
    text_message = {
        "type": "text",
        "text": (
            "\n You are an AI Assistant for summarizing videos.\n"
        ),
    }
    messages.append(text_message)

    return [HumanMessage(content=messages)]

# RAG pipeline
chain = (
    {
        "context": retriever | RunnableLambda(split_image_text_types),
        "question": RunnablePassthrough(),
    }
    | RunnableLambda(prompt_func)
    | llm
    | StrOutputParser()
)

d = chain.invoke("what are the key takeaways from the images?")

Could you please help? How do I format the image and text input for the model?
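
One likely reason for the missing output: TransformersLLM tokenizes the prompt as plain text, so the base64 image strings inside HumanMessage never reach LLaVA's vision tower. Below is a rough sketch of how the retrieved image could instead be decoded and fed to the model directly, assuming the helpers from the LLaVA repo (process_images, tokenizer_image_token, IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN); exact signatures vary by LLaVA version, and a conversation template would normally wrap the prompt as well, so treat this as an illustration rather than the supported API:

import base64, io
import torch
from PIL import Image
from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.mm_utils import process_images, tokenizer_image_token

def llava_answer(model, tokenizer, image_processor, question, image_b64):
    # Decode the retrieved base64 string back into a PIL image
    image = Image.open(io.BytesIO(base64.b64decode(image_b64))).convert("RGB")
    image_tensor = process_images([image], image_processor, model.config)
    # image_tensor may need .to(model.device, dtype=model.dtype) depending on the setup

    # Prepend the <image> placeholder so the text tokens line up with the image features
    prompt = DEFAULT_IMAGE_TOKEN + "\n" + question
    input_ids = tokenizer_image_token(
        prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
    ).unsqueeze(0)

    with torch.inference_mode():
        output_ids = model.generate(input_ids, images=image_tensor, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

A RunnableLambda wrapping something like this could take the place of the llm step in the chain, since the downstream StrOutputParser only needs a string back.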

lzivan commented 3 months ago

Hi, we will try to reproduce your error.

tsantra commented 3 months ago

Hi @lzivan, could you please let me know if you have any update? Thank you very much.

lzivan commented 3 months ago

Hi @tsantra , we are now debugging the "load_image" issue, will get back to you once we have a solution.

tsantra commented 2 months ago

Hi @lzivan, could you please let me know if there is any update? Thanks a lot!

lzivan commented 2 months ago

We are still getting "nobody knows" as the output. I'm not sure whether LLaVA is compatible with LangChain.

lzivan commented 2 months ago

We tried on our RTX machine, using a local LLaVA-1.5-7B model, and still get the same "nobody knows" output.

tsantra commented 2 months ago

@lzivan LLaVA works fine with LangChain using Ollama. So LLaVA is compatible.

tsantra commented 2 months ago

@lzivan what is the "nobody knows" output?

shane-huang commented 2 months ago

> @lzivan LLaVA works fine with LangChain using Ollama. So LLaVA is compatible.

If the official Ollama works okay, could you please try using IPEX-LLM enabled Ollama on GPU and see if it works? For how to install IPEX-LLM enabled Ollama and run ollama serve on GPU (e.g. iGPU and Arc), refer to this guide.

IPEX-LLM enabled Ollama is used the same way as official Ollama, so you don't have to change your LangChain code. Just remember to change the base_url if you're running ollama serve on another machine. For example, if you're running ollama serve on your_machine_ip, set the Ollama base_url as below in your LangChain code.


from langchain_community.llms import Ollama
llm = Ollama(
    base_url =  'http://your_machine_ip:11434',
    model="llava"
) 
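
If the retrieved images stay base64-encoded, the LangChain Ollama wrapper can pass them straight to the llava model; as far as I recall the langchain_community API, images are attached with bind(images=[...]) rather than embedded in the message content, roughly like this (treat the parameter name as an assumption):

from langchain_community.llms import Ollama

llm = Ollama(base_url='http://your_machine_ip:11434', model="llava")

# Attach the retrieved base64 image, then ask the question as plain text
llm_with_image = llm.bind(images=[image_b64])  # image_b64: base64 string from the retriever
print(llm_with_image.invoke("What are the key takeaways from this image?"))
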
tsantra commented 2 months ago

@shane-huang does it work only on GPU with Ollama? Also, one of the reasons I do not want to continue with Ollama is that the official Ollama seems unstable in performance.

lzivan commented 2 months ago

Hi @tsantra , would you please attach the image and the text file you are using?