intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Unable to run LanguageBind/Video-LLaVA-7B-hf using ipex-llm #11509

Open shailesh837 opened 2 weeks ago

shailesh837 commented 2 weeks ago

@songhappy / @shane-huang: Could you please share the code or steps you used to run LanguageBind/Video-LLaVA-7B-hf on IPEX-LLM a few months back?

We have a customer who wants to run Video-LLaVA on an Intel Arc GPU, alongside other computer vision models, to report whether passengers are sitting or standing and to describe other surrounding conditions via an LLM inside an autonomous bus.

I am getting the following error when running the code below with ipex-llm:

(llm_vision) spandey2@imu-nex-nuc13x2-arc770-dut:~/LLM_Computer_Vision$ python test_llama_video_code.py
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.76it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/feature_extraction_utils.py:141: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /build/pytorch/torch/csrc/utils/tensor_new.cpp:261.)
  return torch.tensor(value)
Caught a runtime error during generation: Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY)

Code:

import av
import numpy as np
import torch
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration, BitsAndBytesConfig
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

# Decode the container once and keep only the frames at the requested
# indices; returns an (N, H, W, 3) uint8 RGB array.
def read_video_pyav(container, indices):
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])

# Initialize model and processor
model = VideoLlavaForConditionalGeneration.from_pretrained("LanguageBind/Video-LLaVA-7B-hf", torch_dtype=torch.float16)
processor = VideoLlavaProcessor.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")

# Move model to XPU
model = model.to('xpu')

prompt = "USER: <video>Why is this video funny? ASSISTANT:"
video_path = "/home/xxxx/LLM_Computer_Vision/xxx_Demo_Video.mp4"
container = av.open(video_path)

# Sample uniformly 8 frames from the video
total_frames = container.streams.video[0].frames
indices = np.arange(0, total_frames, total_frames / 8).astype(int)
clip = read_video_pyav(container, indices)

inputs = processor(text=prompt, videos=clip, return_tensors="pt")

# Move inputs to the XPU
inputs = {k: v.to('xpu') for k, v in inputs.items()}

# Generate response with memory considerations
try:
    with torch.no_grad():
        generate_ids = model.generate(**inputs, max_length=80)
    result = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    print(result)
except RuntimeError as e:
    print("Caught a runtime error during generation:", e)
except Exception as e:
    print("Caught an unexpected error during generation:", e)
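
For context, the script above puts the full fp16 weights on the XPU; a 7B model in fp16 needs roughly 14 GB for weights alone, which is tight once 8 frames of video tokens are added on top. One possible mitigation (a minimal sketch, assuming ipex-llm's generic optimize_model API applies cleanly to VideoLlavaForConditionalGeneration) is to quantize the weights to 4-bit before moving the model to the XPU:

import torch
from transformers import VideoLlavaForConditionalGeneration
from ipex_llm import optimize_model  # ipex-llm's generic low-bit optimizer

# Load on CPU in fp16 first, then convert the linear layers to 4-bit weights;
# this cuts weight memory roughly 4x versus fp16 before the .to('xpu') move.
model = VideoLlavaForConditionalGeneration.from_pretrained(
    "LanguageBind/Video-LLaVA-7B-hf", torch_dtype=torch.float16
)
model = optimize_model(model, low_bit="sym_int4")
model = model.to('xpu')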

Since you managed to run Video-LLaVA a few months back, could you please help me with the issue I am hitting on the Intel Arc GPU A770?

shailesh837 commented 1 week ago

@shane-huang: I have made progress running on the Arc GPU, but it only works with images, not video; with video I get an OOM error.

Please use this patch and these steps; with them, images work on the Arc GPU, but video still fails:

https://github.com/intel-sandbox/llm-arc-flex/tree/main/video-llava
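
A likely reason video OOMs while a single image does not: Video-LLaVA samples 8 frames, and each frame contributes its own set of vision tokens, so the video prompt carries roughly 8x the visual context of one image. A minimal sketch for testing with a smaller clip (the helper name and frame count are illustrative, not part of the patch above):

import av
import numpy as np

def sample_frames(video_path, num_frames=4):
    # Uniformly sample num_frames RGB frames; fewer frames means fewer
    # vision tokens and a proportionally smaller prefill on the GPU.
    container = av.open(video_path)
    total = container.streams.video[0].frames
    wanted = set(np.linspace(0, total - 1, num_frames).astype(int).tolist())
    frames = [f.to_ndarray(format="rgb24")
              for i, f in enumerate(container.decode(video=0)) if i in wanted]
    return np.stack(frames)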

Image processing works:

(llm_vision) spandey2@imu-nex-nuc13x2-arc770-dut:~/LLM_Computer_Vision/Video-LLaVA$ python -m videollava.serve.cli --model-path "LanguageBind/Video-LLaVA-7B" --file "/home/spandey2/LLM_Computer_Vision/BBT_Demo_both_balls.jpeg" --device xpu
Human: please don't hallucinate , tell me whats happening in the image ?
Assistant: In the image, there is a blue ball sitting on top of a white frisbee, which is placed on a table. The frisbee is positioned in the center of the table, and the ball is located slightly to the right of it. The scene appears to be a casual setting where someone might be playing with the frisbee and the ball.
Human: its not frisbbe , its balancing ball table
Assistant: Apologies for the confusion. The image shows a balancing ball table, with a blue ball placed on top of it. The table is positioned on a wooden surface, and the ball is balancing on the table's edge. This scene could be part of a game or a decorative piece, showcasing the ball's ability to balance on the table.

Video error:

python -m videollava.serve.cli --model-path "LanguageBind/Video-LLaVA-7B" --file "/home/spandey2/LLM_Computer_Vision/sample_8.mp4" --device xpu
Human: whats inside the video ?
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/spandey2/LLM_Computer_Vision/Video-LLaVA/videollava/serve/cli.py", line 142, in <module>
    main(args)
  File "/home/spandey2/LLM_Computer_Vision/Video-LLaVA/videollava/serve/cli.py", line 111, in main
    output_ids = model.generate(
                 ^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/generation/utils.py", line 1648, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/generation/utils.py", line 2730, in sample
    outputs = self(
              ^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/LLM_Computer_Vision/Video-LLaVA/videollava/model/language_model/llava_llama.py", line 88, in forward
    return super().forward(
           ^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 820, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 708, in forward
    layer_outputs = decoder_layer(
                    ^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 436, in forward
    hidden_states = self.post_attention_layernorm(hidden_states)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py", line 87, in forward
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY)
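
The trace shows the failure during prefill, inside RMSNorm, where hidden_states.pow(2) materializes another full-size activation on top of the video tokens already resident. To narrow down which step exhausts memory, here is a small sketch assuming the torch.xpu backend mirrors torch.cuda's memory API (recent intel_extension_for_pytorch XPU builds do):

import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

def log_xpu_memory(tag):
    # Report live and cached device allocations in MiB.
    alloc = torch.xpu.memory_allocated() / 1024**2
    reserved = torch.xpu.memory_reserved() / 1024**2
    print(f"[{tag}] allocated={alloc:.0f} MiB, reserved={reserved:.0f} MiB")

log_xpu_memory("before generate")
# ... run model.generate(...) here ...
torch.xpu.empty_cache()  # return cached blocks to the runtime between attempts
log_xpu_memory("after empty_cache")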