intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

Unable to get LanguageBind/Video-LLaVA-7B-hf model working through ipex-llm #11079

Open · arisha07 opened this issue 1 month ago

arisha07 commented 1 month ago

Hi there, I am able to download the model from HF using VideoLlavaForConditionalGeneration.from_pretrained and optimize it using ipex_llm.optimize_model(). But the process fails on generate() with the following error. Any help on this would be great. Thank you!

    generate_ids = model.generate(**inputs, max_length=50,use_cache=True,)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\transformers\generation\utils.py", line 1736, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\transformers\generation\utils.py", line 2375, in _sample
    outputs = self(
              ^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\transformers\models\video_llava\modeling_video_llava.py", line 581, in forward
    outputs = self.language_model(
              ^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\transformers\models\llama\modeling_llama.py", line 1164, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 155, in llama_model_forward_4_38
    return llama_model_forward_4_38_internal(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\ipex_llm\transformers\models\llama.py", line 1896, in llama_model_forward_4_38_internal
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds) #,cache_position,past_key_values,output_attentions)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: LlamaModel._update_causal_mask() missing 3 required positional arguments: 'cache_position', 'past_key_values', and 'output_attentions'
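
For context, the TypeError above is a signature mismatch: ipex-llm's patched llama_model_forward_4_38 calls self._update_causal_mask(attention_mask, inputs_embeds) with the transformers 4.38 signature, while newer transformers releases added three required arguments to that method. A minimal guard (a hypothetical check, not part of ipex-llm) can surface the incompatibility before generate() is called:

import transformers
from packaging import version

# ipex-llm's patched LLaMA forward targets the transformers 4.38 signature of
# LlamaModel._update_causal_mask; later releases changed it.
if version.parse(transformers.__version__) > version.parse("4.38.2"):
    print(f"transformers {transformers.__version__} detected: "
          "_update_causal_mask signature mismatches like the one above are likely.")
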
plischwe commented 1 month ago

Also attempting something similar. I have successfully saved the int4 Video-LLaVA weights to a local directory. I attempt to load the model with 'model = VideoLlavaForConditionalGeneration.from_pretrained(saved_dir)', but I get an error: 'ImportError: cannot import name 'VideoLlavaForConditionalGeneration' from ipex_llm.transformers'. The IPEX-LLM documentation says you can apply INT4 optimizations to any Hugging Face Transformers model, but there seems to be no support for this one. The model was integrated into the HF transformers library only recently, which may be causing the issue, but it raises a broader question: how does model support actually get implemented within ipex_llm, and why doesn't ipex_llm have access to all the models in the HF transformers library as it claims? Thanks in advance. (P.S. I can make some minor changes to the model.py file in the transformers directory to use VideoLlavaForConditionalGeneration and import it in the __init__.py file in the same directory, which lets me load the model, but then generation fails as above.)
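
For reference, a local int4 directory like the one described here can be produced with optimize_model followed by save_low_bit. A minimal sketch of that save path (the directory name is taken from the code later in this thread; whether this fully works for Video-LLaVA is exactly what is in question):

from transformers import VideoLlavaForConditionalGeneration
from ipex_llm import optimize_model

# Load full-precision weights, convert to int4 in memory, then persist.
model = VideoLlavaForConditionalGeneration.from_pretrained(
    "LanguageBind/Video-LLaVA-7B-hf")
model = optimize_model(model)
model.save_low_bit("/home/plischwe/int4_videollava")
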

shane-huang commented 1 month ago

Could you please run the env-check script and attach the logs for our diagnosis? Refer to https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/scripts/ for how to run it.
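
For example, on Windows the script can be run from a clone of the repository (redirecting the output to a file is just a convenience for attaching the log):

cd ipex-llm\python\llm\scripts
env-check.bat > env-check-log.txt
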

arisha07 commented 1 month ago

Please see the attached log from running env-check.bat. I had to update transformers to the latest version to use VideoLlavaProcessor and VideoLlavaForConditionalGeneration.

Python 3.11.7
-----------------------------------------------------------------
transformers=4.41.0
-----------------------------------------------------------------
torch=2.1.0a0+cxx11.abi
-----------------------------------------------------------------
Name: ipex-llm
Version: 2.1.0b20240517
Summary: Large Language Model Develop Toolkit
Home-page: https://github.com/intel-analytics/BigDL
Author: BigDL Authors
Author-email: bigdl-user-group@googlegroups.com
License: Apache License, Version 2.0
Location: C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages
Requires:
Required-by:
-----------------------------------------------------------------
C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torchvision\io\image.py:13: UserWarning: Failed to load image Python extension: 'Could not find module 'C:\Users\MTL_Gaming\miniconda3\envs\llava-ipex-llm-trans\Lib\site-packages\torchvision\image.pyd' (or one of its dependencies). Try using the full path with constructor syntax.'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
  warn(
ipex=2.1.10+xpu
-----------------------------------------------------------------
Total Memory: 31.661 GB

Chip 0 Memory: 4 GB | Speed: 6400 MHz
Chip 1 Memory: 4 GB | Speed: 6400 MHz
Chip 2 Memory: 4 GB | Speed: 6400 MHz
Chip 3 Memory: 4 GB | Speed: 6400 MHz
Chip 4 Memory: 4 GB | Speed: 6400 MHz
Chip 5 Memory: 4 GB | Speed: 6400 MHz
Chip 6 Memory: 4 GB | Speed: 6400 MHz
Chip 7 Memory: 4 GB | Speed: 6400 MHz
-----------------------------------------------------------------
CPU Manufacturer: GenuineIntel
CPU MaxClockSpeed: 3800
CPU Name: Intel(R) Core(TM) Ultra 7 165H
CPU NumberOfCores: 16
CPU NumberOfLogicalProcessors: 22
-----------------------------------------------------------------
GPU 0: Intel(R) Arc(TM) Graphics         Driver Version:  31.0.101.5007
-----------------------------------------------------------------
-----------------------------------------------------------------
System Information

Host Name:                 MTL-MSI-PROD2
OS Name:                   Microsoft Windows 11 Home
OS Version:                10.0.22631 N/A Build 22631
OS Manufacturer:           Microsoft Corporation
OS Configuration:          Standalone Workstation
OS Build Type:             Multiprocessor Free
Registered Owner:          MTL_Gaming
Registered Organization:
Product ID:                00325-80000-00000-AAOEM
Original Install Date:     11/7/2023, 8:42:59 AM
System Boot Time:          5/17/2024, 11:39:34 AM
System Manufacturer:       Micro-Star International Co., Ltd.
System Model:              Please change product name
System Type:               x64-based PC
Processor(s):              1 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 170 Stepping 4 GenuineIntel ~1400 Mhz
BIOS Version:              American Megatrends International, LLC. E15A1IMS.AE5, 11/3/2023
Windows Directory:         C:\Windows
System Directory:          C:\Windows\system32
Boot Device:               \Device\HarddiskVolume1
System Locale:             en-us;English (United States)
Input Locale:              en-us;English (United States)
Time Zone:                 (UTC-08:00) Pacific Time (US & Canada)
Total Physical Memory:     32,421 MB
Available Physical Memory: 20,487 MB
Virtual Memory: Max Size:  48,293 MB
Virtual Memory: Available: 28,968 MB
Virtual Memory: In Use:    19,325 MB
Page File Location(s):     C:\pagefile.sys
Domain:                    WORKGROUP
Logon Server:              \\MTL-MSI-PROD2
Hotfix(s):                 4 Hotfix(s) Installed.
                           [01]: KB5037591
                           [02]: KB5027397
                           [03]: KB5037771
                           [04]: KB5037663
Network Card(s):           3 NIC(s) Installed.
                           [01]: Intel(R) Ethernet Connection (18) I219-V
                                 Connection Name: Ethernet
                                 Status:          Media disconnected
                           [02]: Bluetooth Device (Personal Area Network)
                                 Connection Name: Bluetooth Network Connection
                                 Status:          Media disconnected
                           [03]: Realtek USB GbE Family Controller
                                 Connection Name: Ethernet 3
                                 Status:          Media disconnected
Hyper-V Requirements:      A hypervisor has been detected. Features required for Hyper-V will not be displayed.
-----------------------------------------------------------------
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) Graphics                                               |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | UUID: 00000000-0000-0200-0000-00087d558086                                           |
|           | PCI BDF Address: 0000:00:02.0                                                        |
+-----------+--------------------------------------------------------------------------------------+
ivy-lv11 commented 1 month ago

@plischwe Could you please share the code so that we could troubleshoot the root cause?

arisha07 commented 1 month ago

Here is my version of the code :

import av
import numpy as np
import torch

from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration
from ipex_llm import optimize_model

def read_video_pyav(container, indices):
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])

model = VideoLlavaForConditionalGeneration.from_pretrained("LanguageBind/Video-LLaVA-7B-hf") 
processor = VideoLlavaProcessor.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")
print("Model and Processor loaded")

model = optimize_model(model)
print("Optimize model")

model = model.to('xpu')
print("Model on GPU")

prompt = "USER: <video>What is in the video? ASSISTANT:"
video_path = "Video-LLaVA\\videollava\\serve\\examples\\sample_demo_3.mp4"
container = av.open(video_path)

# sample uniformly 8 frames from the video
total_frames = container.streams.video[0].frames
indices = np.arange(0, total_frames, total_frames / 8).astype(int)
clip = read_video_pyav(container, indices)

with torch.inference_mode():
    inputs = processor(text=prompt, videos=clip, return_tensors="pt").to("xpu")

    print("After getting inputs")

    # Generate
    generate_ids = model.generate(**inputs, max_length=50, use_cache=True)
    torch.xpu.synchronize()

    print("After generate")
    print(processor.batch_decode(generate_ids.cpu(), skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])
plischwe commented 1 month ago

@plischwe Could you please share the code so that we could troubleshoot the root cause?

This is my code, I am using local int4 weights to load model:

from ipex_llm.optimize import low_memory_init, load_low_bit
import intel_extension_for_pytorch as ipex
from PIL import Image
import requests
import numpy as np
import torch
import av
from huggingface_hub import hf_hub_download
from ipex_llm.transformers import VideoLlavaForConditionalGeneration
from transformers import VideoLlavaProcessor

saved_dir = '/home/plischwe/int4_videollava'
def read_video_pyav(container, indices):
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])

with low_memory_init(): # Fast and low cost by loading model on meta device
    model = VideoLlavaForConditionalGeneration.from_pretrained(saved_dir)
    model.to('xpu')
    print('loaded model on xpu')
    processor = VideoLlavaProcessor.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")
    print('done with processor')

prompt = "USER: <video>Describe this video. ASSISTANT:"
video_path = "/home/plischwe/Video-LLaVA/videollava/serve/examples/sample_demo_1.mp4"
container = av.open(video_path)

total_frames = container.streams.video[0].frames
indices = np.arange(0, total_frames, total_frames / 8).astype(int)
clip = read_video_pyav(container, indices)

inputs = processor(text=prompt, videos=clip, return_tensors="pt")
inputs = inputs.to('xpu')
print('loaded inputs on xpu')

generate_ids = model.generate(**inputs, max_length=80)
print(processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0])

But I am getting the following error at the model.generate() call; it seems that the model cannot be put on xpu, which in my case is an Arc A770. I am guessing this could be a transformers issue and its lack of support for Intel dGPUs, but I wanted to get your opinion on how to navigate this issue.

/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/generation/utils.py:1640: UserWarning: You are calling .generate() with the `input_ids` being on a device type different than your model's device. `input_ids` is on xpu, whereas the model is on meta. You may experience unexpected behaviors or slower generation. Please make sure that you have put `input_ids` to the correct device by calling for example input_ids = input_ids.to('meta') before running `.generate()`.
  warnings.warn(
Traceback (most recent call last):
  File "/home/plischwe/load_model.py", line 61, in <module>
    generate_ids = model.generate(**inputs, max_length=80)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/generation/utils.py", line 1739, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/generation/utils.py", line 2378, in _sample
    outputs = self(
              ^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/models/video_llava/modeling_video_llava.py", line 514, in forward
    image_outputs, video_outputs = self._get_vision_features(
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/models/video_llava/modeling_video_llava.py", line 381, in _get_vision_features
    video_outputs = self.video_tower(pixel_values, output_hidden_states=True)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 926, in forward
    return self.vision_model(
           ^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 850, in forward
    hidden_states = self.embeddings(pixel_values)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 189, in forward
    embeddings = torch.cat([class_embeds, patch_embeds], dim=1)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/_prims_common/wrappers.py", line 229, in _fn
    result = fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/_prims_common/wrappers.py", line 132, in _fn
    result = fn(**bound.arguments)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/_refs/__init__.py", line 2680, in cat
    utils.check_same_device(*tensors, allow_cpu_scalar_tensors=False)
  File "/home/miniconda3/envs/llm/lib/python3.11/site-packages/torch/_prims_common/__init__.py", line 654, in check_same_device
    raise RuntimeError(msg)
RuntimeError: Tensor on device xpu:0 is not on the expected device meta!
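
Note the UserWarning at the top of the trace: the model is still on the meta device, because everything, including .to('xpu'), runs inside the low_memory_init() context, so the saved weights are never actually loaded. For reference, ipex-llm's documented pattern calls load_low_bit after leaving the context; a minimal sketch (not verified for Video-LLaVA, which is the open question in this thread):

from ipex_llm.optimize import low_memory_init, load_low_bit
from transformers import VideoLlavaForConditionalGeneration

saved_dir = '/home/plischwe/int4_videollava'

with low_memory_init():  # build the architecture on the meta device, no real weights
    model = VideoLlavaForConditionalGeneration.from_pretrained(saved_dir)

model = load_low_bit(model, saved_dir)  # fill in the saved int4 weights
model = model.to('xpu')                 # only now move the real weights to the GPU
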

Here is also my env-check:

-----------------------------------------------------------------
PYTHON_VERSION=3.11.9
-----------------------------------------------------------------
transformers=4.41.0
-----------------------------------------------------------------
torch=2.1.0.post2+cxx11.abi
-----------------------------------------------------------------
ipex-llm Version: 2.1.0b20240517
-----------------------------------------------------------------
ipex=2.1.30+xpu
-----------------------------------------------------------------
CPU Information: 
Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Address sizes:                      46 bits physical, 48 bits virtual
Byte Order:                         Little Endian
CPU(s):                             28
On-line CPU(s) list:                0-27
Vendor ID:                          GenuineIntel
Model name:                         Intel(R) Core(TM) i9-10940X CPU @ 3.30GHz
CPU family:                         6
Model:                              85
Thread(s) per core:                 2
Core(s) per socket:                 14
Socket(s):                          1
Stepping:                           7
CPU max MHz:                        4800.0000
CPU min MHz:                        1200.0000
BogoMIPS:                           6599.98
-----------------------------------------------------------------
Total CPU Memory: 31.0212 GB
Memory Type: DDR4 
-----------------------------------------------------------------
Operating System: 
Ubuntu 22.04.3 LTS \n \l

-----------------------------------------------------------------
Linux machine 6.5.0-28-generic #29~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Thu Apr  4 14:39:20 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
-----------------------------------------------------------------
CLI:
    Version: 1.2.31.20240308
    Build ID: 00000000

Service:
    Version: 1.2.31.20240308
    Build ID: 00000000
    Level Zero Version: 
-----------------------------------------------------------------
  Driver Version                                  2024.17.3.0.08_160000
  Driver Version                                  2024.17.3.0.08_160000
  Driver UUID                                     32332e35-322e-3238-3230-322e35320000
  Driver Version                                  23.52.28202.52
  Driver UUID                                     32332e35-322e-3238-3230-322e35320000
  Driver Version                                  23.52.28202.52
  Driver UUID                                     32332e35-322e-3238-3230-322e35320000
  Driver Version                                  23.52.28202.52
-----------------------------------------------------------------
Driver related package version:
ii  intel-level-zero-gpu                           1.3.28202.52-821~22.04                  amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
ii  level-zero-dev                                 1.16.15-821~22.04                       amd64        Intel(R) Graphics Compute Runtime for oneAPI Level Zero.
-----------------------------------------------------------------
igpu not detected
-----------------------------------------------------------------
xpu-smi is properly installed. 
-----------------------------------------------------------------
+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information                                                                   |
+-----------+--------------------------------------------------------------------------------------+
| 0         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-001b-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:1b:00.0                                                        |
|           | DRM Device: /dev/dri/card0                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 1         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-001f-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:1f:00.0                                                        |
|           | DRM Device: /dev/dri/card1                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
| 2         | Device Name: Intel(R) Arc(TM) A770 Graphics                                          |
|           | Vendor Name: Intel(R) Corporation                                                    |
|           | SOC UUID: 00000000-0000-006a-0000-000856a08086                                       |
|           | PCI BDF Address: 0000:6a:00.0                                                        |
|           | DRM Device: /dev/dri/card2                                                           |
|           | Function Type: physical                                                              |
+-----------+--------------------------------------------------------------------------------------+
GPU0 Memory size=16G
GPU1 Memory size=16G
GPU2 Memory size=16G
-----------------------------------------------------------------
1b:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
    Subsystem: Device 172f:3937
    Flags: bus master, fast devsel, latency 0, IRQ 91, NUMA node 0
    Memory at b4000000 (64-bit, non-prefetchable) [size=16M]
    Memory at 13800000000 (64-bit, prefetchable) [size=16G]
    Expansion ROM at b5000000 [disabled] [size=2M]
    Capabilities: <access denied>
    Kernel driver in use: i915
    Kernel modules: i915
--
1f:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
    Subsystem: Device 172f:3937
    Flags: bus master, fast devsel, latency 0, IRQ 94, NUMA node 0
    Memory at b2000000 (64-bit, non-prefetchable) [size=16M]
    Memory at 13000000000 (64-bit, prefetchable) [size=16G]
    Expansion ROM at b3000000 [disabled] [size=2M]
    Capabilities: <access denied>
    Kernel driver in use: i915
    Kernel modules: i915
--
6a:00.0 VGA compatible controller: Intel Corporation Device 56a0 (rev 08) (prog-if 00 [VGA controller])
    Subsystem: Device 172f:3937
    Flags: bus master, fast devsel, latency 0, IRQ 97, NUMA node 0
    Memory at d7000000 (64-bit, non-prefetchable) [size=16M]
    Memory at 17800000000 (64-bit, prefetchable) [size=16G]
    Expansion ROM at d8000000 [disabled] [size=2M]
    Capabilities: <access denied>
    Kernel driver in use: i915
    Kernel modules: i915
shane-huang commented 1 month ago

Here is my version of the code: [code quoted from arisha07's comment above]

@arisha07 This program uses ipex-llm correctly. It fails because ipex-llm does not support transformers versions above 4.38, while VideoLlavaForConditionalGeneration requires a newer transformers version.

shane-huang commented 1 month ago

from ipex_llm.transformers import VideoLlavaForConditionalGeneration

@plischwe

ipex-llm provides a general API, optimize_model (see documentation), that can be used with an arbitrary PyTorch model. We also offer specific APIs for the transformers auto classes (e.g., ipex_llm.transformers.AutoModel, ipex_llm.transformers.AutoModelForCausalLM) for convenience. However, these convenience APIs are only available for auto classes, so you need to use the optimize_model API for non-auto classes like VideoLlavaForConditionalGeneration.
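
A minimal sketch of the two paths (the load_in_4bit flag follows ipex-llm's documented auto-class usage; the causal-LM model name is purely illustrative):

# Path 1: convenience API, available only for transformers auto classes.
from ipex_llm.transformers import AutoModelForCausalLM
llm = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf",
                                           load_in_4bit=True)

# Path 2: general API, works with non-auto classes such as Video-LLaVA.
from transformers import VideoLlavaForConditionalGeneration
from ipex_llm import optimize_model
model = VideoLlavaForConditionalGeneration.from_pretrained(
    "LanguageBind/Video-LLaVA-7B-hf")
model = optimize_model(model)  # applies low-bit (int4 by default) optimization
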

The transformers library updates frequently with new models and interface changes, so new versions or models may not be supported immediately. You can raise a feature request for ipex-llm.

arisha07 commented 1 month ago

@arisha07 This program uses ipex-llm correctly. It fails because ipex-llm does not support transformers versions above 4.38, while VideoLlavaForConditionalGeneration requires a newer transformers version.

Do you know when ipex-llm will be able to support transformers versions newer than 4.38?

plischwe commented 1 month ago

I see, thanks for the clarification. So is it then possible to load a saved Video-LLaVA int4 model on an Arc A770 without converting the weights in memory using optimize_model? As you can see in my code, I have the int4 weights saved, but I am unable to load them without converting the weights in memory, which limits the hardware we can use because of memory constraints. The code here doesn't work with VideoLlavaForConditionalGeneration when loading the model from a local path.

shane-huang commented 1 month ago

So is it then possible to load a saved Video-LLaVA int4 model on an Arc A770 without converting the weights in memory using optimize_model? [...]

optimize_model can usually work with non-auto transformers model classes, but as the transformers library updates frequently, it is not guaranteed that all new models will be supported immediately. Video-LLaVA requires transformers 4.41 and is not supported yet.

pkhara31 commented 1 week ago

It worked for me by downgrading the transformers version to 4.37.2.

pip install transformers==4.37.2

I do not face this error "TypeError: LlamaModel._update_causal_mask() missing 3 required positional arguments: 'cache_position', 'past_key_values', and 'output_attentions'" with transformers 4.37.2

arisha07 commented 1 week ago

Is VideoLlavaProcessor available in transformers 4.37.2?