intel / intel-extension-for-pytorch

A Python package that extends the official PyTorch to easily obtain extra performance on Intel platforms
Apache License 2.0

Unable to run LanguageBind/Video-LLaVA-7B-hf using ipex #670

Closed shailesh837 closed 3 weeks ago

shailesh837 commented 1 month ago

Describe the bug

I am getting an error when running the code below with ipex-llm:

(llm_vision) spandey2@IMU-NEX-ADLP-voice-SUT:~/LLM_Computer_Vision$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu

(llm_vision) spandey2@IMU-NEX-ADLP-voice-SUT:~/LLM_Computer_Vision$ uname -a
Linux IMU-NEX-ADLP-voice-SUT 6.5.0-41-generic #41~22.04.2-Ubuntu

source /opt/intel/oneapi/compiler/latest/env/vars.sh
source /opt/intel/oneapi/mkl/latest/env/vars.sh
source /opt/intel/oneapi/ccl/latest/env/vars.sh
source /opt/intel/oneapi/mpi/latest/env/vars.sh

(llm_vision) spandey2@IMU-NEX-ADLP-voice-SUT:~/LLM_Computer_Vision$ pip list | grep -i torch
intel-extension-for-pytorch 2.1.30+xpu
torch                       2.1.0.post2+cxx11.abi
torchaudio                  2.1.0.post2+cxx11.abi
torchvision                 0.16.0.post2+cxx11.abi

(llm_vision) spandey2@imu-nex-nuc13x2-arc770-dut:~/LLM_Computer_Vision$ python test_llama_video_code.py
Loading checkpoint shards: 100%|██████████| 3/3 [00:01<00:00,  2.76it/s]

Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/home/spandey2/miniconda3/envs/llm_vision/lib/python3.11/site-packages/transformers/feature_extraction_utils.py:141: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at /build/pytorch/torch/csrc/utils/tensor_new.cpp:261.)
  return torch.tensor(value)
Caught a runtime error during generation: 
Native API failed. Native API returns: -6 (PI_ERROR_OUT_OF_HOST_MEMORY) -6 (PI_ERROR_OUT_OF_HOST_MEMORY) 

Code:

import av
import numpy as np
import torch
from transformers import VideoLlavaProcessor, VideoLlavaForConditionalGeneration, BitsAndBytesConfig
# Importing IPEX registers the 'xpu' device used below
import intel_extension_for_pytorch as ipex

def read_video_pyav(container, indices):
    frames = []
    container.seek(0)
    start_index = indices[0]
    end_index = indices[-1]
    for i, frame in enumerate(container.decode(video=0)):
        if i > end_index:
            break
        if i >= start_index and i in indices:
            frames.append(frame)
    return np.stack([x.to_ndarray(format="rgb24") for x in frames])

# Initialize model and processor
model = VideoLlavaForConditionalGeneration.from_pretrained("LanguageBind/Video-LLaVA-7B-hf", torch_dtype=torch.float16)
processor = VideoLlavaProcessor.from_pretrained("LanguageBind/Video-LLaVA-7B-hf")

# Move model to XPU
model = model.to('xpu')

prompt = "USER: <video>Why is this video funny? ASSISTANT:"
video_path = "/home/xxxx/LLM_Computer_Vision/xxx_Demo_Video.mp4"
container = av.open(video_path)

# Sample uniformly 8 frames from the video
total_frames = container.streams.video[0].frames
indices = np.arange(0, total_frames, total_frames / 8).astype(int)
clip = read_video_pyav(container, indices)

inputs = processor(text=prompt, videos=clip, return_tensors="pt")

# Move inputs to the XPU
inputs = {k: v.to('xpu') for k, v in inputs.items()}

# Generate a response; torch.no_grad() stops autograd from keeping tensors for a backward pass
try:
    with torch.no_grad():
        generate_ids = model.generate(**inputs, max_length=80)
    result = processor.batch_decode(generate_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False)[0]
    print(result)
except RuntimeError as e:
    print("Caught a runtime error during generation:", e)
except Exception as e:
    print("Caught an unexpected error during generation:", e)
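The frame-sampling step in the script (`np.arange(0, total_frames, total_frames / 8).astype(int)`) picks 8 evenly spaced frame indices. A minimal pure-Python sketch of the same computation with integer arithmetic follows; the helper name `uniform_frame_indices` is hypothetical, and the zero-frame guard assumes PyAV reports `stream.frames` as 0 when the container does not record a frame count.

```python
def uniform_frame_indices(total_frames: int, num_frames: int = 8) -> list[int]:
    """Return `num_frames` evenly spaced frame indices in [0, total_frames).

    Index i is floor(i * total_frames / num_frames), which for num_frames=8
    matches np.arange(0, total_frames, total_frames / 8).astype(int).
    """
    if total_frames <= 0:
        # Assumption: PyAV gives stream.frames == 0 when the frame count is unknown.
        raise ValueError("total_frames must be positive (frame count unknown?)")
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    # Integer math avoids any floating-point rounding in the step size.
    return [i * total_frames // num_frames for i in range(num_frames)]


if __name__ == "__main__":
    print(uniform_frame_indices(100))
```

For a 100-frame clip this yields `[0, 12, 25, 37, 50, 62, 75, 87]`, the same indices as the NumPy expression. For clips shorter than 8 frames both versions repeat indices, so `read_video_pyav` ends up returning fewer than 8 distinct frames.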

Could you please help me with this issue? I am running this on an Intel Arc A770 GPU.

Versions

Collecting environment information...
PyTorch version: 2.1.0.post2+cxx11.abi
PyTorch CXX11 ABI: Yes
IPEX version: N/A
IPEX commit: N/A
Build type: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: N/A
IGC version: N/A
CMake version: N/A
Libc version: glibc-2.35

Python version: 3.11.0rc1 (main, Aug 12 2022, 10:02:14) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-41-generic-x86_64-with-glibc2.35
Is XPU available: N/A
DPCPP runtime version: N/A
MKL version: N/A
GPU models and configuration: N/A
Intel OpenCL ICD version: 24.13.29138.29-881~22.04
Level Zero version: 1.3.29138.29-881~22.04

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 20
On-line CPU(s) list: 0-19
Vendor ID: GenuineIntel
Model name: 12th Gen Intel(R) Core(TM) i7-12800HE
CPU family: 6
Model: 154
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 1
Stepping: 3
CPU max MHz: 4600.0000
CPU min MHz: 400.0000
BogoMIPS: 5222.40
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq tme rdpid movdiri movdir64b fsrm md_clear serialize pconfig arch_lbr ibt flush_l1d arch_capabilities
Virtualization: VT-x
L1d cache: 544 KiB (14 instances)
L1i cache: 704 KiB (14 instances)
L2 cache: 11.5 MiB (8 instances)
L3 cache: 24 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-19
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; RSB filling; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] intel-extension-for-pytorch==2.1.30+xpu
[pip3] numpy==1.26.4
[pip3] torch==2.1.0.post2+cxx11.abi
[pip3] torchaudio==2.1.0.post2+cxx11.abi
[pip3] torchvision==0.16.0.post2+cxx11.abi
[conda] N/A

feng-intel commented 1 month ago

Thanks for the report. I can reproduce it on my Arc A770 platform.

shailesh837 commented 1 month ago

[Image attached]

feng-intel commented 1 month ago

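This is an out-of-memory error on the device, not the host, even though the message says PI_ERROR_OUT_OF_HOST_MEMORY. It looks like the 16 GB of memory on the Arc A770 is not enough for the "LanguageBind/Video-LLaVA-7B-hf" model in float16.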
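The device-memory diagnosis can be sanity-checked with simple arithmetic: float16 stores each parameter in 2 bytes, so the weights of an N-parameter model alone take about 2N bytes. The sketch below is a weights-only estimate under stated assumptions: `7e9` is the nominal "7B" size, not the exact parameter count of Video-LLaVA-7B-hf (which also includes its vision towers), and `headroom_gib` is a hypothetical allowance for activations, the KV cache and runtime overhead, none of which the sketch models. The helper names are illustrative.

```python
# Bytes per parameter for common storage formats (int4 packs two weights per byte).
BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "bfloat16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gib(num_params: float, dtype: str = "float16") -> float:
    """Approximate size of the model weights alone, in GiB (2**30 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


def weights_fit(num_params: float, dtype: str, device_gib: float, headroom_gib: float) -> bool:
    """True if the weights plus a caller-chosen headroom fit in device memory.

    `headroom_gib` is a hypothetical stand-in for activations, the KV cache and
    runtime overhead, which this sketch does not estimate.
    """
    return weight_gib(num_params, dtype) + headroom_gib <= device_gib


if __name__ == "__main__":
    for dtype in ("float16", "int4"):
        print(dtype, f"{weight_gib(7e9, dtype):.2f} GiB")
```

Under these assumptions the float16 weights alone come to roughly 13 GiB, leaving little of the card's 16 GB for the activations produced by the video frames, while a 4-bit format would need roughly 3.3 GiB. For an exact figure on a loaded model, `sum(p.numel() for p in model.parameters())` can be passed as `num_params`.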