intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

microsoft/Florence-2-large Unable to perform Object Detection on Intel Arc GPU 770 #11402

Open shailesh837 opened 5 months ago

shailesh837 commented 5 months ago

I am trying to run the microsoft/Florence-2-large model on an Arc 770 GPU for object detection on a sample image. https://huggingface.co/microsoft/Florence-2-large

(llm_vision) spandey2@imu-nex-nuc13x2-arc770-dut:~/LLM_Computer_Vision$ pip list | grep torch
intel-extension-for-pytorch 2.1.30+xpu
torch                       2.1.0.post2+cxx11.abi
torchaudio                  2.1.0.post2+cxx11.abi
torchvision                 0.16.0.post2+cxx11.abi
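
As a sanity check on the environment above, here is a minimal snippet (just a sketch, assuming the intel-extension-for-pytorch 2.1.30+xpu wheel listed here) to confirm that PyTorch actually sees the Arc 770 as an XPU device:

import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device type

print(torch.__version__, ipex.__version__)
print("XPU available:", torch.xpu.is_available())
if torch.xpu.is_available():
    print("Device count:", torch.xpu.device_count())
    print("Device name:", torch.xpu.get_device_name(0))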

(llm_vision) spandey2@imu-nex-nuc13x2-arc770-dut:~/LLM_Computer_Vision$ python3 computer_vision_short_code.py
2024-06-24 01:34:06,980 - INFO - intel_extension_for_pytorch auto imported
Starting script...
/home/spandey2/miniconda3/envs/llm_vision/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
The repository for microsoft/Florence-2-large contains custom code which must be executed to correctly load the model. You can inspect the repository content at https://hf.co/microsoft/Florence-2-large.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.
Do you wish to run the custom code? [y/N] y
Model and processor loaded.
Image loaded.
Starting tasks...
Running example with task_prompt: <OD>
Inputs: {'input_ids': tensor([[    0,   574, 22486,     5,  8720,    19,  4120,   766,    11,     5,
          2274,     4,     2]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]), 'pixel_values': tensor([[[[-1.1418, -0.7479, -0.9192,  ..., -1.9295, -1.9467, -1.9638],
          [-0.7650, -0.7137, -0.9192,  ..., -1.9295, -1.9467, -1.9638],
          [-0.2684, -0.7308, -0.9705,  ..., -1.9467, -1.9638, -1.9809],
          ...,
          [ 0.5707,  0.5707,  0.5707,  ...,  0.3481,  0.3481,  0.3652],
          [ 0.5707,  0.5707,  0.5707,  ...,  0.3823,  0.3823,  0.3652],
          [ 0.5707,  0.5707,  0.5707,  ...,  0.4166,  0.3994,  0.3823]],

         [[-1.3880, -0.9853, -1.1604,  ..., -1.6506, -1.6681, -1.6856],
          [-1.0028, -0.9503, -1.1604,  ..., -1.6506, -1.6681, -1.6856],
          [-0.4951, -0.9678, -1.2129,  ..., -1.6681, -1.6856, -1.7031],
          ...,
          [ 0.6078,  0.6078,  0.6078,  ...,  0.3978,  0.3978,  0.4153],
          [ 0.6078,  0.6078,  0.6078,  ...,  0.4328,  0.4328,  0.4153],
          [ 0.6078,  0.6078,  0.6078,  ...,  0.4678,  0.4503,  0.4328]],

         [[-1.1770, -0.7761, -0.9504,  ..., -1.4907, -1.5081, -1.5256],
          [-0.7936, -0.7413, -0.9504,  ..., -1.4907, -1.5081, -1.5256],
          [-0.2881, -0.7587, -1.0027,  ..., -1.5081, -1.5256, -1.5430],
          ...,
          [ 0.8274,  0.8274,  0.8274,  ...,  0.5659,  0.5659,  0.5834],
          [ 0.8274,  0.8274,  0.8274,  ...,  0.6008,  0.6008,  0.5834],
          [ 0.8274,  0.8274,  0.8274,  ...,  0.6356,  0.6182,  0.6008]]]])}
Generated IDs: tensor([[2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
         0, 0, 0, 0, 0, 0, 0, 0, 0]])
Generated Text: </s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>
Parsed Answer: {'<OD>': {'bboxes': [], 'labels': []}}
{'<OD>': {'bboxes': [], 'labels': []}}
No bounding boxes detected.

Could you please check what is wrong when the model is moved to XPU, so that simple object detection works? CODE:

import os
import requests
import torch
from PIL import Image, ImageDraw
import copy
import numpy as np
import random
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoProcessor, GenerationConfig

print("Starting script...")

# Define the model and processor
model_id = 'microsoft/Florence-2-large'
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).eval()
model = model.to('xpu')
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

print("Model and processor loaded.")

# Function to run the model on a given task prompt and optional text input
def run_example(task_prompt, text_input=None):
    print(f"Running example with task_prompt: {task_prompt}")
    if text_input is None:
        prompt = task_prompt
    else:
        prompt = task_prompt + text_input
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    print(f"Inputs: {inputs}")
    inputs = {k: v.to('xpu') for k, v in inputs.items()}
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=512,
        early_stopping=True,
        do_sample=False,
        num_beams=3,
        generation_config=GenerationConfig(use_cache=True)
    ).cpu()
    print(f"Generated IDs: {generated_ids}")
    generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    print(f"Generated Text: {generated_text}")
    parsed_answer = processor.post_process_generation(
        generated_text,
        task=task_prompt,
        image_size=(image.width, image.height)
    )
    print(f"Parsed Answer: {parsed_answer}")
    return parsed_answer

# Load the image
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw)
print("Image loaded.")

# Function to plot bounding boxes
def plot_bbox(image, data):
    fig, ax = plt.subplots()
    ax.imshow(image)
    for bbox, label in zip(data['bboxes'], data['labels']):
        x1, y1, x2, y2 = bbox
        rect = patches.Rectangle((x1, y1), x2-x1, y2-y1, linewidth=1, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        plt.text(x1, y1, label, color='white', fontsize=8, bbox=dict(facecolor='red', alpha=0.5))
    ax.axis('off')
    plt.show()

print("Starting tasks...")

# Define tasks and run the model
task_prompt = '<OD>'
results = run_example(task_prompt)
print(results)
if results['<OD>']['bboxes']:
    plot_bbox(image, results['<OD>'])
else:
    print("No bounding boxes detected.")
sgwhat commented 5 months ago

Hi @shailesh837, we will inform you when we make progress.

shailesh837 commented 5 months ago

@sgwhat: Do we know what the issue is? Is the model not getting loaded onto the GPU?
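
A quick way to check the second point (just a sketch, reusing the model object from the script above after model.to('xpu')):

# Print where the weights actually live after model.to('xpu').
devices = {str(p.device) for p in model.parameters()}
print("Parameter devices:", devices)  # 'xpu:0' would mean the move itself worked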

sgwhat commented 5 months ago

We do not currently support https://huggingface.co/microsoft/Florence-2-large.

shailesh837 commented 5 months ago

@sgwhat: Could you please clarify what that means? Is the problem that the model cannot be loaded on XPU, that inference is the issue, or something else, so we know the reason?