openvinotoolkit / openvino_notebooks

📚 Jupyter notebook tutorials for OpenVINO™

123-detectron2-to-openvino.ipynb gave an error when device.value='GPU' on Iris Xe graphics (Core i7-1165G7) #1552

Closed YOODS-Xu closed 9 months ago

YOODS-Xu commented 10 months ago

Describe the bug: When device.value='GPU' is set, 123-detectron2-to-openvino.ipynb gives the error below, but inference works fine in CPU mode.

Kernel Restarting The kernel for notebooks/123-detectron2-to-openvino/123-detectron2-to-openvino.ipynb appears to have died. It will restart automatically.

Expected behavior: By the way, I converted my Detectron2 instance segmentation model to ONNX and then to IR (.xml). It can run inference in iGPU mode (device.value="GPU"), but it is very slow, about 10x slower than in CPU mode.

Does OpenVINO support Detectron2 in iGPU mode? If yes, please tell me how to check whether the processing actually runs on the iGPU or on the CPU.

CPU: Core i7-1165G7 2.8 GHz
GPU: Iris Xe Graphics
OS: Windows 10 Pro 22H2 19045.3803
OpenVINO version: 2023.2.0
Display driver version: 31.0.101.4953

Screenshots

Installation instructions (Please mark the checkbox) [x] I followed the installation guide at https://github.com/openvinotoolkit/openvino_notebooks#-installation-guide to install the notebooks.

Environment information Please run python check_install.py in the openvino_notebooks directory. If the output is NOT OK for any of the checks, please follow the instructions to fix that. If that does not work, or if you still encounter the issue, please paste the output of check_install.py here.

Additional context Add any other context about the problem here.

YOODS-Xu commented 10 months ago

Thank you very much for your prompt response. Below are the Task Manager screenshots and processing times when I ran inference with my Detectron2 instance segmentation model in GPU mode and in CPU mode.

GPU mode (screenshot):

===
avaliable device=['CPU', 'GPU']
device=GPU
0=8.986126899719238
1=7.301579475402832
2=6.897884130477905
3=6.892237424850464
4=8.159549951553345
5=5.733447074890137
6=4.729335308074951
7=6.311661005020142
8=3.581742763519287
9=3.4575953483581543
iteration=10
sum time=62.051159381866455
mean time=6.205115938186646
min time=3.4575953483581543
max time=8.986126899719238

CPU mode (screenshot):

===
avaliable device=['CPU', 'GPU']
device=CPU
0=0.46947169303894043
1=0.40115976333618164
2=0.40078234672546387
3=0.416104793548584
4=0.38555479049682617
5=0.4031202793121338
6=0.399763822555542
7=0.40128111839294434
8=0.41541099548339844
9=0.41561365127563477
iteration=10
sum time=4.108263254165649
mean time=0.41082632541656494
min time=0.38555479049682617
max time=0.46947169303894043

My inference Python script is as follows.

import glob
import os
import time

import cv2
import numpy as np
import openvino as ov

model_path = 'model/23.12.13_exhib_bags_onnx_cpu/model.xml'
#device = 'CPU'
#device = 'AUTO'
device = 'GPU'
#device = 'BATCH:GPU(1)'
#device = 'MULTI:CPU,GPU.0'

score_thr = 0.9

core = ov.Core()

#core.set_property({'CACHE_DIR': '../cache'})

ov_model = model_path

print(f'avaliable device={core.available_devices}')
print(f'device={device}')

compiled_model = core.compile_model(ov_model, device)

image_paths = glob.glob(os.path.join("all-10", "*.jpg"))

pro_time = np.zeros(10, dtype=float)
i = 0
for image_path in image_paths:
    start_time = time.time()

    image = cv2.imread(image_path)

    # adjust_size() is my own helper (defined elsewhere); it returns the resized (width, height)
    size = adjust_size(image.shape[:2])

    tmpim = cv2.resize(image, size)
    # HWC uint8 -> CHW float32, the layout the exported model expects
    tmpim = tmpim.astype(np.float32).transpose(2, 0, 1)

    results = compiled_model(tmpim)

    pred_boxes = results[0]
    pred_classes = results[1]
    pred_masks = results[2]
    scores = results[3]

    pro_time[i] = time.time() - start_time
    print(f'{i}={pro_time[i]}')
    i += 1
===
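For completeness, below is a minimal sketch (not the script I actually ran) of how the timing could exclude the first GPU inference, which includes kernel compilation, and enable model caching; the cache directory name, performance hint, and warm-up input shape are only examples.

# Minimal sketch (not the script above): enable model caching and run one untimed
# warm-up inference so the measured times exclude GPU kernel compilation.
import numpy as np
import openvino as ov

model_path = 'model/23.12.13_exhib_bags_onnx_cpu/model.xml'

core = ov.Core()
core.set_property({'CACHE_DIR': 'model_cache'})   # example cache dir; later compilations reuse it
compiled_model = core.compile_model(model_path, 'GPU', {'PERFORMANCE_HINT': 'LATENCY'})

# Warm-up call outside the timing loop; the CHW input shape here is only illustrative
dummy = np.zeros((3, 480, 640), dtype=np.float32)
compiled_model(dummy)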

YOODS-Xu commented 10 months ago

My Detectron2 instance segmentation model XML file is shown below (screenshot).

I don't know why there were waves in my task manager GPU graph.

Iffa-Intel commented 9 months ago

@YOODS-Xu you should be able to run the notebook 123-detectron2-to-openvino.ipynb with GPU on Iris Xe.


Since you are using the GPU for inference, it's expected to see waves in your GPU utilization graph.

My GPU utilization during inference is not high (~11%); also, there might be other processes running in the background (see screenshot).
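To double-check which device actually executes the model, the compiled model can report its execution devices; here is a minimal sketch, where the model path is only a placeholder.

# Minimal sketch: confirm which device actually runs the compiled model.
# "model.xml" is a placeholder path.
import openvino as ov

core = ov.Core()
print(core.available_devices)                       # e.g. ['CPU', 'GPU']
compiled = core.compile_model("model.xml", "GPU")
print(compiled.get_property("EXECUTION_DEVICES"))   # e.g. ['GPU.0']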
YOODS-Xu commented 9 months ago

@Iffa-Intel It is very kind of you to share your experimental result. Could you please tell me your processing time in GPU mode and in CPU mode? Which is faster? I see your CPU is Tiger Lake (Gen12); does that mean your GPU is Iris Xe Graphics?

Iffa-Intel commented 9 months ago

@YOODS-Xu Yes, I'm using an Iris Xe GPU with an i7 CPU on Ubuntu 22.04.

What I observe is that the GPU is faster (even the visible result in the notebook is produced faster than on the CPU).

CPU (screenshot)

GPU (screenshot)
YOODS-Xu commented 9 months ago

@Iffa-Intel Thank you so much for the detailed information. I ran inference on my dual-boot system, and on both OSes the CPU was faster than the GPU. My hardware is an 11th Gen Tiger Lake Core i7-1165G7 2.8 GHz with Iris Xe Graphics. My OSes are Ubuntu 20.04 and Windows 10 Pro 22H2 19045.3803 (display driver version 31.0.101.4953), and my OpenVINO version is 2023.2.0.

What could be the reason the CPU is 10x faster than the GPU? Is 11th Gen too old, or is Ubuntu 20.04 too old? Any information would be greatly appreciated.

Iffa-Intel commented 9 months ago

@YOODS-Xu, CPUs are general-purpose processors that are well-suited for tasks that require complex decision-making, sequential processing, and the ability to handle a wide range of instructions. They are optimized for tasks that require high single-threaded performance and are often used in general computing tasks.

On the other hand, GPUs are specialized processors designed for parallel processing, particularly for graphics rendering. They excel at handling large amounts of data in parallel and are highly efficient at tasks like rendering images, processing video, and performing certain types of mathematical calculations (such as those used in machine learning and scientific simulations).

While GPUs are highly efficient for parallel tasks, they may not perform as well as CPUs in tasks that are not easily parallelizable. Additionally, GPUs may have higher latency for certain types of operations compared to CPUs.

Here are some reasons why GPUs might seem slower in specific situations:

Task Dependency: Some tasks are inherently sequential and don't benefit from parallel processing, which is a strength of GPUs. In such cases, CPUs with strong single-threaded performance may outperform GPUs.

Algorithm Suitability: Certain algorithms or computations may not be well-suited for parallelization. If an algorithm is not optimized for parallel processing and relies heavily on sequential execution, a CPU might perform better.

Latency-Sensitive Tasks: GPUs are optimized for throughput and parallelism but may introduce higher latency for certain operations. Tasks that are sensitive to latency rather than raw processing power might be better suited for CPUs.

Data Transfer Overhead: If there is a need to frequently transfer data between the CPU and GPU (e.g., for synchronization or communication between the two), the overhead of data transfer can impact overall performance.

Thread Divergence: In some parallel algorithms, threads may diverge, meaning they take different paths of execution. This can lead to inefficient use of GPU resources and impact performance.

Limited Cache Size: Some GPUs have smaller cache sizes compared to high-end CPUs. For workloads that heavily rely on caching, a smaller cache size may result in more frequent memory accesses, potentially impacting performance.

Instruction Set Differences: CPUs and GPUs may have different instruction sets optimized for different types of computations. If a workload is not well-matched to the GPU's architecture, it may not perform as well.

Single-Precision vs. Double-Precision Performance: GPUs often excel in single-precision floating-point operations but may have lower performance in double-precision calculations. Some scientific and engineering workloads heavily rely on double-precision, and in such cases, a CPU might be more suitable.

Task Overhead: GPU cores are optimized for handling a large number of simple calculations simultaneously. If the task involves significant overhead per calculation, the GPU's advantage in raw processing power may be mitigated.

This page might help you further.
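In particular for the precision point above, it may be worth checking what the GPU reports and comparing the GPU default (FP16) against explicitly forced FP32; a minimal sketch, where the model path is only an example:

# Minimal sketch: query GPU capabilities and force FP32 execution for comparison.
# The GPU plugin defaults to FP16; "model.xml" is a placeholder path.
import openvino as ov

core = ov.Core()
print(core.get_property("GPU", "FULL_DEVICE_NAME"))            # reported device name
print(core.get_property("GPU", "OPTIMIZATION_CAPABILITIES"))   # supported precisions, e.g. FP16/FP32

compiled_fp32 = core.compile_model("model.xml", "GPU", {"INFERENCE_PRECISION_HINT": "f32"})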

YOODS-Xu commented 9 months ago

@Iffa-Intel It is very kind of you to give me so much information. I will study the document you linked.

Iffa-Intel commented 9 months ago

@YOODS-Xu If you don't have any other inquiries, shall I close this thread?

Iffa-Intel commented 9 months ago

Closing issue, feel free to re-open or start a new issue if additional assistance is needed.

YOODS-Xu commented 8 months ago

I am so sorry for the late reply. I fractured a bone, and today is my first day back from sick leave. Thank you very much for your support.