[Bug]: clEnqueueNDRangeKernel, error code: -54 when trying to run notebook 278 on GPU

openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

https://docs.openvino.ai

Apache License 2.0


[Bug]: clEnqueueNDRangeKernel, error code: -54 when trying to run notebook 278 on GPU #23449

Open clinty opened 3 months ago

clinty commented 3 months ago

OpenVINO Version

2024.0.0

Operating System

Other (Please specify in description)

Device used for inference

CPU

Framework

PyTorch

Model used

278-stable-diffusion-ip-adapter

Issue description

After selecting GPU inference in the 278-stable-diffusion-ip-adapter notebook, the second generation image variation cell fails with

RuntimeError: Exception from src/inference/src/cpp/infer_request.cpp:223:
Exception from src/plugins/intel_gpu/src/runtime/ocl/ocl_stream.cpp:310:
[GPU] clEnqueueNDRangeKernel, error code: -54

Step-by-step reproduction

Run the cells of the 278-stable-diffusion-ip-adapter openvino notebook, select GPU, and continue to run cells.

Relevant log output

No response

Issue submission checklist

[X] I'm reporting an issue. It's not a question.
[X] I checked the problem with the documentation, FAQ, open issues, Stack Overflow, etc., and have not found a solution.
[X] There is reproducer code and related data files such as images, videos, models, etc.

Wan-Intel commented 3 months ago

May I know which operating system you are using to run the Image Generation with Stable Diffusion and IP-Adapter notebook? Which Python version are you using on your machine? Does the error also occur when running inference with the CPU plugin?
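The details requested above can be gathered in one go. The following is a minimal sketch using only the Python standard library; the helper name `collect_environment` is hypothetical, and it assumes OpenVINO was installed as the `openvino` pip package.

```python
import platform
from importlib import metadata


def collect_environment(package="openvino"):
    """Collect OS, Python version and an installed package version for a bug report."""
    try:
        # Reads the version from the installed package metadata (no import of the package itself).
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = "not installed"
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        package: version,
    }


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue answers the OS and Python questions without further back-and-forth.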

clinty commented 3 months ago

@Wan-Intel , this is on Debian testing. Python is 3.11.8. I have been unable to run it on CPU as the process runs out of memory and gets killed by the OOM killer.

Wan-Intel commented 3 months ago

I've validated the Image Generation with Stable Diffusion and IP-Adapter and the result is attached below:

May I know which CPU you are using to run the Image Generation with Stable Diffusion and IP-Adapter notebook?

For your information, the supported CPUs are listed in the OpenVINO™ System Requirements, and the operating systems supported for running OpenVINO™ Notebooks are listed in the OpenVINO™ Notebook System Requirements.

clinty commented 3 months ago

@Wan-Intel , I have now tried using Ubuntu 22.04 LTS (64 bit) on an i5-8365U and on an i7-10510U. On both machines the errors are the same as I get on Debian. With CPU, Python runs out of memory and dies. With GPU, I get clEnqueueNDRangeKernel, error code: -54.

Wan-Intel commented 3 months ago

Intel® Core™ i5-8365U Processor and Intel® Core™ i7-10510U Processor are both supported by OpenVINO™.

Could you please re-install the latest OpenVINO™ Notebook with the installation guide from here? Referring to this StackOverflow thread, please check and reduce local memory size, local group size, constant memory, and kernel arguments size.
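For context on the error code: in the standard OpenCL headers, -54 is CL_INVALID_WORK_GROUP_SIZE, which is why the advice above centers on work-group and local memory sizes. A minimal lookup sketch follows; the table is only a small excerpt of the OpenCL error codes, and the helper name `describe_cl_error` is hypothetical. Note that the kernel in this issue is enqueued by the OpenVINO GPU plugin (see ocl_stream.cpp in the trace), not by notebook code.

```python
# Excerpt of error codes defined in the OpenCL headers (CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -52: "CL_INVALID_KERNEL_ARGS",
    -53: "CL_INVALID_WORK_DIMENSION",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -55: "CL_INVALID_WORK_ITEM_SIZE",
}


def describe_cl_error(code):
    """Map a numeric OpenCL error code to its symbolic name, if it is in the excerpt."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


if __name__ == "__main__":
    print(describe_cl_error(-54))
```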

clinty commented 3 months ago

Hello @Wan-Intel , I have re-installed the latest OpenVINO™ Notebook. How do I reduce local memory size, local group size, constant memory, and kernel arguments size?

Wan-Intel commented 3 months ago

I'm able to run the inference successfully when using the CPU plugin.

When I select the GPU plugin as the inference device and run the following lines:

generator = torch.Generator(device="cpu").manual_seed(576)

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")

result = ov_pipe(prompt='', ip_adapter_image=image, guidance_scale=1, negative_prompt="", num_inference_steps=4, generator=generator)

fig = visualize_results([image, result.images[0]], ["input image", "result"])

I encountered the following error: RuntimeError: [GPU] Exceed max size of memory allocation: Required 68161536 bytes, already occupied: 7603245268 bytes, but available memory size is 7616372736 bytes.

Let me check with the relevant team and we'll update you as soon as possible.
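To put the numbers in that message in perspective: the requested allocation plus the memory already in use exceeds the available size by 55034068 bytes (roughly 52 MiB). The sketch below does that arithmetic by parsing the message text; the function name `allocation_shortfall` is hypothetical, and the regular expression is an assumption that matches only the wording quoted in this comment.

```python
import re

MESSAGE = (
    "[GPU] Exceed max size of memory allocation: Required 68161536 bytes, "
    "already occupied: 7603245268 bytes, but available memory size is 7616372736 bytes."
)

# Assumed message layout, taken from the error text quoted in this thread.
PATTERN = re.compile(
    r"Required (\d+) bytes, already occupied: (\d+) bytes, "
    r"but available memory size is (\d+) bytes"
)


def allocation_shortfall(message):
    """Return how many bytes the request exceeds the available memory by."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected layout")
    required, occupied, available = (int(group) for group in match.groups())
    return required + occupied - available


if __name__ == "__main__":
    shortfall = allocation_shortfall(MESSAGE)
    print(f"short by {shortfall} bytes ({shortfall / 2**20:.1f} MiB)")
```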

clinty commented 2 months ago

I also get the -54 error with the instant-id notebook.

avitial commented 1 month ago

@clinty some of the demonstrated models can require at least 32GB of RAM for conversion and inference, as stated in the notebook's description. This applies to both notebooks you have tried (278-stable-diffusion-ip-adapter and 286-instant-id).

So the error may be caused by the process running out of memory. How much RAM does your system have? Make sure enough RAM is available; otherwise the Linux kernel will kill the process, which is expected behavior.
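One way to check total and currently available RAM on Linux is to read /proc/meminfo. The following is a minimal sketch under that assumption (it does nothing useful on other operating systems); the helper names `parse_meminfo` and `kib_to_gib` are hypothetical.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo text into {field: number}; most fields are in kB, a few are plain counts."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kib_to_gib(kib):
    """Convert a size in KiB (the unit /proc/meminfo labels as kB) to GiB."""
    return kib / 2**20


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():
        info = parse_meminfo(meminfo.read_text())
        for key in ("MemTotal", "MemAvailable"):
            if key in info:
                print(f"{key}: {kib_to_gib(info[key]):.1f} GiB")
    else:
        print("/proc/meminfo not found; this check is Linux-only")
```

Running it just before the failing cell shows how much headroom is left compared with the 32GB mentioned in the notebook description.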

clinty commented 1 month ago

@avitial the system only has 32GB RAM. Could the error message be improved?