iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_VALUE' (1): invalid argument #14458

Open ShuaiShao93 opened 1 year ago

ShuaiShao93 commented 1 year ago

What happened?

I got the error INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_VALUE' (1): invalid argument when copying a NumPy array to a CUDA device with the IREE runtime.

Steps to reproduce your issue

import numpy as np
import iree.runtime as ireert

input_batch = np.float32(np.random.rand(1, 224, 224, 3))
device = "cuda"

iree_device = ireert.get_device(device)
iree_input_batch = ireert.asdevicearray(iree_device, input_batch)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/shshao/.local/lib/python3.8/site-packages/iree/runtime/array_interop.py", line 216, in asdevicearray
    buffer_view = device.allocator.allocate_buffer_copy(
RuntimeError: Failed to allocate device visible buffer: main_checkout/runtime/src/iree/hal/drivers/cuda/cuda_allocator.c:336: INTERNAL; CUDA driver error 'CUDA_ERROR_INVALID_VALUE' (1): invalid argument

What component(s) does this issue relate to?

Runtime

Version information

$ pip show iree-runtime
Name: iree-runtime
Version: 20230524.529

Additional context

Ubuntu 20.04, NVIDIA driver 535, CUDA 11.8, GPU: RTX 4000

ShuaiShao93 commented 1 year ago

Upgrading to the release version fixed it. We should probably update the docs to install with pip install -f https://openxla.github.io/iree/pip-release-links.html, for example on this page: https://openxla.github.io/iree/guides/ml-frameworks/tensorflow/
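
For anyone hitting the same error, a quick way to see whether the installed iree-runtime is the build from this report is sketched below. It uses only the Python standard library (importlib.metadata). The helper names installed_version and matches_reported are made up for illustration, and the check only tests for an exact match with the reported version string; it does not try to decide which versions contain the fix.

```python
from importlib import metadata

# Version string taken from the "Version information" section of this report.
REPORTED_VERSION = "20230524.529"


def installed_version(package="iree-runtime"):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_reported(version, reported=REPORTED_VERSION):
    """True if `version` is exactly the build this issue was reported against."""
    return version is not None and version.strip() == reported


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("iree-runtime is not installed in this environment")
    elif matches_reported(version):
        print(f"iree-runtime {version} is the build from this report; consider upgrading")
    else:
        print(f"iree-runtime {version} differs from the reported build {REPORTED_VERSION}")
```

If the script reports the same build as above, upgrading with the pip command from the previous comment is the fix that worked here.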