microsoft / DirectML

DirectML is a high-performance, hardware-accelerated DirectX 12 library for machine learning. DirectML provides GPU acceleration for common machine learning tasks across a broad range of supported hardware and drivers, including all DirectX 12-capable GPUs from vendors such as AMD, Intel, NVIDIA, and Qualcomm.
MIT License

torch_directml Unknown error -2147024809 on WSL2 #451

Open anggaaryas opened 1 year ago

anggaaryas commented 1 year ago

I am using WSL2 Debian and trying to use DirectML.

I ran this simple code and got this error:

>>> import torch
>>> import torch_directml
>>> dml = torch_directml.device()
>>> tensor1 = torch.tensor([1]).to(dml)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: Unknown error -2147024809
>>>

I have installed DirectML; pip reported: Successfully installed torch-2.0.0 torch-directml-0.2.0.dev230426 torchvision-0.15.1

WSL version

WSL version: 1.2.5.0
Kernel version: 5.15.90.1
WSLg version: 1.0.51
MSRDC version: 1.2.3770
Direct3D version: 1.608.2-61064218
DXCore version: 10.0.25131.1002-220531-1700.rs-onecore-base2-hyp
Windows version: 10.0.22621.900

Python version: 3.9.2

(screenshot: display info)

(screenshot: winver)

anggaaryas commented 1 year ago

This code fails too:

>>> import torch_directml
>>> torch_directml.device_count()
1
>>> torch_directml.device_name(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/user/venv/lib/python3.9/site-packages/torch_directml/device.py", line 22, in device_name
    return torch_directml_native.get_device_name(device_id)
RuntimeError: Unknown error -2147024809
>>>
rmast commented 1 year ago

pip list

Package                   Version
------------------------- ----------------
brotlipy                  0.7.0
certifi                   2022.12.7
cffi                      1.15.1
charset-normalizer        2.0.4
cmake                     3.26.3
cryptography              39.0.1
filelock                  3.12.0
idna                      3.4
Jinja2                    3.1.2
lit                       16.0.3
MarkupSafe                2.1.2
mkl-fft                   1.3.6
mkl-random                1.2.2
mkl-service               2.4.0
mpmath                    1.3.0
networkx                  3.1
numpy                     1.24.3
nvidia-cublas-cu11        11.10.3.66
nvidia-cuda-cupti-cu11    11.7.101
nvidia-cuda-nvrtc-cu11    11.7.99
nvidia-cuda-runtime-cu11  11.7.99
nvidia-cudnn-cu11         8.5.0.96
nvidia-cufft-cu11         10.9.0.58
nvidia-curand-cu11        10.2.10.91
nvidia-cusolver-cu11      11.4.0.1
nvidia-cusparse-cu11      11.7.4.91
nvidia-nccl-cu11          2.14.3
nvidia-nvtx-cu11          11.7.91
Pillow                    9.4.0
pip                       23.0.1
pycparser                 2.21
pyOpenSSL                 23.0.0
PySocks                   1.7.1
requests                  2.29.0
setuptools                66.0.0
sympy                     1.11.1
torch                     2.0.0
torch-directml            0.2.0.dev230426
torchaudio                0.13.1
torchvision               0.15.1
triton                    2.0.0
typing_extensions         4.5.0
urllib3                   1.26.15
wheel                     0.38.4

python3 trytorch.py:

import torch
import torch_directml
dml = torch_directml.device()
tensor1 = torch.tensor([1]).to(dml) # Note that dml is a variable, not a string!
tensor2 = torch.tensor([2]).to(dml)
dml_algebra = tensor1 + tensor2
dml_algebra.item()

[W dml_heap_allocator.cc:93] DML allocator out of memory!
Traceback (most recent call last):
  File "/mnt/c/Users/nicor/trytorch.py", line 4, in <module>
    tensor1 = torch.tensor([1]).to(dml) # Note that dml is a variable, not a string!
RuntimeError: Unknown error -2147024882

Windows 11, WSL 2, Intel(R) Core(TM) i5-8500 CPU @ 3.00GHz. 8 GB RAM available, of which 1.3 GB free at the moment of running.

I also don't understand why so many CUDA-related libraries are installed on an Intel APU system.

rmast commented 1 year ago

So, whatever the architecture, the error messages should be improved so users know where to look.

rmast commented 1 year ago

The same test does declare tensors when running straight from Windows 11 without WSL2 on the same PC, so the processor must be supported.

levicki commented 1 year ago

-2147024809 == 0x80070057 == E_INVALIDARG in COM interface land, or converted to regular Windows error codes 0x00000057 == ERROR_INVALID_PARAMETER.

The other error is -2147024882 == 0x8007000E == E_OUTOFMEMORY (also known as 0x0000000E == ERROR_OUTOFMEMORY).
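The arithmetic behind this decoding can be sketched in a few lines of plain Python: reinterpret the signed 32-bit value as an unsigned HRESULT, then read the Win32 error code out of the low word (for FACILITY_WIN32 HRESULTs of the form 0x8007xxxx). The helper name `decode_hresult` is made up for illustration.

```python
# Decode a signed 32-bit error code (as printed by torch_directml) into
# its unsigned HRESULT hex form and the Win32 code in the low 16 bits.
def decode_hresult(err: int) -> tuple:
    hresult = err & 0xFFFFFFFF      # reinterpret signed int as unsigned 32-bit
    win32_code = hresult & 0xFFFF   # Win32 error code lives in the low word
    return (f"0x{hresult:08X}", win32_code)

print(decode_hresult(-2147024809))  # ('0x80070057', 87)  -> ERROR_INVALID_PARAMETER
print(decode_hresult(-2147024882))  # ('0x8007000E', 14)  -> ERROR_OUTOFMEMORY
```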

Adele101 commented 10 months ago

Hi everyone, thank you for submitting this issue. While I can't provide a timeline for resolution at the moment, please know that your feedback is valuable to us. We will follow up once we can review this issue.

kalinkrustev commented 10 months ago

I found a workaround: the second and subsequent calls involving .to(dml) work. So I duplicated the first call, wrapped it in try/except, and it worked.

ParfaitRF commented 3 months ago

I found a workaround: the second and subsequent calls involving .to(dml) work. So I duplicated the first call, wrapped it in try/except, and it worked.

Could you please describe your workaround? The description given isn't clear, to me at least.

kalinkrustev commented 3 months ago

I do not have the code anymore, but from what I remember, it failed only for the first torch.tensor([1]).to(dml) call. Subsequent calls work fine. So you can make one initial call that does the same thing inside a try/except block, something like this:

try:
  torch.tensor([1]).to(dml)
except:
  pass
torch.tensor([1]).to(dml) # this and subsequent .to(dml) calls will work
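The same warm-up pattern can be sketched generically, without DirectML hardware: run a flaky initializer once, ignore its first failure, then call it again for real. The names `first_call_workaround` and `flaky_init` are hypothetical, and the flaky callable below only simulates the first-call failure described above.

```python
# Generic sketch of the workaround: tolerate one failure of an
# initialization callable, then retry and let any real error propagate.
def first_call_workaround(init):
    try:
        return init()          # first attempt may fail (e.g. Unknown error -2147024809)
    except RuntimeError:
        pass                   # swallow the first failure only
    return init()              # second attempt

# Simulated flaky initializer: fails exactly once, then succeeds.
attempts = []
def flaky_init():
    attempts.append(1)
    if len(attempts) == 1:
        raise RuntimeError("Unknown error -2147024809")
    return "ok"

print(first_call_workaround(flaky_init))  # ok
```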
ParfaitRF commented 3 months ago

I do not have the code anymore, but from what I remember, it failed only for the first torch.tensor([1]).to(dml) call. Subsequent calls work fine. So you can make one initial call that does the same thing inside a try/except block, something like this:

try:
  torch.tensor([1]).to(dml)
except:
  pass
torch.tensor([1]).to(dml) # this and subsequent .to(dml) calls will work

I see. Unfortunately that solution is not applicable in my case, but thanks for the swift response.