microsoft / antares

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

Usage with Rocm windows for hip code compilation and documentation #342

Closed LuisB79 closed 2 years ago

LuisB79 commented 2 years ago

First, where is the documentation? After installation in wsl2 it told me the command antares didn't exist. Second, I have my kernel.hip.cpp and source .cpp files; how can I compile them? Do I need to install ROCm to compile for gfx1031, or can antares compile that?

ghostplant commented 2 years ago

Have you installed with pip3 install --upgrade antares in wsl2? After that, antares cmd will be available in PATH. Secondly, you don't need to compile it from source. Just run BACKEND=c-rocm_win64 antares for a trial.

BTW, in wsl2, you need to install rocm compiler according to 'https://sep5.readthedocs.io/en/latest/Installation_Guide/Installation-Guide.html#performing-an-opencl-only-installation-of-rocm', making sure command /opt/rocm/bin/hipcc is installed in wsl successfully.
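A quick way to confirm both prerequisites from the reply above is a small shell check. This is only a sketch: the helper name have_tool is made up for illustration, and ~/.local/bin being the place where pip puts the antares script is an assumption about a per-user install.

```shell
#!/bin/sh
# have_tool: hypothetical helper; succeeds if the given command name or
# executable path can be found from this shell.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Assumption: a per-user pip install places the script in ~/.local/bin.
PATH="$HOME/.local/bin:$PATH"
export PATH

for tool in antares /opt/rocm/bin/hipcc; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

If antares is reported missing right after installation, opening a new shell (or restarting the WSL distribution) is worth trying before reinstalling.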

LuisB79 commented 2 years ago

It seems restarting Ubuntu in wsl2 helped. BACKEND=c-rocm_win64 antares gave me [Antares] Incorrect compute kernel from evaluator. I haven't installed the ROCm compiler yet; I will proceed to do so.

ghostplant commented 2 years ago

Is this command working?

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

LuisB79 commented 2 years ago

I haven't installed ROCm yet. Do I need to install a previous version, or do I install the newest 5.1.1, which has support for gfx1030?

LuisB79 commented 2 years ago

image

ghostplant commented 2 years ago

The two packages dkms and rocm-dkms are not needed for WSL. Please install the remaining packages, then make sure the command /opt/rocm/bin/hipcc works.

LuisB79 commented 2 years ago

OK, I installed rocm-opencl-dev and rocm-dev. After that I got this: image

LuisB79 commented 2 years ago

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

image

am i supposed to get nothing?
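For what it's worth, compilers such as hipcc normally print nothing when a build succeeds, so an empty terminal is expected. A minimal sketch for double-checking, assuming the output path /tmp/out.hsaco from the command above; the helper name artifact_ok is hypothetical.

```shell
#!/bin/sh
# artifact_ok: hypothetical helper; succeeds only if the file exists and is
# non-empty, which is a reasonable sign that the compiler wrote its output.
artifact_ok() {
  [ -s "$1" ]
}

if artifact_ok /tmp/out.hsaco; then
  echo "out.hsaco was written"
else
  echo "out.hsaco is missing or empty"
fi
```

Running echo $? immediately after the hipcc command is another check: it prints 0 when the compiler reported success.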

ghostplant commented 2 years ago

That's great. Next, you need to install an upgraded antares version.

pip3 install --upgrade antares==0.3.13.1 # If it fails, it means the PyPI repo is not up to date; please re-run this command until it succeeds.

# Then show the output of this command:
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares
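The "re-run until it succeeds" advice can be wrapped in a small retry loop. This is a sketch, not part of Antares: the function name retry, the attempt count, and the delay are all arbitrary choices for illustration.

```shell
#!/bin/sh
# retry: hypothetical helper; runs a command up to N times, pausing DELAY
# seconds between failed attempts, and stops at the first success.
# Usage: retry N DELAY command [args...]
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 0
}

# Example (commented out so the sketch has no side effects):
# retry 5 30 pip3 install --upgrade antares==0.3.13.1
```

The function returns non-zero if every attempt fails, so it can be chained with && before the antares command.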
LuisB79 commented 2 years ago

I have successfully updated; however, I got a HIP error, "no binary for GPU" :c image

ghostplant commented 2 years ago

gfx1031 is possibly not the corresponding spec name for your GPU.

Or your AMD driver for Windows is not up to date,

or, in the worst case, it does not support this spec.

LuisB79 commented 2 years ago

how can i know?

LuisB79 commented 2 years ago

My GPU is an RX 6800M, which is pretty much an RX 6700 XT with a TDP limit of 145 W. Ubuntu said it was gfx1031 when I installed it.

ghostplant commented 2 years ago

Please guess and try other numbers like 1030, 1010, etc.

LuisB79 commented 2 years ago

I have 2 AMD GPUs though: the RX 6800M and an integrated Vega one. Do I need to specify that?

ghostplant commented 2 years ago

What is the gfx number for the vega one?

LuisB79 commented 2 years ago

is there a command to check that?

ghostplant commented 2 years ago

What is the full model name of Vega GPU?

LuisB79 commented 2 years ago

It literally just says AMD Radeon(TM) Graphics; according to the wiki it's a Vega 8 GPU.

LuisB79 commented 2 years ago

the rx6800m says it's navi22 XTM
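One way to answer the "how can I know?" question is to look up the chip codename in the processor table of the LLVM AMDGPU documentation. The sketch below hard-codes a few entries from that table for the targets mentioned in this thread; the function name gfx_for_codename is made up, and the list is far from complete.

```shell
#!/bin/sh
# gfx_for_codename: hypothetical helper; maps a chip codename such as
# "navi22 XTM" to its gfx target. Only a handful of codenames are listed.
gfx_for_codename() {
  # Normalize: lower-case and drop spaces, so "Navi 22" becomes "navi22".
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -d ' ')
  case $name in
    fiji*|polaris10*|polaris11*) echo gfx803 ;;
    vega10*)   echo gfx900 ;;
    raven*)    echo gfx902 ;;
    vega20*)   echo gfx906 ;;
    arcturus*) echo gfx908 ;;
    navi10*)   echo gfx1010 ;;
    navi21*)   echo gfx1030 ;;
    navi22*)   echo gfx1031 ;;
    *)         return 1 ;;   # unknown codename
  esac
}

gfx_for_codename "navi22 XTM"
```

Note that "Vega 8" is a marketing name rather than a codename, so it is deliberately not in the table; the target of an integrated GPU depends on the APU generation.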

ghostplant commented 2 years ago

Maybe you can temporarily disable the Vega GPU in windows device manager for this test.

LuisB79 commented 2 years ago

I do not have a MUX switch. I will try to set Ubuntu or WSL to use the RX 6800M.

LuisB79 commented 2 years ago

image image New error?

LuisB79 commented 2 years ago

image ?

ghostplant commented 2 years ago

No, rocminfo is not a suitable test command. I don't think the "new error" is related to our topic. After you disable the Vega GPU in "Windows Device Manager", please try:

antares clean
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares

LuisB79 commented 2 years ago

I fear that if I disable it I won't have any display output, since the Vega 8 GPU is the one connected to the laptop display, not the RX 6800M.

ghostplant commented 2 years ago

OK, since you don't know which GPU is the enabled one, first please check which of the following typical spec settings works:

antares clean
AMDGFX=gfx803 BACKEND=c-rocm_win64 antares
AMDGFX=gfx900 BACKEND=c-rocm_win64 antares
AMDGFX=gfx902 BACKEND=c-rocm_win64 antares
AMDGFX=gfx906 BACKEND=c-rocm_win64 antares
AMDGFX=gfx908 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1010 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1030 BACKEND=c-rocm_win64 antares
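The manual trial above can be scripted as a loop. This sketch assumes that antares exits with a non-zero status when the kernel fails to run, which the thread does not confirm; the function name first_working_target is hypothetical, and the command to run is passed in as an argument so it can be swapped out.

```shell
#!/bin/sh
# first_working_target: hypothetical helper; tries each gfx target in turn
# and prints the first one for which the command exits successfully.
# Usage: first_working_target COMMAND GFX...
first_working_target() {
  cmd=$1
  shift
  for gfx in "$@"; do
    # Same environment variables as in the commands above.
    if AMDGFX="$gfx" BACKEND=c-rocm_win64 "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$gfx"
      return 0
    fi
  done
  return 1   # none of the targets worked
}

# Example (commented out so the sketch has no side effects):
# antares clean
# first_working_target antares gfx803 gfx900 gfx902 gfx906 gfx908 gfx1010 gfx1030
```

The command output is discarded here to keep the result readable; drop the redirection to see the full log of each attempt.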
LuisB79 commented 2 years ago

Nothing, not a single one worked

LuisB79 commented 2 years ago

Does it matter that I'm in wsl2?

ghostplant commented 2 years ago

Please open this file via vim:

vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp

For the content, please fully replace the original init function with this updated function:

  void init(int dev) {
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    int gpu_count = -1;
    LOAD_ONCE(hipGetDeviceCount, int (*)(int*));
    CHECK(0 == hipGetDeviceCount(&gpu_count), "Failed to run hipGetDeviceCount().");
    fprintf(stderr, "@@ hipGetDeviceCount = %d\n", gpu_count);

    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(dev), "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }

After saving, please re-run AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares and show the log output, which will indicate whether the Windows ROCm driver detects at least 1 GPU.

ghostplant commented 2 years ago

If you see @@ hipGetDeviceCount = 0, it means the current Windows ROCm driver doesn't recognize either of your two GPUs. If it is @@ hipGetDeviceCount = 1, it means a GPU is supported, but the gfx number is incorrect.
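To pull that line out of a long log, the count can be extracted with sed. A sketch, assuming the log contains the exact "@@ hipGetDeviceCount = N" line printed by the patched init function; the function name device_count is hypothetical.

```shell
#!/bin/sh
# device_count: hypothetical helper; reads log text on stdin and prints the
# number from the first "@@ hipGetDeviceCount = N" line, if there is one.
device_count() {
  sed -n 's/^@@ hipGetDeviceCount = \([0-9][0-9]*\).*$/\1/p' | head -n 1
}

# Example (commented out; the debug line is written to stderr, hence 2>&1):
# AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares 2>&1 | device_count
```

An empty result means the debug line never appeared, which usually points to the init function not having been replaced.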

LuisB79 commented 2 years ago

image What does 2 mean?

ghostplant commented 2 years ago

2 means both GPUs are supported (Vega 8 and RX6700). Thus, you need to select the correct GPU ID and the correct GFX number:

Please re-open this file via vim:

vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp

Similarly, for the content, please fully replace the original init function with this updated function:

  void init(int dev) {
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(1), "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }

After saving, please re-run AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares.

The main difference is that this will use the 2nd GPU for a trial.

LuisB79 commented 2 years ago

image I don't know if this helps, but this is what the AMD driver says about the RX 6800M.

LuisB79 commented 2 years ago

image Same error.

ghostplant commented 2 years ago

How did you get "a hip error no binary"? In the worst case, the new change will just throw that error again. https://user-images.githubusercontent.com/95400651/164879095-88be9121-a4c3-4afb-8f94-491f5447adae.png

Can you re-update antares with "pip3 install antares --upgrade" (possibly multiple times if it fails)?

LuisB79 commented 2 years ago

To get that error I did this:

That's great. Next you need to install an upgrade antares version.

pip3 install --upgrade antares==0.3.13.1 # If it fails, it means the PYPI repo is not up-to-date, please re-run this command until it succeed.

# Then, try show the output for this command:
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares

LuisB79 commented 2 years ago

I managed to recreate it. I ran "pip3 install antares --upgrade", then AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares, and I got this: image

(ghostplant said:) OK, this is a good state, but the init function has been reverted as well, so you need to re-edit it into:

  void init(int dev) {
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(1), "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }
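Because a pip upgrade overwrites backend.hpp, the one-line change can be re-applied with sed instead of re-editing by hand. This is a sketch under an assumption: that the stock file calls hipSetDevice(dev), as in the first init function shown earlier. The function name use_second_gpu is made up, and a .bak copy is kept in case the pattern does not match what you expect.

```shell
#!/bin/sh
# use_second_gpu: hypothetical helper; rewrites hipSetDevice(dev) to
# hipSetDevice(1) in the given file, keeping the original as FILE.bak.
use_second_gpu() {
  cp "$1" "$1.bak" &&
  sed 's/hipSetDevice(dev)/hipSetDevice(1)/' "$1.bak" > "$1"
}

# Example (commented out; path as given in the thread for Python 3.8):
# use_second_gpu ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp
```

Comparing the file with its .bak copy using diff afterwards shows whether the substitution actually changed anything.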
LuisB79 commented 2 years ago

Doing that worked? image

ghostplant commented 2 years ago

Yes, it successfully utilizes the RX6700 for computation.

LuisB79 commented 2 years ago

i have my kernel.hip.cpp and source .cpp files, how can i compile them? (i'm literally new to this)

LuisB79 commented 2 years ago

For example, with ROCm in Ubuntu (no WSL) this would be the command:

/opt/rocm/hip/bin/hipcc source.cpp kernel.hip.cpp -o libbm3dhip.so -shared -fPIC -std=c++17 -O3 -I/home/comp/vapoursynth/include -Wno-unused-result --offload-arch=gfx1031 $(/opt/rocm/hip/bin/hipconfig --cxx_config)

What do I change to do it with antares?

LuisB79 commented 2 years ago

image image

hip info: even though it is compiled, it fails

ghostplant commented 2 years ago

This allows you to execute __global__-function-based source code on Windows. It means you can use driver-level programming for Win64 execution to make ROCm kernels run efficiently with just the native RX6700 driver for Windows.

Antares treats ROCm for Windows as a special hardware backend, and can generate any efficient IR-based kernel to build up that __global__ function, e.g. MatMul/Transpose/Conv.

LuisB79 commented 2 years ago

Can native hip kernels be compiled in antares to run hip code in windows without the need of wsl?

ghostplant commented 2 years ago

You need to compile the HIP kernels in WSL, since hipcc is only available in WSL. hipcc will then produce HSACO binary code for the AMD GPU; this file can be loaded directly by a Win64 program, with no need for WSL.

Briefly, you need WSL to compile all HIP kernels into HSACO files, and then you can detach WSL and write a clean Win64 host program to interact with the AMD GPU using these HSACOs.
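The WSL half of that workflow could look like the following sketch, which reuses the hipcc flags from the test command earlier in the thread. The function name hsaco_cmd and the file names are hypothetical; it only prints the command so it can be inspected before running.

```shell
#!/bin/sh
# hsaco_cmd: hypothetical helper; prints the hipcc command that would compile
# one kernel source file into an HSACO file for the given gfx target.
# Usage: hsaco_cmd SOURCE GFX
hsaco_cmd() {
  src=$1
  gfx=$2
  out=${src%.*}.hsaco   # strips the last extension: kernel.hip.cpp -> kernel.hip.hsaco
  printf '%s\n' "/opt/rocm/bin/hipcc $src --genco -O2 --amdgpu-target=$gfx -Wno-ignored-attributes -o $out"
}

hsaco_cmd kernel.hip.cpp gfx1031
```

Piping the printed line to sh would run it inside WSL; the resulting .hsaco file is what the Win64 host program then loads.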

LuisB79 commented 2 years ago

Please add more documentation.

Looong01 commented 1 year ago

Is this command working?

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

Hey, here are my problems: image

And when I input this: image It returns this: image

And these things are installed (some automatically):

rocm-clang-ocl/focal,now 0.5.0.50401-84~20.04 amd64 [installed,automatic]
rocm-cmake/focal,now 0.8.0.50401-84~20.04 amd64 [installed,automatic]
rocm-core/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocm-dbgapi/focal,now 0.68.0.50401-84~20.04 amd64 [installed,automatic]
rocm-debug-agent/focal,now 2.0.3.50401-84~20.04 amd64 [installed,automatic]
rocm-dev/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-device-libs/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-dkms/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-gdb/focal,now 12.1.50401-84~20.04 amd64 [installed,automatic]
rocm-llvm/focal,now 15.0.0.22465.50401-84~20.04 amd64 [installed,automatic]
rocm-ocl-icd/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-opencl-dev/focal,now 2.0.0.50401-84~20.04 amd64 [installed]
rocm-opencl/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-smi-lib/focal,now 5.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-utils/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocminfo/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]

I succeeded in using this: image

Looong01 commented 1 year ago

Is this command working?

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

Ok, I use this: image

And then this: image It returns this: image