microsoft / antares

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

- #342

Closed ghost closed 2 years ago

ghostplant commented 2 years ago

Have you installed Antares with pip3 install --upgrade antares in WSL2? After that, the antares command will be available in PATH. Secondly, you don't need to compile it from source. Just run BACKEND=c-rocm_win64 antares for a trial.

BTW, in WSL2 you need to install the ROCm compiler according to 'https://sep5.readthedocs.io/en/latest/Installation_Guide/Installation-Guide.html#performing-an-opencl-only-installation-of-rocm', and make sure that /opt/rocm/bin/hipcc has been installed successfully in WSL.
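A minimal sketch of the check described above, assuming only that the compiler lives at /opt/rocm/bin/hipcc as stated in the comment. The helper name hipcc_available is hypothetical and is not part of Antares or ROCm.

```python
import os


def hipcc_available(path="/opt/rocm/bin/hipcc"):
    """Return True if `path` is an existing file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)
```

Calling hipcc_available() inside WSL2 returns False when the ROCm install did not place hipcc at the expected location, which is worth ruling out before running antares.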

ghostplant commented 2 years ago

Is this command working?

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco
ghostplant commented 2 years ago

These two packages, dkms and rocm-dkms, are not needed for WSL. Please install the remaining packages and finally make sure that the command /opt/rocm/bin/hipcc works.

ghostplant commented 2 years ago

That's great. Next you need to install an upgraded antares version.

pip3 install --upgrade antares==0.3.13.1 # If this fails, the PyPI repo is not up to date yet; please re-run this command until it succeeds.

# Then, please show the output of this command:
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares
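The "re-run until it succeeds" advice can be sketched as a bounded retry loop. This is an illustration only: the helper names retry and upgrade_antares, the attempt limit, and the delay are assumptions, and the only real command involved is the pip invocation quoted above.

```python
import subprocess
import sys
import time


def retry(action, attempts=5, delay_seconds=0.0):
    """Call action() until it returns True; return the 1-based attempt that succeeded, or 0."""
    for attempt in range(1, attempts + 1):
        if action():
            return attempt
        time.sleep(delay_seconds)  # give the PyPI mirror time to catch up
    return 0


def upgrade_antares(version="0.3.13.1"):
    """Run the pip upgrade from the comment above; True when pip exits with status 0."""
    cmd = [sys.executable, "-m", "pip", "install", "--upgrade", f"antares=={version}"]
    return subprocess.run(cmd).returncode == 0
```

For example, retry(upgrade_antares, attempts=5, delay_seconds=30) gives up after five failed installs instead of looping forever.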
ghostplant commented 2 years ago

gfx1031 is possibly not the correct spec name for your GPU.

Or your AMD driver for Windows is not up to date or, in the worst case, does not support this spec.

ghostplant commented 2 years ago

Please guess and try other numbers like 1030, 1010, etc.

ghostplant commented 2 years ago

What is the gfx number for the vega one?

ghostplant commented 2 years ago

What is the full model name of Vega GPU?

ghostplant commented 2 years ago

Maybe you can temporarily disable the Vega GPU in Windows Device Manager for this test.

ghostplant commented 2 years ago

No, rocminfo is not a suitable test command. I don't think the "new error" is related to our topic. After you disable the Vega GPU in Windows Device Manager, please try:

antares clean
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares
ghostplant commented 2 years ago

OK, since you don't know which GPU is the enabled one, please first check which of the following typical spec settings works:

antares clean
AMDGFX=gfx803 BACKEND=c-rocm_win64 antares
AMDGFX=gfx900 BACKEND=c-rocm_win64 antares
AMDGFX=gfx902 BACKEND=c-rocm_win64 antares
AMDGFX=gfx906 BACKEND=c-rocm_win64 antares
AMDGFX=gfx908 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1010 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1030 BACKEND=c-rocm_win64 antares
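The trial-and-error sweep above can also be scripted. The sketch below is hedged: it assumes that antares exits with a non-zero status when a spec does not work, which this thread does not confirm, and the names CANDIDATE_SPECS, antares_env, and sweep are hypothetical.

```python
import os
import subprocess

# Spec names copied from the list of commands above.
CANDIDATE_SPECS = ["gfx803", "gfx900", "gfx902", "gfx906", "gfx908", "gfx1010", "gfx1030"]


def antares_env(spec, base_env=None):
    """Return a copy of the environment with AMDGFX and BACKEND set for one trial."""
    env = dict(os.environ if base_env is None else base_env)
    env["AMDGFX"] = spec
    env["BACKEND"] = "c-rocm_win64"
    return env


def sweep(specs=CANDIDATE_SPECS):
    """Run `antares clean` once, then `antares` per spec; return specs that exit with 0."""
    subprocess.run(["antares", "clean"])
    working = []
    for spec in specs:
        # Assumption: a zero exit status means the spec was accepted.
        if subprocess.run(["antares"], env=antares_env(spec)).returncode == 0:
            working.append(spec)
    return working
```

Calling sweep() prints each run's normal antares output and returns the list of specs that completed, so the logs can still be inspected by hand if the exit-status assumption turns out to be wrong.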
ghostplant commented 2 years ago

Please open this file via vim:

vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp

For the content, please fully replace the original init function with this updated function:

  void init(int dev) {
    // Load the Windows HIP runtime DLL; the HIP entry points below are resolved from it.
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    // Debug output: report how many GPUs the Windows HIP driver can see.
    int gpu_count = -1;
    LOAD_ONCE(hipGetDeviceCount, int (*)(int*));
    CHECK(0 == hipGetDeviceCount(&gpu_count), "Failed to run hipGetDeviceCount().");
    fprintf(stderr, "@@ hipGetDeviceCount = %d\n", gpu_count);

    // Select the requested device.
    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(dev), "Failed to initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }

After saving, please re-run AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares and show the logging output, which will tell us whether the Windows ROCm driver detects at least 1 GPU.

ghostplant commented 2 years ago

If you see @@ hipGetDeviceCount = 0, it means the current Windows ROCm driver doesn't recognize either of the two GPUs you have. If it is @@ hipGetDeviceCount = 1, it means one GPU is supported, but the gfx number is incorrect.
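As a small aid for reading that log, here is a sketch that extracts the count from the line printed by the patched init function and maps it to the cases discussed in this thread. The function names are hypothetical, and the diagnosis strings only paraphrase the comments here.

```python
import re


def device_count_from_log(log_text):
    """Return N from the last '@@ hipGetDeviceCount = N' line, or None if no such line exists."""
    matches = re.findall(r"@@ hipGetDeviceCount = (-?\d+)", log_text)
    return int(matches[-1]) if matches else None


def diagnose(count):
    """Map a device count to the interpretation given in the thread."""
    if count is None:
        return "no hipGetDeviceCount line found; was the patched init function saved?"
    if count <= 0:
        return "the Windows ROCm driver does not recognize any GPU"
    if count == 1:
        return "one GPU is supported; the gfx number is probably incorrect"
    return "several GPUs are supported; check both the device index and the gfx number"
```

Since the patched function writes to stderr, the text passed to device_count_from_log needs to include the captured stderr of the antares run.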

ghostplant commented 2 years ago

2 means both GPUs are supported (Vega 8 and RX6700). Thus, you need to select the correct GPU ID and the correct GFX number:

Please re-open this file via vim:

vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp

Similarly, for the content, please fully replace the original init function with this updated function:

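  void init(int dev) {
    // Load the Windows HIP runtime DLL; the HIP entry points below are resolved from it.
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    // For this trial the device index is hard-coded to 1 (the 2nd GPU) instead of `dev`.
    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(1), "Failed to initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }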

After saving, please re-run AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares.

The main difference is that this will use the 2nd GPU for a trial.

ghostplant commented 2 years ago

How did you get "a hip error no binary"? In the worst case, the new change will throw that error again. https://user-images.githubusercontent.com/95400651/164879095-88be9121-a4c3-4afb-8f94-491f5447adae.png

Can you update antares again with "pip3 install antares --upgrade" (possibly multiple times if it fails)?

ghostplant commented 2 years ago

Yes, it successfully utilizes the RX6700 for computation.

ghostplant commented 2 years ago

This allows you to execute source code based on __global__ functions on Windows. It means you can use driver-level programming for Win64 execution to run ROCm kernels efficiently with just the native RX6700 driver for Windows.

Antares treats ROCm for Windows as a special hardware backend, and can generate any efficient IR-based kernel to build up that __global__ function, e.g. MatMul/Transpose/Conv/..

ghostplant commented 2 years ago

You need to compile the HIP kernels in WSL, since hipcc is only available in WSL. After that, hipcc will produce HSACO binary code for the AMD GPU; this file can be loaded directly by a Win64 program, with no need for WSL.

Briefly, you need WSL to compile all HIP kernels into HSACO files, and then you can detach from WSL and write a clean Win64 host program that interacts with the AMD GPU using these HSACO files.
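The compile step in WSL could be sketched as follows. The flags are copied from the hipcc command quoted earlier in this thread; the helper names, the default gfx1031 target, and the output naming scheme (source stem plus .hsaco) are assumptions made for illustration.

```python
from pathlib import Path


def hipcc_command(source, output, gfx="gfx1031", hipcc="/opt/rocm/bin/hipcc"):
    """Build the argument list that compiles one HIP source file into an HSACO binary."""
    return [
        hipcc, str(source), "--genco", "-O2",
        f"--amdgpu-target={gfx}", "-Wno-ignored-attributes",
        "-o", str(output),
    ]


def commands_for(sources, out_dir, gfx="gfx1031"):
    """Return one hipcc command per source, writing <stem>.hsaco into out_dir."""
    out_dir = Path(out_dir)
    return [hipcc_command(src, out_dir / (Path(src).stem + ".hsaco"), gfx) for src in sources]
```

Each returned list can be passed to subprocess.run inside WSL; the resulting .hsaco files are what the Win64 host program would load afterwards.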

Looong01 commented 1 year ago

Is this command working?

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

Hey, here are my problems: [image]

And when I input this: [image], it returns this: [image]

And these packages are installed:

rocm-clang-ocl/focal,now 0.5.0.50401-84~20.04 amd64 [installed,automatic]
rocm-cmake/focal,now 0.8.0.50401-84~20.04 amd64 [installed,automatic]
rocm-core/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocm-dbgapi/focal,now 0.68.0.50401-84~20.04 amd64 [installed,automatic]
rocm-debug-agent/focal,now 2.0.3.50401-84~20.04 amd64 [installed,automatic]
rocm-dev/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-device-libs/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-dkms/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-gdb/focal,now 12.1.50401-84~20.04 amd64 [installed,automatic]
rocm-llvm/focal,now 15.0.0.22465.50401-84~20.04 amd64 [installed,automatic]
rocm-ocl-icd/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-opencl-dev/focal,now 2.0.0.50401-84~20.04 amd64 [installed]
rocm-opencl/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-smi-lib/focal,now 5.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-utils/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocminfo/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]

I succeeded in using this: [image]

Looong01 commented 1 year ago

Is this command working?

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

OK, I used this: [image]

And then this: [image]. It returns this: [image]