Closed ghost closed 2 years ago
Is this command working?
/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco
Have you installed with
pip3 install --upgrade antares
in WSL2? After that, the antares command will be available in PATH. Secondly, you don't need to compile it from source. Just run BACKEND=c-rocm_win64 antares
for a trial. BTW, in WSL2, you need to install the ROCm compiler according to 'https://sep5.readthedocs.io/en/latest/Installation_Guide/Installation-Guide.html#performing-an-opencl-only-installation-of-rocm', making sure the command /opt/rocm/bin/hipcc
is installed in WSL successfully.
These two packages, dkms and rocm-dkms, are not needed for WSL. Please install the remaining packages and finally ensure that the command /opt/rocm/bin/hipcc works.
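A quick way to confirm the compiler is in place inside WSL is sketched below. The function name check_hipcc is a hypothetical helper, not part of ROCm or Antares; it only checks that the file exists and is executable, which is an assumption about what "installed successfully" means here.

```shell
# Hypothetical helper: report whether hipcc exists and is executable.
# The default path is the one used throughout this thread.
check_hipcc() {
  local path="${1:-/opt/rocm/bin/hipcc}"
  if [ -x "${path}" ]; then
    echo "found: ${path}"
  else
    echo "missing or not executable: ${path}" >&2
    return 1
  fi
}

# Usage inside WSL:
#   check_hipcc && /opt/rocm/bin/hipcc --version
```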
That's great. Next, you need to install an upgraded antares version.
pip3 install --upgrade antares==0.3.13.1 # If it fails, the PyPI repo is not yet up to date; please re-run this command until it succeeds.
# Then, please show the output of this command:
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares
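Re-running the pip command by hand until it succeeds can be wrapped in a small retry loop. This is only a sketch: retry and RETRY_DELAY are hypothetical names introduced here, and the attempt count and delay are arbitrary choices.

```shell
# Hypothetical helper: run a command up to N times until it succeeds.
# RETRY_DELAY (seconds to wait after each failure) defaults to 5.
retry() {
  local attempts="$1"
  shift
  local i=1
  while [ "${i}" -le "${attempts}" ]; do
    if "$@"; then
      return 0
    fi
    echo "attempt ${i} of ${attempts} failed" >&2
    i=$((i + 1))
    sleep "${RETRY_DELAY:-5}"
  done
  return 1
}

# Usage inside WSL:
#   retry 5 pip3 install --upgrade antares==0.3.13.1
```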
gfx1031 is possibly not the correct spec name for your GPU.
Or your AMD driver for Windows is not up to date,
or, in the worst case, it does not support this spec.
Please guess and try other numbers like 1030, 1010, etc.
What is the gfx number for the vega one?
What is the full model name of Vega GPU?
Maybe you can temporarily disable the Vega GPU in windows device manager for this test.
No, rocminfo is not a suitable test command. I don't think the "new error" is related to our topic. After you disable the Vega GPU in "Windows Device Manager", please try:
antares clean
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares
OK, since you don't know which GPU is the enabled one,
please first check which of the following typical spec settings works:
antares clean
AMDGFX=gfx803 BACKEND=c-rocm_win64 antares
AMDGFX=gfx900 BACKEND=c-rocm_win64 antares
AMDGFX=gfx902 BACKEND=c-rocm_win64 antares
AMDGFX=gfx906 BACKEND=c-rocm_win64 antares
AMDGFX=gfx908 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1010 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1030 BACKEND=c-rocm_win64 antares
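The trial-and-error above can be scripted. The sketch below assumes that a wrong target makes the antares trial exit with a non-zero status; that is an assumption, so read the printed output as well if in doubt. The names try_gfx_targets and ANTARES_CMD are hypothetical helpers and not part of Antares.

```shell
# Hypothetical helper: try each candidate gfx target in turn and print
# the first one for which the antares trial exits successfully.
# ANTARES_CMD can be overridden for testing; it defaults to `antares`.
try_gfx_targets() {
  local cmd="${ANTARES_CMD:-antares}"
  local gfx
  for gfx in "$@"; do
    echo "== trying AMDGFX=${gfx} ==" >&2
    # Assumption: a target that does not work makes the trial exit non-zero.
    if AMDGFX="${gfx}" BACKEND=c-rocm_win64 "${cmd}" >&2; then
      echo "${gfx}"
      return 0
    fi
  done
  echo "no candidate target worked" >&2
  return 1
}

# Usage inside WSL:
#   antares clean
#   try_gfx_targets gfx803 gfx900 gfx902 gfx906 gfx908 gfx1010 gfx1030
```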
Please open this file via vim:
vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp
For the content, please fully replace the original init function with this updated function:
void init(int dev) {
  // Load the Windows HIP runtime DLL.
  ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
  CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

  // Print how many GPUs the Windows ROCm driver detects.
  int gpu_count = -1;
  LOAD_ONCE(hipGetDeviceCount, int (*)(int*));
  CHECK(0 == hipGetDeviceCount(&gpu_count), "Failed to run hipGetDeviceCount().");
  fprintf(stderr, "@@ hipGetDeviceCount = %d\n", gpu_count);

  // Select the requested device.
  LOAD_ONCE(hipSetDevice, int (*)(int));
  CHECK(0 == hipSetDevice(dev),
        "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
  _current_device = dev;
}
After saving, please re-run
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares
and show the logging output, which will indicate whether the Windows ROCm driver detects at least 1 GPU.
If you see @@ hipGetDeviceCount = 0, the current Windows ROCm driver doesn't recognize either of the two GPUs you have. If it is @@ hipGetDeviceCount = 1, a GPU is supported, but the gfx number is incorrect.
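The readings above can be checked mechanically by piping the log through a small filter. This is a sketch only: explain_device_count is a hypothetical helper, and the messages simply paraphrase the interpretation given in this thread.

```shell
# Hypothetical helper: read antares log text on stdin and explain the
# "@@ hipGetDeviceCount = N" line printed by the patched init().
explain_device_count() {
  local count
  # Extract N from the first matching line, if any.
  count="$(sed -n 's/.*@@ hipGetDeviceCount = \([0-9][0-9]*\).*/\1/p' | head -n 1)"
  case "${count}" in
    "") echo "no hipGetDeviceCount line found" ;;
    0)  echo "driver recognizes no GPU" ;;
    1)  echo "one GPU recognized; the gfx number is likely wrong" ;;
    *)  echo "${count} GPUs recognized; check both device ID and gfx number" ;;
  esac
}

# Usage inside WSL (the line is written to stderr, hence 2>&1):
#   AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares 2>&1 | explain_device_count
```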
2 means both GPUs (Vega 8 and RX6700) are supported. Thus, you need to select the correct GPU ID and the correct GFX number:
Please re-open this file via vim:
vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp
Similarly, for the content, please fully replace the original init function with this updated function:
void init(int dev) {
  // Load the Windows HIP runtime DLL.
  ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
  CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

  // Device index 1 (the 2nd GPU) is hard-coded here for this trial.
  LOAD_ONCE(hipSetDevice, int (*)(int));
  CHECK(0 == hipSetDevice(1),
        "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
  _current_device = dev;
}
After saving, please re-run:
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares
The main difference is that this will use the 2nd GPU for the trial.
How did you get "a hip error no binary"? In the worst case, the new change will throw that error again. https://user-images.githubusercontent.com/95400651/164879095-88be9121-a4c3-4afb-8f94-491f5447adae.png
Can you re-update antares with "pip3 install antares --upgrade" (possibly multiple times if it fails)?
Yes, it successfully utilizes the RX6700 for computation.
This allows you to execute __global__ function based source code on Windows. It means you can use driver-level programming for Win64 execution to make ROCm kernels run efficiently with just the native RX6700 driver for Windows.
Antares treats ROCm for Windows as a special-hardware backend, and can generate any IR-based efficient kernel to build up that __global__ function, e.g. MatMul/Transpose/Conv/...
You need to compile the HIP kernels in WSL, since hipcc is available in WSL only. hipcc then produces HSACO binary code for the AMDGPU, and this file can be loaded directly by a Win64 program with no need for WSL.
Briefly, you need WSL to compile all HIP kernels into HSACO files, and then you can detach WSL and write a clean Win64 host program to interact with the AMDGPU using these HSACOs.
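The compile step of that workflow can be sketched as a batch loop run inside WSL. It reuses the hipcc flags quoted elsewhere in this thread; compile_kernels, HIPCC, and the directory arguments are hypothetical names chosen for illustration, and gfx1031 is only the default because it is the target that worked here.

```shell
# Hypothetical helper: compile every *.cc kernel in SRC_DIR to an HSACO
# file in OUT_DIR and print the path of each file produced.
# HIPCC and AMDGFX can be overridden; the defaults mirror this thread.
compile_kernels() {
  local src_dir="$1"
  local out_dir="$2"
  local hipcc="${HIPCC:-/opt/rocm/bin/hipcc}"
  local gfx="${AMDGFX:-gfx1031}"
  local src name
  mkdir -p "${out_dir}" || return 1
  for src in "${src_dir}"/*.cc; do
    [ -e "${src}" ] || continue   # the glob matched no kernel files
    name="$(basename "${src}" .cc)"
    "${hipcc}" "${src}" --genco -O2 --amdgpu-target="${gfx}" \
      -Wno-ignored-attributes -o "${out_dir}/${name}.hsaco" || return 1
    echo "${out_dir}/${name}.hsaco"
  done
}

# Usage inside WSL (both directories are placeholders):
#   compile_kernels ./kernels ./hsaco
```

The resulting .hsaco files are what the Win64 host program loads; that side is not shown here.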
Hey, here are my problems:
And when I input this: It returns this:
And these things are installed, some automatically:
rocm-clang-ocl/focal,now 0.5.0.50401-84~20.04 amd64 [installed,automatic]
rocm-cmake/focal,now 0.8.0.50401-84~20.04 amd64 [installed,automatic]
rocm-core/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocm-dbgapi/focal,now 0.68.0.50401-84~20.04 amd64 [installed,automatic]
rocm-debug-agent/focal,now 2.0.3.50401-84~20.04 amd64 [installed,automatic]
rocm-dev/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-device-libs/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-dkms/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-gdb/focal,now 12.1.50401-84~20.04 amd64 [installed,automatic]
rocm-llvm/focal,now 15.0.0.22465.50401-84~20.04 amd64 [installed,automatic]
rocm-ocl-icd/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-opencl-dev/focal,now 2.0.0.50401-84~20.04 amd64 [installed]
rocm-opencl/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-smi-lib/focal,now 5.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-utils/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocminfo/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]
I succeeded in using this:
Ok, I use this:
And then this: It returns this: