microsoft / antares

Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.

- #345

Closed ghost closed 5 days ago

ghostplant commented 2 years ago

Thanks for your request. We are currently busy updating some other functionality. Later we'll add documentation that makes it easier to use ROCm programs on Windows.

ffleader1 commented 2 years ago

Seconding this. I think OP has a 6800M and can run ROCm on Windows. That is awesome.

ghostplant commented 2 years ago

I understand, thank you for your progress.

You can try the following 2 steps in advance:

Step-1: Generate Kernel Source

BACKEND=c-rocm_win64 COMPUTE_V1='- einstein_v2("output0[N] = input0[N] + input1[N]", input_dict={"input0": {"dtype": "float32", "shape": [1024 * 512]}, "input1": {"dtype": "float32", "shape": [1024 * 512]}})' antares save kernel0.hip.cc

Step-2: Build Source into HSACO

hipcc kernel0.hip.cc --amdgpu-target=gfx1031 --genco -O2 -o kernel0.hip.hsaco

Step-3: Write win64 programs without WSL to utilize kernel0.hip.hsaco (a little complex, will be updated later)
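
Before wiring the generated kernel into a Windows program, it can help to have a host-side reference for the expression `output0[N] = input0[N] + input1[N]` to compare results against. The following is a minimal sketch in Python; the helper names `reference_add` and `max_abs_error` and the test data are hypothetical and not part of Antares.

```python
from array import array


def reference_add(input0, input1):
    """Host-side reference for output0[N] = input0[N] + input1[N]."""
    if len(input0) != len(input1):
        raise ValueError("input0 and input1 must have the same length")
    # array("f") stores C floats, matching the float32 dtype in input_dict.
    return array("f", (a + b for a, b in zip(input0, input1)))


def max_abs_error(actual, expected):
    """Largest element-wise absolute difference between two sequences."""
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)


if __name__ == "__main__":
    n = 1024 * 512  # same shape as in the COMPUTE_V1 expression
    # Hypothetical input data; any float32 values would do.
    input0 = array("f", (float(i % 71) for i in range(n)))
    input1 = array("f", (float(i % 13) for i in range(n)))
    expected = reference_add(input0, input1)
    print(len(expected), expected[100])
```

Once the kernel output has been copied back to the host, `max_abs_error(gpu_output, expected)` gives a single number to check against a tolerance.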

ghostplant commented 2 years ago

Thanks for waiting. The feature is now supported. Please try the commands below:

# Upgrade Antares version
$ pip3 install antares==0.3.15.1 --upgrade

# Your GFX1031 video card is not the first AMD GPU in the device list, so you need this to select the correct GPU index.
$ export DEVICE_ID=1

# Save the code that you want to compile for Windows ROCm
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares save ./amd_example.cpp

# Compile it into a clean folder: dest-outputs
$ AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares compile ./amd_example.cpp dest-outputs/

# You'll get a simple project that can be built with MINGW64 or VC++ and no longer depends on WSL.
$ cd dest-outputs/ && make

After that, the two files `kernels.bin` and `main.exe` (kept together in the same folder) can be executed on Windows without GCC or WSL dependencies.
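
For repeated builds, the save, compile, and make commands above could be scripted. The sketch below is one possible wrapper in Python, assuming `antares` and `make` are on the PATH and that the upgrade step has already been done; `plan_steps` and `run_steps` are hypothetical helper names, and the environment variables are the ones used in the commands above.

```python
import os
import subprocess


def plan_steps(source="./amd_example.cpp", outdir="dest-outputs/",
               amdgfx="gfx1031", device_id="1"):
    """Describe the save / compile / make sequence without running it."""
    env = {"AMDGFX": amdgfx, "BACKEND": "c-rocm_win64", "DEVICE_ID": device_id}
    return [
        {"argv": ["antares", "save", source], "env": env, "cwd": None},
        {"argv": ["antares", "compile", source, outdir], "env": env, "cwd": None},
        {"argv": ["make"], "env": {}, "cwd": outdir},
    ]


def run_steps(steps):
    """Run each step in order and stop at the first failing command."""
    for step in steps:
        # Extend the current environment instead of replacing it.
        merged_env = {**os.environ, **step["env"]}
        subprocess.run(step["argv"], env=merged_env, cwd=step["cwd"], check=True)


if __name__ == "__main__":
    # Print the plan only; call run_steps(plan_steps()) to execute it.
    for step in plan_steps():
        print(" ".join(step["argv"]))
```

Keeping the plan separate from its execution makes it easy to inspect the commands or change `amdgfx` and `device_id` for a different card before anything is run.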

ghostplant commented 2 years ago

I'm trying to compile a VapourSynth filter that was written in CUDA. It has the CUDA or HIP kernel kernel.hip.cpp and a VapourSynth wrapper source.cpp, which needs some libraries that are in a "vapoursynth" folder. The end result will be a DLL. If you wish, I could upload the source code.

Please try this version: `pip3 install antares==0.3.15.2`. I think the problem is solved.

ghostplant commented 2 years ago

You have to use the ROCm driver-level API to dispatch kernel workloads on Windows ROCm. Your source code is clearly not written in that style. `antares compile <kernel.cpp> <outdir>` generates a minimal driver-level example that shows how to do this correctly. You have to follow that pattern.

kernel.bin is the code object that you should precompile from the bm3d function in WSL via hipcc --genco .., and for HIP APIs like hipSetDevice/hipMalloc/.. in source.cpp, you should change them according to https://github.com/microsoft/antares/blob/v0.3.x/backends/c-rocm_win64/include/backend.hpp#L24-L25.
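
To illustrate what "driver-level" dispatch means, here is a rough sketch of the call order using the HIP module API (`hipInit`, `hipModuleLoad`, `hipModuleGetFunction`, `hipModuleLaunchKernel`, `hipDeviceSynchronize`). It is written in Python with `ctypes` only to keep it short; the same order applies in C++. It is not the code Antares generates. The `hip` argument is assumed to be a handle to the HIP runtime library (for example one opened with `ctypes.CDLL`), and the kernel name, grid, and block sizes are placeholders that must be taken from the compiled kernel.

```python
import ctypes


def check(status, call):
    """Raise if a HIP call returned a non-zero status (hipSuccess is 0)."""
    if status != 0:
        raise RuntimeError(f"{call} failed with HIP error code {status}")


def launch_from_code_object(hip, path, kernel_name, grid, block, args):
    """Load a precompiled code object and launch one kernel from it.

    hip:   assumed handle to the HIP runtime library (hypothetical setup)
    args:  ctypes objects, one per kernel argument, e.g. c_void_p
           values holding device pointers allocated elsewhere
    """
    check(hip.hipInit(0), "hipInit")

    module = ctypes.c_void_p()
    check(hip.hipModuleLoad(ctypes.byref(module), path.encode()),
          "hipModuleLoad")

    func = ctypes.c_void_p()
    check(hip.hipModuleGetFunction(ctypes.byref(func), module,
                                   kernel_name.encode()),
          "hipModuleGetFunction")

    # kernelParams is an array of pointers, one pointing at each argument.
    params = (ctypes.c_void_p * len(args))(
        *[ctypes.addressof(a) for a in args])

    # Arguments: function, grid x/y/z, block x/y/z, shared memory bytes,
    # stream (NULL = default stream), kernelParams, extra.
    check(hip.hipModuleLaunchKernel(func, *grid, *block, 0, None,
                                    params, None),
          "hipModuleLaunchKernel")
    check(hip.hipDeviceSynchronize(), "hipDeviceSynchronize")
```

The difference from runtime-style code is that nothing is launched with `<<<...>>>`: the kernel is looked up by name in a code object built beforehand with `hipcc --genco`, and its arguments are passed as an array of pointers.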