Open q10 opened 2 months ago
@q10 , I believe we also need to integrate or install the Composable Kernel (CK) for the GenAI ops. Were you able to build with CK? If so, could you please share the steps you followed? I’m running into some issues and would greatly appreciate any guidance you can provide.
cc @jeffdaily, I’ve seen similar PR 2610 from you and thought you might have some insights as well.
Ah yes, thanks for the pointer on CK. This work has stalled a bit due to other priorities, but ROCm support for GenAI ops is a work in progress.
I had to install a new-enough CK to get your branch to build; unfortunately I didn't note down the commit hash that introduced the CK header file you need. I also had to apply this patch:
diff --git a/fbgemm_gpu/experimental/example/src/nccl_example.cpp b/fbgemm_gpu/experimental/example/src/nccl_example.cpp
index 12bd7201..921f0590 100644
--- a/fbgemm_gpu/experimental/example/src/nccl_example.cpp
+++ b/fbgemm_gpu/experimental/example/src/nccl_example.cpp
@@ -6,7 +6,11 @@
* LICENSE file in the root directory of this source tree.
*/
+#ifdef USE_ROCM
+#include <rccl/rccl.h>
+#else
#include <nccl.h>
+#endif
namespace fbgemm_gpu::experimental {
diff --git a/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip b/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip
index 17d46048..72445dda 100644
--- a/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip
+++ b/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_blockwise_gemm.hip
@@ -12,7 +12,9 @@
#include <numeric>
#include <ATen/ATen.h>
-#include <c10/cuda/CUDAStream.h>
+// normally hipify does this substitution for us, but this file isn't hipified
+//#include <c10/cuda/CUDAStream.h>
+#include <ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h>
#include <torch/torch.h>
#if defined(USE_ROCM)
diff --git a/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_gemm.hip b/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_gemm.hip
index 3a117321..63072bcb 100644
--- a/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_gemm.hip
+++ b/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_rowwise_gemm.hip
@@ -15,7 +15,9 @@
#include <unordered_map>
#include <ATen/ATen.h>
-#include <c10/cuda/CUDAStream.h>
+// normally hipify does this substitution for us, but this file isn't hipified
+//#include <c10/cuda/CUDAStream.h>
+#include <ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h>
#include <torch/torch.h>
#if defined(USE_ROCM)
diff --git a/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip b/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip
index 6170675a..09a7947b 100644
--- a/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip
+++ b/fbgemm_gpu/experimental/gen_ai/src/quantize/ck_extensions/fp8_tensorwise_gemm.hip
@@ -12,7 +12,9 @@
#include <numeric>
#include <ATen/ATen.h>
-#include <c10/cuda/CUDAStream.h>
+// normally hipify does this substitution for us, but this file isn't hipified
+//#include <c10/cuda/CUDAStream.h>
+#include <ATen/hip/impl/HIPStreamMasqueradingAsCUDA.h>
#include <torch/torch.h>
#if defined(USE_ROCM)
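For anyone else hitting this, a rough sketch of getting a recent-enough CK installed is below. Everything here is an assumption, not a verified recipe: the `/opt/rocm` install prefix, the `include/ck` header location, and the commented-out build commands are placeholders, and you should pin whichever CK commit actually provides the headers your FBGEMM branch needs.

```shell
#!/usr/bin/env bash
# Sketch: check for Composable Kernel headers and, if missing, build from source.
# Assumptions: ROCm lives in /opt/rocm and CK installs headers under
# $CK_PREFIX/include/ck -- adjust both for your setup.
set -euo pipefail

CK_PREFIX="${CK_PREFIX:-/opt/rocm}"

if [ -d "${CK_PREFIX}/include/ck" ]; then
  echo "CK headers found under ${CK_PREFIX}/include/ck"
else
  echo "CK headers not found under ${CK_PREFIX}/include/ck; build from source, e.g.:"
  # git clone https://github.com/ROCm/composable_kernel.git
  # cmake -S composable_kernel -B ck_build \
  #   -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
  #   -DCMAKE_INSTALL_PREFIX="${CK_PREFIX}"
  # cmake --build ck_build --target install -j"$(nproc)"
fi
```

Once the headers are in place, point the FBGEMM build at the same prefix (e.g. via `CMAKE_PREFIX_PATH`) before applying the patch above.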