akshit397a commented 1 day ago

System Info

systeminformation version: 5.23.5

Operating System:
  Platform: Windows
  Distro: Microsoft Windows 11 Home Single Language
  Release: 10.0.22631
  Codename:
  Kernel: 10.0.22631
  Arch: x64
  Hostname: DESKTOP-FFR0VG0
  Codepage: 437
  Build: 22631
  Hypervisor: true
  RemoteSession:

System:
  Manufacturer: Dell Inc.
  Model: Inspiron 15 3525
  Version: 1.19.0
  Virtual:

CPU:
  Manufacturer: AMD
  Brand: Ryzen 5 5500U with Radeon Graphics
  Family: 23
  Model: 104
  Stepping: 1
  Speed: 2.1 GHz
  Cores: 12
  PhysicalCores: 6
  PerformanceCores: 12
  EfficiencyCores:
  Processors: 1
  Socket: None

Who can help?

No response

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

#include <torch/extension.h>
#include <ATen/ATen.h>
#include "cuda_launch.h"
#include <vector>

std::vector<at::Tensor> index_max(at::Tensor index_vals, at::Tensor indices, int A_num_block, int B_num_block) {
    return index_max_kernel(index_vals, indices, A_num_block, B_num_block);
}

at::Tensor mm_to_sparse(at::Tensor dense_A, at::Tensor dense_B, at::Tensor indices) {
    return mm_to_sparse_kernel(dense_A, dense_B, indices);
}

at::Tensor sparse_dense_mm(at::Tensor sparse_A, at::Tensor indices, at::Tensor dense_B, int A_num_block) {
    return sparse_dense_mm_kernel(sparse_A, indices, dense_B, A_num_block);
}

at::Tensor reduce_sum(at::Tensor sparse_A, at::Tensor indices, int A_num_block, int B_num_block) {
    return reduce_sum_kernel(sparse_A, indices, A_num_block, B_num_block);
}

at::Tensor scatter(at::Tensor dense_A, at::Tensor indices, int B_num_block) {
    return scatter_kernel(dense_A, indices, B_num_block);
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
    m.def("index_max", &index_max, "index_max (CUDA)");
    m.def("mm_to_sparse", &mm_to_sparse, "mm_to_sparse (CUDA)");
    m.def("sparse_dense_mm", &sparse_dense_mm, "sparse_dense_mm (CUDA)");
    m.def("reduce_sum", &reduce_sum, "reduce_sum (CUDA)");
    m.def("scatter", &scatter, "scatter (CUDA)");
}

Expected behavior

In the code above, the wrappers call the kernel functions index_max_kernel, mm_to_sparse_kernel, sparse_dense_mm_kernel, reduce_sum_kernel, and scatter_kernel, but these functions are not defined in this file, and no definition of them is included.

Fix:

Implement these kernel functions in a separate .cu (CUDA) file.
Alternatively, define them directly in the current file if they are small enough to be inline functions.

For example, each kernel call needs a matching declaration whose return type agrees with its wrapper (index_max returns std::vector<at::Tensor>, not at::Tensor):

std::vector<at::Tensor> index_max_kernel(at::Tensor index_vals, at::Tensor indices, int A_num_block, int B_num_block);

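If these kernels are implemented in a CUDA file, declare them in a header that the C++ extension file includes (the file above already includes cuda_launch.h), and make sure the .cu file is compiled into the same extension so that the calls can be resolved at link time.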
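A minimal, self-contained sketch of the split described above: declarations (what a header such as cuda_launch.h would hold), the thin forwarding wrappers from the extension file, and the definitions (what a separate .cu file would hold). So that it compiles without libtorch, it uses a hypothetical stand-in `Tensor` struct instead of at::Tensor, covers only index_max and scatter, and gives the kernels placeholder bodies that merely echo their inputs; the real kernels launch CUDA code and compute something else.

```cpp
#include <vector>

// Hypothetical stand-in for at::Tensor so this sketch builds without libtorch.
struct Tensor {
    std::vector<float> data;
};

// --- Header part (e.g. cuda_launch.h): declarations only. ---
// Each return type must match what the corresponding wrapper returns.
std::vector<Tensor> index_max_kernel(Tensor index_vals, Tensor indices,
                                     int A_num_block, int B_num_block);
Tensor scatter_kernel(Tensor dense_A, Tensor indices, int B_num_block);

// --- Extension .cpp part: thin wrappers that forward to the kernels. ---
std::vector<Tensor> index_max(Tensor index_vals, Tensor indices,
                              int A_num_block, int B_num_block) {
    return index_max_kernel(index_vals, indices, A_num_block, B_num_block);
}

Tensor scatter(Tensor dense_A, Tensor indices, int B_num_block) {
    return scatter_kernel(dense_A, indices, B_num_block);
}

// --- .cu part (hypothetical file): the definitions. ---
// Placeholder bodies that only echo their inputs so the wiring can be
// exercised; they do NOT reproduce what the real CUDA kernels compute.
std::vector<Tensor> index_max_kernel(Tensor index_vals, Tensor indices,
                                     int /*A_num_block*/, int /*B_num_block*/) {
    return {index_vals, indices};
}

Tensor scatter_kernel(Tensor dense_A, Tensor /*indices*/, int /*B_num_block*/) {
    return dense_A;
}
```

In a real build the three parts live in separate files and the linker joins them; the only contract between them is that each declaration's signature, including the return type (std::vector<Tensor> for index_max_kernel here), matches its definition.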

adv-11 commented 1 day ago

I have worked on this before; I'll try taking it up.