Open ferrarioa5 opened 1 month ago
There is a typo in the `add_kernel` routine in the CUDA file `muladd.cu`. I assumed that this kernel should compute the sum of two tensors, but it actually computes the product:
```cuda
__global__ void add_kernel(int numel, const float* a, const float* b, float* result) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < numel)
    result[idx] = a[idx] * b[idx];
}
```
This bug can be reproduced with the following Python script:
```python
import torch
import extension_cpp as ext

device = 'cuda'
n = 100
a = torch.rand(n).to(device)
b = torch.rand(n).to(device)
add2 = torch.zeros(n).to(device)
add1 = a + b
ext.ops.myadd_out(a, b, add2)
print(torch.equal(add1, add2))
```
The CPU implementation gives the correct result (run the script above with `device = 'cpu'`).
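Assuming the kernel is indeed meant to compute an elementwise sum, the fix is a one-character change in the store; a sketch of the corrected kernel, keeping the signature from `muladd.cu`:

```cuda
__global__ void add_kernel(int numel, const float* a, const float* b, float* result) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx < numel)
    result[idx] = a[idx] + b[idx];  // was: a[idx] * b[idx]
}
```

With this change, the reproduction script above should print `True` on the CUDA device as well.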
Yes, I also think this is an error.