Open chenyuntc opened 6 years ago
Hi, Nice Work. But I think there is actually no need for CUDA code -- they could be implemented vectorized, which is more intuitive. For example:
__global__ void InitDistanceMetricKernel could be implemented as
__global__ void InitDistanceMetricKernel
import torch as t I = t.range(0, H).view(-1,1,1,1) J = t.range(0,W).view(1,-1,1,1) P = t.range(0,H).view(1,1,-1,1) Q = t.range(0,W).view(1,1,1,-1) distanceMetric_data = t.exp( ((I-P)**2 + (J-Q)**2)) / (-2*e) ).view(H*w,-1)
Maybe incorrect, but you know what I mean.
Hi, Nice Work. But I think there is actually no need for CUDA code -- they could be implemented vectorized, which is more intuitive. For example:
__global__ void InitDistanceMetricKernel
could be implemented asMaybe incorrect, but you know what I mean.