franzfranchetti closed this 2 years ago
Rejected Changes. The code generated for MDDFT (see examples/library-cuda/mddft-cuda.g) isn't valid for the GPU, because the kernels are launched with an invalid number of threads per block:
void mddft3d_80x80x80(double *Y, double *X) { dim3 b788(4000, 1, 1), b789(4000, 1, 1), b790(4000, 1, 1), g1(16, 1, 1), g2(16, 1, 1), g3(16, 1, 1); ker_mddft3d_80x80x800<<<g1, b788>>>(X);
The block size of 4000 threads is invalid: the maximum is 1024 threads per block.
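To make the constraint concrete, here is a minimal host-side sketch of the check, plus one way to re-split the same total thread count into launchable blocks. It is an illustration only, not how the generator fixes the problem. The names `Dim3`, `kMaxThreadsPerBlock`, `block_is_valid` and `largest_valid_block` are hypothetical, and the 1024 limit is hard-coded from the figure above; real code would read `maxThreadsPerBlock` from `cudaGetDeviceProperties`.

```cpp
// Assumed limit, taken from the comment above. On a real device, query
// cudaDeviceProp::maxThreadsPerBlock instead of hard-coding it.
constexpr unsigned kMaxThreadsPerBlock = 1024;

// Hypothetical stand-in for CUDA's dim3 so the sketch builds without the toolkit.
struct Dim3 { unsigned x, y, z; };

// A block is launchable only if the product of its dimensions fits the limit.
bool block_is_valid(Dim3 b) {
    unsigned long long n = 1ULL * b.x * b.y * b.z;
    return n > 0 && n <= kMaxThreadsPerBlock;
}

// Largest divisor of `total` that does not exceed the limit. Using a divisor
// keeps the launch exact: grid * block == total, with no leftover threads.
unsigned largest_valid_block(unsigned total) {
    for (unsigned t = kMaxThreadsPerBlock; t > 1; --t)
        if (total % t == 0) return t;
    return 1;  // 1 divides everything
}
```

For the launch above, `block_is_valid(Dim3{4000, 1, 1})` is false. The original configuration asks for 16 * 4000 = 64000 threads, and `largest_valid_block(64000)` returns 1000, which gives 64 blocks of 1000 threads. Changing the split also changes how `blockIdx` and `threadIdx` map to data, so the kernel body would have to be regenerated to match; adjusting the `dim3` values alone is not enough.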
First version of PrunedMDRConv that generates CUDA code, ready for verification. Updated and optimized versions to follow.