Closed olauq closed 7 years ago
Also, the kernel is incorrect because it returns at the start if the index is out of range. However, all the threads in the block are needed to load the data into shared memory. Instead, the code should just mask the operations that have side effects, such as loading data and storing the result.
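To illustrate the masking idea, here is a minimal sketch of a tiled O(N*N) kernel. It is not the repository's actual kernel: the names direct_acc and run_direct, the float4 layout (x, y, z, mass), and the softening parameter eps2 (assumed > 0) are all hypothetical. Out-of-range threads stay alive so they can take part in the shared-memory loads and in __syncthreads(); only the global load and the final store are guarded.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: pos[i] = (x, y, z, mass); eps2 > 0 is assumed.
__global__ void direct_acc(const float4 *pos, float3 *acc, int n, float eps2)
{
    extern __shared__ float4 tile[];
    const int i = blockIdx.x * blockDim.x + threadIdx.x;
    const bool active = (i < n);   // mask instead of an early return

    // Guarded load: inactive threads use a dummy body.
    const float4 pi = active ? pos[i] : make_float4(0.f, 0.f, 0.f, 0.f);
    float3 a = make_float3(0.f, 0.f, 0.f);

    // Round up so the last, partially filled tile is also processed.
    const int numTiles = (n + blockDim.x - 1) / blockDim.x;
    for (int t = 0; t < numTiles; ++t) {
        const int j = t * blockDim.x + threadIdx.x;
        // Every thread in the block loads one slot; padding has zero mass.
        tile[threadIdx.x] = (j < n) ? pos[j] : make_float4(0.f, 0.f, 0.f, 0.f);
        __syncthreads();
        for (int k = 0; k < blockDim.x; ++k) {
            const float4 pj = tile[k];
            const float dx = pj.x - pi.x;
            const float dy = pj.y - pi.y;
            const float dz = pj.z - pi.z;
            const float inv = rsqrtf(dx * dx + dy * dy + dz * dz + eps2);
            const float s = pj.w * inv * inv * inv;  // zero for padding
            a.x += s * dx;
            a.y += s * dy;
            a.z += s * dz;
        }
        __syncthreads();
    }
    if (active) acc[i] = a;        // guarded store
}

// Hypothetical host helper: runs the kernel with p threads per block.
std::vector<float3> run_direct(const std::vector<float4> &h_pos, int p, float eps2)
{
    const int n = (int)h_pos.size();
    float4 *d_pos = nullptr;
    float3 *d_acc = nullptr;
    cudaMalloc(&d_pos, n * sizeof(float4));
    cudaMalloc(&d_acc, n * sizeof(float3));
    cudaMemcpy(d_pos, h_pos.data(), n * sizeof(float4), cudaMemcpyHostToDevice);
    const int blocks = (n + p - 1) / p;
    direct_acc<<<blocks, p, p * sizeof(float4)>>>(d_pos, d_acc, n, eps2);
    std::vector<float3> h_acc(n);
    cudaMemcpy(h_acc.data(), d_acc, n * sizeof(float3), cudaMemcpyDeviceToHost);
    cudaFree(d_pos);
    cudaFree(d_acc);
    return h_acc;
}
```

With an early return, the inactive threads of the last block would skip both the tile loads and the barriers, so part of each tile would be left unloaded and not every thread would reach __syncthreads().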
Thanks for reporting this issue. I fixed it in this commit: faa46de0b3373870ef8831c5f9793d23800d4bba. Note, though, that this feature exists only for testing and comparing performance between the O(N*N) algorithm and the O(N log N) one; it should not be used for production simulations.
Unless I'm mistaken, in file dev_direct_gravity.cu, function dev_direct_gravity:
int numTiles = n / p;
should be
int numTiles = (n + p - 1) / p;
Otherwise forces from the last, non-full, tile are not processed.
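For a concrete feel of the difference, here is a small sketch comparing the two expressions. The helper names tiles_truncating and tiles_ceil are made up for illustration, and n and p are assumed to be positive ints, as in the snippet above.

```cuda
// Hypothetical helpers; n = number of bodies, p = bodies per tile.
__host__ __device__ inline int tiles_truncating(int n, int p)
{
    return n / p;            // drops the last tile when n % p != 0
}

__host__ __device__ inline int tiles_ceil(int n, int p)
{
    return (n + p - 1) / p;  // rounds up, so the partial tile is counted
}

// Number of bodies whose forces are skipped by the truncating version.
__host__ __device__ inline int bodies_missed(int n, int p)
{
    return n - tiles_truncating(n, p) * p;
}
```

For n = 1000 and p = 256, the truncating version gives 3 tiles and skips 232 bodies, while the rounded-up version gives 4. When n is an exact multiple of p (for example n = 1024), both give 4, which is probably why the problem is easy to miss in tests with power-of-two sizes.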