Closed: maddyscientist closed this issue 12 years ago
This bug only appears when the local spatial sizes differ, e.g., a local volume of 16^3x64 is fine, but 16^2x8x64 fails. This suggests that a lattice dimension has been swapped accidentally.
Yeah, there was that type of bug in staggered before. It turned out to be something like X1 being used as X2 in the kernel core file.
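To illustrate why a swap like this only shows up when the local sizes differ, here is a minimal sketch in plain C++. It is not the QUDA kernel code: the function names, the lexicographic site ordering, and the particular swap (the second extent used where the third belongs) are all assumptions chosen for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Local lattice extents {X1, X2, X3, X4}; hypothetical layout, not QUDA's.
using Dims = std::array<int, 4>;

// Lexicographic site index with x[0] running fastest.
std::size_t site_index(const Dims& X, const Dims& x) {
    return ((static_cast<std::size_t>(x[3]) * X[2] + x[2]) * X[1] + x[1]) * X[0] + x[0];
}

// Same computation with an assumed bug: X[1] is used where X[2] belongs.
std::size_t site_index_swapped(const Dims& X, const Dims& x) {
    return ((static_cast<std::size_t>(x[3]) * X[1] + x[2]) * X[1] + x[1]) * X[0] + x[0];
}

// Count the sites where the buggy index disagrees with the correct one.
std::size_t count_mismatches(const Dims& X) {
    assert(X[0] > 0 && X[1] > 0 && X[2] > 0 && X[3] > 0);
    std::size_t mismatches = 0;
    Dims x{};
    for (x[3] = 0; x[3] < X[3]; ++x[3])
        for (x[2] = 0; x[2] < X[2]; ++x[2])
            for (x[1] = 0; x[1] < X[1]; ++x[1])
                for (x[0] = 0; x[0] < X[0]; ++x[0])
                    if (site_index(X, x) != site_index_swapped(X, x)) ++mismatches;
    return mismatches;
}
```

With these assumptions, `count_mismatches(Dims{16, 16, 16, 64})` finds no disagreement because the two extents are equal, while `count_mismatches(Dims{16, 16, 8, 64})` does, which matches the pattern of 16^3x64 passing and 16^2x8x64 failing. A test with all-distinct local extents would be the quickest way to flush out this class of bug.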
Using multiple GPUs in anything other than the T dimension seems to get the wrong answer, and the wilson_dslash_test fails, e.g.,
This problem is present in the latest master commit a64abf95ba52eefab659 on CUDA 4.0, and is likely the same issue that Balint reported, so it is probably a bug introduced around commit 99b16e1058ecfb3458e7.