lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda
Other
289 stars 97 forks source link

non-degenerate twisted mass dslash is broken #161

Closed alexstrel closed 9 years ago

alexstrel commented 9 years ago

non-degenerate dslash operator is currently broken, also a single GPU version . In particular, the single GPU dslash test fails for single precision and reconstruction 12 (while all other options looked ok). Up to now, I noticed that this bug was absent in commit 3bfcdccf76be7e4729fd8438361df5c108db8295.

maddyscientist commented 9 years ago

Can you use git bisect to work out where the bug happened?

alexstrel commented 9 years ago

I'll do this. Actually, it looked strange for me why rec 12 for single precision is effected, might be just a side effect of some tricky problem: reconstruction stuff was not changed for quite a long period of time...

maddyscientist commented 9 years ago

This is the bad commit bd97f59db86dcdc65dd20b00a3056b4ed5295f03

jpfoley commented 9 years ago

Looks like my bug then. I wonder if the DslashFaceBuffer functor properly implements the older dslashCuda function. That could be where the bug lies. I'll try to take a look tomorrow. Apologies Alexei.

maddyscientist commented 9 years ago

Some progress here. Different policies are used for different dslash types, in particular wilson, clover, staggered, improved staggered, twisted clover all use QUDA_DSLASH2 and domain wall, mobius and twisted mass all use QUDA_DSLASH. Switching twisted mass to use QUDA_DSLASH2 fixes the bug. So the bug looks like it is with the QudaDslashFaceBuffer policy.

Justin this may also be the source of your problem with fused mobius.

maddyscientist commented 9 years ago

Ok, I spoke too soon. Changing policy doesn't fix the issue, but I've found that the issue has a lack of reproducibility to it. 8 / 18 reconstruct sometimes break as well. This is a nasty one.

maddyscientist commented 9 years ago

Now I'm more confused than ever. Switching back to the old dslashCuda2, instead of using the new dslash policy class doesn't fix the issue. Continuing to investigate. Justin, at this rate, I would suggest that you don't waste your time working on this until I work more what's going on.

maddyscientist commented 9 years ago

Some more data before I drop into bed. The errors always occur on the first or last time slice. Changing the volume affects the correctness. Given the lack of reproducibility, I should probably do another git bisect to double check where the error originates.

jpfoley commented 9 years ago

Yuck. Another low-level error reaching up from QUDA's depths?

alexstrel commented 9 years ago

Ok, line 217 in dslash_twisted_mass.cu introduced the bug for non-deg dslash: argument in->Volume() must be replaced by bulk_threads and in->GhostFace() must be changed by ghost_threads. Let me check.

jpfoley commented 9 years ago

Mike, Alexei - thanks for tracking this down. It was a bug I introduced, and a rather obvious one at that. I was clearly in cut-and-paste mode when I introduced this bug.

maddyscientist commented 9 years ago

Does this bug explain why it worked on say 4^4 volume but broke on 8^4?

Sent from my iPhone

On Oct 24, 2014, at 8:49, "Justin Foley" notifications@github.com<mailto:notifications@github.com> wrote:

Mike, Alexei - thanks for tracking this down. It was a bug I introduced, and a rather obvious one at that. I was clearly in cut-and-paste mode when I introduced this bug.

Reply to this email directly or view it on GitHubhttps://github.com/lattice/quda/issues/161#issuecomment-60406522.


This email message is for the sole use of the intended recipient(s) and may contain confidential information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by

reply email and destroy all copies of the original message.

alexstrel commented 9 years ago

Asymmetric preconditioning looks ok. now I get problems when choose QUDA_MATPC_EVEN_EVEN. But this is a seperate issue, that could appear in some early commits.

maddyscientist commented 9 years ago

Ok, Alexei's patch (6450fba27f3deda778506f9bbfbac7be57753676) does fix this bug. It had the effect of doubling the number of threads running which was bound to do strange things.

maddyscientist commented 9 years ago

Do we need to file another issue then for QUDA_MATPC_EVEN_EVEN?

alexstrel commented 9 years ago

yes, it should be a separate issue, I agree. At least we have a working non-deg dslash now for the asymmetic case.

maddyscientist commented 9 years ago

I also just pushed the same fix twisted-clover fermions, which also got the same copy and paste error. In fact it looks like twisted_clover had this issue for a long time, and could explain the segmentation faults Alex was getting on single GPU twisted clover a long time ago.

(This issue was my copy and paste error not yours - I was the culprit who applied your policy to the twisted mass dslash.)

alexstrel commented 9 years ago

this bug affected 2-flavor dslash (that is why one needs to correctly adjust a number of threads for this particular case). twisted-clover does not support non-degenerate doublet as far as I know, so I fear the problem you mentioned might still exist.

maddyscientist commented 9 years ago

It's also possible the twisted clover bug got fixed by fixing another bug.

On the QUDA_MATPC_EVEN_EVEN issue, this problem only shoes up when solving a QUDA_MAT_SOLUTION and does not show up when doing QUDA_MATPC_SOLUTION. Also, the dslash_test passes. This suggests the problem is in the Dirac::reconstruct or Dirac::prepare functions.

AlexVaq commented 9 years ago

Now I have a bit of a break with the projects, I'll try to test the last release for correctness with twisted-clover and let you know what happens.

I'm consideing writing the CPU counterpart. It takes time, but once done, testing is much easier...

Ciao,

Alex

El 24/10/2014, a las 19:25, mikeaclark notifications@github.com escribió:

It's also possible the twisted clover bug got fixed by fixing another bug.

On the QUDA_MATPC_EVEN_EVEN issue, this problem only shoes up when solving a QUDA_MAT_SOLUTION and does not show up when doing QUDA_MATPC_SOLUTION. Also, the dslash_test passes. This suggests the problem is in the Dirac::reconstruct or Dirac::prepare functions.

— Reply to this email directly or view it on GitHub https://github.com/lattice/quda/issues/161#issuecomment-60420730.

maddyscientist commented 9 years ago

I've pushed some changes for twisted-mass and twisted-clover, but it appears that twisted-clover is presently broken (dslash_test and invert_test are not working). But I'm confused, since there is no twisted-clover CPU counterpart, so how either test ever work? Do you have to force the clover term to be unit somehow to do the test instead of the default which is to construct the clover term on the device, but this doesn't change the CPU comparison which is against a dslash with no clover term.

Anyway, the degenerate and non-degenerate twisted-mass dslash is now built in parallel for faster compilation time and I've added a flag "--flavor" for the dslash_test and invert_test for easy switching between the two.

I think I've also fixed the twisted-clover bugs (subject to it getting the correct answer). There were two issues there (allocating texture objects for non-allocated fields, and an issue related to using 8-/12-reconstruction gauge fields when constructing the clover field.

AlexVaq commented 9 years ago

For testing I used the tmLQCD package, which is the only one implementation of twisted-clover on CPUs, that I know. So the test was kind of useless. I mean, I used it as intermediate step, because if the clover coeficcient was small enough, twisted-clover should be like twisted mass; and for mu=0 you should get clover. But no, there is no test in QUDA for twisted-clover, I was using tmLQCD, which was a pain.

Ciao,

Alex

El 28/10/2014, a las 21:28, mikeaclark notifications@github.com escribió:

I've pushed some changes for twisted-mass and twisted-clover, but it appears that twisted-clover is presently broken (dslash_test and invert_test are not working). But I'm confused, since there is no twisted-clover CPU counterpart, so how either test ever work? Do you have to force the clover term to be unit somehow to do the test instead of the default which is to construct the clover term on the device, but this doesn't change the CPU comparison which is against a dslash with no clover term.

Anyway, the degenerate and non-degenerate twisted-mass dslash is now built in parallel for faster compilation time and I've added a flag "--flavor" for the dslash_test and invert_test for easy switching between the two.

I think I've also fixed the twisted-clover bugs (subject to it getting the correct answer). There were two issues there (allocating texture objects for non-allocated fields, and an issue related to using 8-/12-reconstruction gauge fields when constructing the clover field.

— Reply to this email directly or view it on GitHub https://github.com/lattice/quda/issues/161#issuecomment-60825629.

maddyscientist commented 9 years ago

Ok, that's reassuring then that I can stop trying to work out if I broke it then!