lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

clover dslash_test broken with t partitioning and reconstruction #130

Closed: maddyscientist closed this issue 10 years ago

maddyscientist commented 10 years ago

(paraphrasing message from Alejandro)

Multi-GPU clover is broken for link reconstruction other than 18 in the quda-0.7 and tifr-redux branches (and therefore in tmclover-quda as well). I don't know why, but I suspect it is related to the way the extended gauge fields are constructed. I think these only work with recon 18, so when the extended fields are created from original fields with recon 12 or 8, something goes wrong in the multi-GPU case.

mpiexec -np 2 ./dslash_test --dslash_type clover --Lsdim 1 --test 2 --tune false --sdim 24 --tdim 24 --tgridsize 2 --prec double --recon 18 --load 0000 --dagger

and

mpiexec -np 2 ./dslash_test --dslash_type clover --Lsdim 1 --test 2 --tune false --sdim 24 --tdim 24 --tgridsize 2 --prec double --recon 12 --load 0000 --dagger

I'm reading the spinor and the configuration from disk, so I always study the same case. These two runs give me very different results, and with recon 8 it's even worse.
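For readers unfamiliar with the `--recon` values used above: 18 means all 18 real numbers of each SU(3) link matrix are stored, while 12 means only the first two rows are stored and the third row is rebuilt on the fly. The following is a minimal pure-Python sketch of that idea, not QUDA's actual implementation. The helper names (`example_su3`, `compress12`, `reconstruct12`) and the test matrix are assumptions made up for illustration.

```python
import cmath
import math

def cross_conj(a, b):
    """Return conj(a x b) for two complex 3-vectors."""
    return [
        (a[1] * b[2] - a[2] * b[1]).conjugate(),
        (a[2] * b[0] - a[0] * b[2]).conjugate(),
        (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]

def matmul(x, y):
    """Plain 3x3 complex matrix product."""
    return [[sum(x[r][k] * y[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]

def example_su3():
    """Hypothetical test matrix: product of two simple SU(3) factors."""
    c1, s1 = math.cos(0.3), math.sin(0.3)
    c2, s2 = math.cos(1.1), math.sin(1.1)
    p, q = cmath.exp(0.7j), cmath.exp(-0.4j)
    w = (p * q).conjugate()  # phase that makes each determinant equal to 1
    a = [[c1 * p, -s1 * q, 0], [s1 * p, c1 * q, 0], [0, 0, w]]
    b = [[w, 0, 0], [0, c2 * p, -s2 * q], [0, s2 * p, c2 * q]]
    return matmul(a, b)

def compress12(u):
    """Keep only rows 0 and 1: 6 complex numbers = 12 reals."""
    return [part for row in u[:2] for z in row for part in (z.real, z.imag)]

def reconstruct12(data):
    """Rebuild the full matrix; row 2 is conj(row0 x row1) for SU(3)."""
    z = [complex(data[2 * k], data[2 * k + 1]) for k in range(6)]
    row0, row1 = z[:3], z[3:]
    return [row0, row1, cross_conj(row0, row1)]

def max_abs_diff(x, y):
    return max(abs(x[r][c] - y[r][c]) for r in range(3) for c in range(3))

u = example_su3()
packed = compress12(u)
err = max_abs_diff(u, reconstruct12(packed))
print(len(packed), err)
```

For a matrix that is SU(3) to working precision, the rebuilt third row agrees with the stored one up to rounding, so recon 18 and recon 12 should give the same dslash result. A large difference between the two runs therefore points at a bug rather than at the compression itself.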

maddyscientist commented 10 years ago

I cannot reproduce this failure with the current quda-0.7 branch. It's possible this bug has already been fixed by accident.

Alejandro, can you try to reproduce this, to see if it's still present?

The only issue I have managed to reproduce occurs when the loaded gauge field is stored only in single precision: reconstruction then deviates at the level of single-precision accuracy. The solution is to reproject the gauge field onto the SU(3) manifold, but that's a separate matter.
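The single-precision effect described above can be illustrated with a small pure-Python sketch. It rounds an SU(3) matrix to float32, measures how far the stored third row is from the one rebuilt out of the first two rows, and then reprojects. The reprojection here is a simple Gram-Schmidt on the first two rows, chosen for brevity. That choice is an assumption, and QUDA's own reprojection may use a different algorithm. All function names and the seed values are hypothetical.

```python
import math
import struct

def cross_conj(a, b):
    """Return conj(a x b) for two complex 3-vectors."""
    return [
        (a[1] * b[2] - a[2] * b[1]).conjugate(),
        (a[2] * b[0] - a[0] * b[2]).conjugate(),
        (a[0] * b[1] - a[1] * b[0]).conjugate(),
    ]

def norm(v):
    return math.sqrt(sum(abs(z) ** 2 for z in v))

def reproject(u):
    """Gram-Schmidt on rows 0 and 1, then rebuild row 2 so that det = 1."""
    n0 = norm(u[0])
    r0 = [z / n0 for z in u[0]]
    # Component of row 1 along r0, removed to make the rows orthogonal.
    overlap = sum(x * y.conjugate() for x, y in zip(u[1], r0))
    r1 = [x - overlap * y for x, y in zip(u[1], r0)]
    n1 = norm(r1)
    r1 = [z / n1 for z in r1]
    return [r0, r1, cross_conj(r0, r1)]

def to_float32(z):
    """Round a complex number to single precision and back to double."""
    re, im = struct.unpack("ff", struct.pack("ff", z.real, z.imag))
    return complex(re, im)

def row2_mismatch(u):
    """Largest difference between the stored and the rebuilt third row."""
    rebuilt = cross_conj(u[0], u[1])
    return max(abs(x - y) for x, y in zip(u[2], rebuilt))

# Arbitrary seed rows (made up); reproject() turns them into an SU(3) matrix.
seed_rows = [
    [1 + 2j, 0.5 - 1j, -0.3 + 0.8j],
    [0.2 - 0.7j, 1.5 + 0.1j, -1 + 0.4j],
]
exact = reproject(seed_rows)

# Mimic a gauge field that was written to disk in single precision.
stored = [[to_float32(z) for z in row] for row in exact]

before = row2_mismatch(stored)             # single-precision sized mismatch
after = row2_mismatch(reproject(stored))   # consistent again after reprojection
print(before, after)
```

The mismatch before reprojection is what shows up as a recon 18 versus recon 12 difference at single-precision accuracy. After reprojection the three rows are mutually consistent again, so both reconstruction modes see the same link matrix.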

AlexVaq commented 10 years ago

I'll try to have a look at it asap, although with the Lattice conference so close, everybody's going to be very busy.

Ciao,

Alex

On 30/05/2014, at 01:17, mikeaclark notifications@github.com wrote:

I cannot reproduce this failure with the current quda-0.7 branch. It's possible this bug has already been fixed by accident.

Alejandro, can you try to reproduce this, to see if it's still present?

The only issue I have managed to reproduce occurs when the loaded gauge field is stored only in single precision: reconstruction then deviates at the level of single-precision accuracy. The solution is to reproject the gauge field onto the SU(3) manifold, but that's a separate matter.


maddyscientist commented 10 years ago

This bug is now fixed with commit 00566bb25d8c3e98756d54c8f32626b92734dbb9.