lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

MILC precision mismatch in rhmc (MILC 7.7.13 / quda 0.8.0) #439

Closed mathiaswagner closed 8 years ago

mathiaswagner commented 8 years ago

QUDA internal tracking of bug report by @lcosmai https://github.com/milc-qcd/milc_qcd/issues/5

MILC 7.7.13, and probably also 7.8.0, fails in RHMC when running with QUDA.

To reproduce, build MILC for double precision with all QUDA acceleration for HISQ enabled, then run:

~/milc_qcd/ks_imp_rhmc/test$ ../su3_rhmc_hisq su3_rhmc_hisq.2.sample-in

error message:

GACTION: 5.960903e+00
ERROR: Solve precision 4 doesn't match gauge precision 8 (rank 0, host nvsocal2, /home/mathias/qudagit/lib/interface_quda.cpp:1904 in checkGauge())
       last kernel called was (name=N4quda22HeavyQuarkResidualNormI7double37double2S2_EE,volume=3x6x6x6,aux=vol=648,stride=756,precision=8)

mathiaswagner commented 8 years ago

Some insight from gdb, with a breakpoint at the checkGauge call:

#0  invertQuda (hp_x=0x2524b1b0, hp_b=0x2525a4c0, param=0x7fffffffcb90) at /home/mathias/qudagit/lib/interface_quda.cpp:2194
#1  0x00000000004698f0 in qudaInvert (external_precision=2, quda_precision=1, mass=0.0018, inv_args=..., target_residual=9.9999999999999995e-07, target_fermilab_residual=0, fatlink=0xce15300, 
    longlink=0xcecb710, tadpole=0.89000000000000001, source=0x2525a4c0, solution=0x2524b1b0, final_residual=0x7fffffffe270, final_fermilab_residual=0x7fffffffe278, num_iters=0x7fffffffe25c)
    at /home/mathias/qudagit/lib/milc_interface.cpp:995
#2  0x00000000004551e7 in ks_congrad_parity_gpu ()
#3  0x0000000000457fda in ks_congrad_field ()
#4  0x0000000000457061 in mat_invert_uml_field ()
#5  0x000000000042bc24 in f_meas_imp_field ()
#6  0x0000000000405da9 in main ()

Looking into the milc_interface invert call:

(gdb) print invalidate_quda_gauge
$1 = false
(gdb) print create_quda_gauge
$2 = true

So at line 977 we do not load a gauge field. However, the resident gauge field is double precision and the request is for a single-precision inversion, so we hit the error later. It looks like we need another check for reloading the gauge field, at least here and maybe in other places too.
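
The sequence can be modelled with a small toy cache. This is only a sketch to illustrate the stale-field problem: `ToyGaugeCache`, `solve`, `reproduceMismatch` and the `Prec` enum are hypothetical stand-ins, not QUDA or MILC code. Only the flag name and the precision values 4 and 8 are taken from the output above.

```cpp
#include <cassert>

// Toy model only: none of these types exist in QUDA or MILC.
enum class Prec { Single = 4, Double = 8 };  // values as printed in the error message

struct ToyGaugeCache {
    bool invalidate_quda_gauge = true;  // flag name borrowed from the gdb session
    bool loaded = false;
    Prec resident = Prec::Double;

    // Returns true if the solve could run, false if it would hit the
    // "solve precision doesn't match gauge precision" condition.
    bool solve(Prec requested) {
        if (invalidate_quda_gauge) {  // reload only when the flag asks for it
            resident = requested;
            loaded = true;
            invalidate_quda_gauge = false;
        }
        return loaded && resident == requested;
    }
};

// A double-precision field becomes resident, then a single-precision
// solve is requested while the flag is still false.
bool reproduceMismatch() {
    ToyGaugeCache cache;
    bool first = cache.solve(Prec::Double);   // loads a double field
    bool second = cache.solve(Prec::Single);  // no reload, stale double field is reused
    return first && !second;
}
```

In this model, setting `invalidate_quda_gauge = true` before the second call makes it succeed, because the field is reloaded at the requested precision.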

mathiaswagner commented 8 years ago

The error occurs when calculating pbp. Forcing an invalidateGauge in qudaInvert fixes the issue. Not sure how we can best detect that this is needed.

maddyscientist commented 8 years ago

We could add a method to query the precision of the resident gauge field and invalidate if it doesn't match.
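
A minimal sketch of that idea, assuming a cache object that remembers the precision it was loaded with. All names here (`PrecisionAwareCache`, `residentPrecision`, `ensure`, `GaugePrec`) are hypothetical and only illustrate the check; they are not the QUDA interface.

```cpp
#include <cassert>

// Hypothetical sketch: illustrative names, not QUDA's API.
enum class GaugePrec { Single = 4, Double = 8 };

class PrecisionAwareCache {
public:
    // The proposed query: precision of the resident gauge field.
    GaugePrec residentPrecision() const { return resident_; }
    bool isLoaded() const { return loaded_; }
    int loadCount() const { return loads_; }

    void invalidate() { loaded_ = false; }

    // Invalidate on a precision mismatch, then (re)load if nothing is resident.
    void ensure(GaugePrec requested) {
        if (loaded_ && residentPrecision() != requested) invalidate();
        if (!loaded_) {
            resident_ = requested;  // stands in for the real gauge upload
            loaded_ = true;
            ++loads_;
        }
    }

private:
    bool loaded_ = false;
    GaugePrec resident_ = GaugePrec::Double;
    int loads_ = 0;
};
```

Calling `ensure` before each solve would reload the field only when the precision changes, so repeated solves at the same precision keep reusing the resident field.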


mathiaswagner commented 8 years ago

Closing this now.