lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Staggered invert test produces segmentation fault #4

Closed bjoo closed 13 years ago

bjoo commented 13 years ago

When building staggered in single-GPU mode, with make.inc created by

./configure --enable-os=linux --enable-gpu-arch=sm_20 --enable-staggered-dirac --disable-wilson-dirac --disable-domain-wall-dirac --disable-twisted-mass-dirac

and then running the staggered invert test, I get a segfault on a GTX480:

[bjoo@qcd10i2 tests]$ ./staggered_invert_test --prec double --recon 18 --test 3
running the following test:
prec sloppy_prec link_recon sloppy_link_recon test_type S_dimension T_dimension
double double 18 18 mcg_even 24 24
QUDA: Found device 0: GeForce GTX 480
QUDA: Using device 0: GeForce GTX 480
Creating a DiracStaggeredPC operator
Creating a DiracStaggeredPC operator
Segmentation fault (core dumped)

MPI version runs fine.

gshi commented 13 years ago

I cannot reproduce the error. I tried two machines, one with a C2050 and the other with a GTX480. Both run fine.

[gshi@ac42 tests]$ ./staggered_invert_test --prec double --recon 18 --test 3
running the following test:
prec sloppy_prec link_recon sloppy_link_recon test_type S_dimension T_dimension
double double 18 18 mcg_even 24 24
QUDA: Found device 0: GeForce GTX 480
QUDA: Using device 0: GeForce GTX 480
Creating a DiracStaggeredPC operator
Creating a DiracStaggeredPC operator
Multimass CG: 1 iterations, r2 = 6.778324e+04
Multimass CG: 2 iterations, r2 = 2.427410e+04
Multimass CG: 3 iterations, r2 = 7.028485e+03
Multimass CG: 4 iterations, r2 = 2.078667e+03
Multimass CG: 5 iterations, r2 = 6.659763e+02
Multimass CG: 6 iterations, r2 = 2.340008e+02
Multimass CG: 7 iterations, r2 = 7.815442e+01
Multimass CG: 8 iterations, r2 = 2.600611e+01
Multimass CG: 9 iterations, r2 = 8.323934e+00
Multimass CG: 10 iterations, r2 = 2.629885e+00
Multimass CG: 11 iterations, r2 = 8.375718e-01
Multimass CG: 12 iterations, r2 = 2.697074e-01
Multimass CG: 13 iterations, r2 = 8.751358e-02
Multimass CG: 14 iterations, r2 = 2.861280e-02
Multimass CG: 15 iterations, r2 = 9.288566e-03
Multimass CG: 16 iterations, r2 = 3.012651e-03
Multimass CG: 17 iterations, r2 = 9.695625e-04
Multimass CG: 18 iterations, r2 = 3.090379e-04
Multimass CG: 19 iterations, r2 = 1.001718e-04
Multimass CG: 20 iterations, r2 = 3.197544e-05
Multimass CG: 21 iterations, r2 = 1.050283e-05
Multimass CG: 22 iterations, r2 = 3.399505e-06
Multimass CG: 23 iterations, r2 = 1.139047e-06
Multimass CG: 24 iterations, r2 = 3.659791e-07
Multimass CG: 25 iterations, r2 = 1.172472e-07
Multimass CG: 26 iterations, r2 = 3.830501e-08
Multimass CG: 27 iterations, r2 = 1.214379e-08
Multimass CG: 28 iterations, r2 = 3.910394e-09
Multimass CG: 29 iterations, r2 = 1.281671e-09
Multimass CG: 30 iterations, r2 = 4.212289e-10
Multimass CG: 31 iterations, r2 = 1.383202e-10
Multimass CG: 32 iterations, r2 = 4.458917e-11
Multimass CG: 33 iterations, r2 = 1.426002e-11
Converged after 33 iterations, r2 = 1.426002e-11, relative true_r2 = 4.300762e-17
Final residue squred =1.426e-11
done: total time = 0.33 secs, 33 iter / 0.274577 secs = 51.206 gflops,
checking the solution
0th solution: mass=5.050000, relative residual, requested = 1e-08, actual = 2.29622e-16
1th solution: mass=1.230000, relative residual, requested = 1e-08, actual = 6.55802e-09
2th solution: mass=2.640000, relative residual, requested = 1e-08, actual = 2.85467e-16
3th solution: mass=2.330000, relative residual, requested = 1e-08, actual = 8.41762e-16

I am using the master branch.
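
For anyone comparing runs across machines, the per-mass residual check at the end of the test output can be re-verified offline from a saved log. The following is a minimal sketch in Python and is not part of QUDA; the function name `check_residuals` and the regular expression are assumptions based only on the log format shown in this thread.

```python
import re

# Matches lines such as:
#   "1th solution: mass=1.230000, relative residual, requested = 1e-08, actual = 6.55802e-09"
# The format is inferred from the log pasted in this thread, not from QUDA's source.
SOLUTION_RE = re.compile(
    r"(\d+)th solution: mass=([0-9.]+), relative residual, "
    r"requested = ([0-9.eE+-]+), actual = ([0-9.eE+-]+)"
)


def check_residuals(log_text):
    """Return a list of (index, mass, requested, actual, ok) tuples.

    `ok` is True when the actual relative residual does not exceed the
    requested one. A log with no solution lines (for example, a run that
    crashed before the solve) yields an empty list.
    """
    results = []
    for m in SOLUTION_RE.finditer(log_text):
        index = int(m.group(1))
        mass = float(m.group(2))
        requested = float(m.group(3))
        actual = float(m.group(4))
        results.append((index, mass, requested, actual, actual <= requested))
    return results


if __name__ == "__main__":
    # Two of the solution lines from the log in this thread.
    sample = (
        "0th solution: mass=5.050000, relative residual, requested = 1e-08, actual = 2.29622e-16\n"
        "1th solution: mass=1.230000, relative residual, requested = 1e-08, actual = 6.55802e-09\n"
    )
    for index, mass, requested, actual, ok in check_residuals(sample):
        status = "ok" if ok else "NOT converged"
        print(f"solution {index} (mass={mass}): {actual:g} vs {requested:g} -> {status}")
```

An empty result is itself informative here: a log that ends in a segmentation fault before the solve contains no solution lines at all.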

gshi commented 13 years ago

I used the same configure options you used and verified that the generated make.inc is indeed the single-GPU version for staggered.

gshi commented 13 years ago

I tried the minvcg branch. It works for me too.

bjoo commented 13 years ago

I will test elsewhere, in case it is the old version of QUDA/Drivers on the machines I was looking at...

maddyscientist commented 13 years ago

Since this no longer seems to be an issue, I'm closing this.