quda 0.8.0 and milc 7.7.13 "ERROR: Solve precision ..."

lcosmai commented 8 years ago

I compiled the target su3_rhmc_hisq for ks_imp_rhmc in the last stable release of MILC (https://github.com/milc-qcd/milc_qcd.git Branch:master) with quda v0.8.0 (https://github.com/lattice/quda.git Branch:master) using the following Makefile: https://drive.google.com/file/d/0BxE4mI8SH7wsSnZEaDQyeEQzVVE/view?usp=sharing

Then I performed a short test using 1 GPU. The job aborted with the following error message:

    ERROR: Solve precision 4 doesn't match gauge precision 8 (rank 0, host node496, interface_quda.cpp:1904 in checkGauge())
    last kernel called was (name=N4quda22HeavyQuarkResidualNormI7double37double2S2_EE,volume=4x8x8x8,aux=vol=2048,stride=2304,precision=8)

Note that if I instead use quda v0.7.2, the same test job is completed without errors.
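
To make the message easier to read, here is a minimal sketch that decodes the two numbers in it. It assumes the integers are bytes per real number (4 = single, 8 = double), which is how QUDA's precision values appear to be encoded; the helper name `decode_precision_error` and the lookup table are hypothetical and not part of QUDA or MILC.

```python
import re

# Assumption: the integers in the message are bytes per real number.
PRECISION_NAMES = {2: "half", 4: "single", 8: "double"}


def decode_precision_error(message):
    """Return (solve, gauge) precision names parsed from the error line.

    Hypothetical helper for reading logs; returns None if the line
    does not contain the precision-mismatch message.
    """
    match = re.search(
        r"Solve precision (\d+) doesn't match gauge precision (\d+)", message
    )
    if match is None:
        return None
    solve, gauge = (int(g) for g in match.groups())
    return (
        PRECISION_NAMES.get(solve, f"unknown({solve})"),
        PRECISION_NAMES.get(gauge, f"unknown({gauge})"),
    )


line = (
    "ERROR: Solve precision 4 doesn't match gauge precision 8 "
    "(rank 0, host node496, interface_quda.cpp:1904 in checkGauge())"
)
print(decode_precision_error(line))  # ('single', 'double')
```

Under that assumption, the job requested a single-precision solve while the gauge field held by QUDA was in double precision.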

mathiaswagner commented 8 years ago

Just to clarify: which MILC version did you use?

I think (https://github.com/milc-qcd/milc_qcd.git Branch:master) corresponds to MILC 7.7.13, not 7.8.0 as mentioned in the bug title.

7.8.0 might well not be compatible with quda 0.8 as there have been quite a few changes affecting MILC.

lcosmai commented 8 years ago

You are right. I have just changed 7.8.0 to 7.7.13 in the title.

detar commented 8 years ago

Hi Mathias,

To provide a more stable definition of MILC code versions on github, last week we created two new branches, milc_qcd-7.7.13 and milc_qcd-7.8.0. They are supposed to be release versions of the code.
The branch 7.7.13 is closest to the one Leonardo was using, and the branch 7.8.0 is close to the current master branch. The master branch is the development branch, so it will continue to evolve. It is unlikely we will make any changes to the milc_qcd-7.7.13 and milc_qcd-7.8.0 branches unless they are to fix critical bugs.
Eventually, we will copy the master branch to a new release branch. (I think this is the model you also prefer.)

Best, Carleton

mathiaswagner commented 8 years ago

Hi Carleton,

Thanks for the correction. It looks like I was confused here. I will try to check QUDA 0.8 with

milc_qcd-7.7.13
milc_qcd-7.8.0

and try to reproduce the issue.

Sidenote: For QUDA we use a branch called develop for development and copy that over to a new release. We use master for the most recent release version (currently 0.8). This is to make sure that a git clone gives you a (hopefully) stable quda version.

Mathias

mathiaswagner commented 8 years ago

@lcosmai Can you share some more of the surrounding MILC output as well as your input file to MILC? This makes it easier to track down where the error was triggered.

mathiaswagner commented 8 years ago

@maddyscientist Not sure whether you are already reading so just wanted to make sure you are aware.

lcosmai commented 8 years ago

I have shared the directory from which the job was launched (https://drive.google.com/folderview?id=0BxE4mI8SH7wsY2lianUyRDlCWG8&usp=sharing). The same directory also contains a README file with more details.

mathiaswagner commented 8 years ago

OK. I managed to reproduce the issue with quda 0.8 and MILC 7.7.13, using the sample input provided with MILC:

    ~/milc_qcd/ks_imp_rhmc/test$ ../su3_rhmc_hisq su3_rhmc_hisq.2.sample-in

mathiaswagner commented 8 years ago

As this might be an issue in either MILC or QUDA, I also created https://github.com/lattice/quda/issues/439 to have a pointer in the QUDA issue tracker.

mathiaswagner commented 8 years ago

Setting

    prec_pbp 2

seems to be a workaround. I still need to check why the original setting worked with quda 0.7.2.
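
For anyone applying the workaround to several input files, the following is a rough sketch of how the setting could be patched automatically. It assumes the MILC input is plain text with lines of the form `prec_pbp <integer>`, and that, as far as I understand MILC's convention, 1 selects single and 2 double precision. The function name `set_prec_pbp` and the one-line sample are hypothetical, not taken from the MILC sources.

```python
import re


def set_prec_pbp(input_text, value=2):
    """Rewrite every 'prec_pbp <n>' line to use the given value.

    Hypothetical helper: only lines consisting of the keyword and an
    integer are touched; everything else is returned unchanged.
    """
    pattern = re.compile(
        r"^([ \t]*prec_pbp[ \t]+)\d+([ \t]*)$", flags=re.MULTILINE
    )
    # A function replacement avoids ambiguity between the group
    # reference and the digit that follows it.
    return pattern.sub(lambda m: f"{m.group(1)}{value}{m.group(2)}", input_text)


# Hypothetical one-line fragment of an input file.
sample = "prec_pbp 1\n"
print(set_prec_pbp(sample), end="")  # prec_pbp 2
```

Reading the file, calling the function and writing the result back is left out on purpose, so that nothing is overwritten without a backup.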

mathiaswagner commented 8 years ago

This will be fixed in quda 0.8.1. For now, please use the workaround and see lattice/quda#439.
