lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

QUDA with MILC #313

Closed Drink7 closed 9 years ago

Drink7 commented 9 years ago

I have a problem using QUDA with MILC. I'm using the MILC application ks_imp_rhmc, and my Makefile settings are below.

//compiler
CC = mpicc
//Linker flag
LD=mpicxx
//QUDA option
WANTQUDA = true
WANT_GF_GPU = true

The other QUDA options are commented out. An executable is generated, but it does not seem to run on the GPU. I also don't know what some of the warning messages mean; I see these two in the log:

WARNING: Failed to determine NUMA affinity for device 0 (possibly not applicable)
WARNING: Cache file not found. All kernels will be re-tuned (if tuning is enabled).

Can someone help me? Thanks. (My QUDA version is 0.7.1.)

AlexVaq commented 9 years ago

Well, these two warning messages look QUDA-generated. What makes you think you're not running on the GPU? Did you try logging in to the node you were running on and calling nvidia-smi to see the GPU workload?

Also take into account that, if there is no cache file, the first run might take a while...
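
One way to run the nvidia-smi check suggested above is to sample the GPU repeatedly while the job runs, since a single snapshot can land between kernels. The following is only a rough sketch: the `--query-gpu` and `--format` options are standard nvidia-smi flags, but the helper names `parse_sample` and `peak_utilization` are made up for illustration, and the parser assumes every field is numeric (some cards report `[N/A]` instead).

```python
import subprocess
import time

# Standard nvidia-smi query: one CSV line per GPU, e.g. "37, 2210".
QUERY = ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"]

def parse_sample(csv_text):
    """Parse nvidia-smi CSV output into [(util_percent, mem_mib), ...], one per GPU.

    Assumes both fields are plain integers (hypothetical helper)."""
    samples = []
    for line in csv_text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        samples.append((int(util), int(mem)))
    return samples

def peak_utilization(n_samples=30, interval=1.0):
    """Poll nvidia-smi n_samples times and return the peak utilization seen per GPU."""
    peaks = []
    for _ in range(n_samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        current = [util for util, _ in parse_sample(out)]
        peaks = current if not peaks else [max(p, c) for p, c in zip(peaks, current)]
        time.sleep(interval)
    return peaks

# Example (run on the compute node while the MILC job is active):
#   print("peak GPU utilization (%):", peak_utilization())
```

If the peak stays at 0% for the whole run but memory usage is non-zero, the process has probably initialized the device without doing much work on it.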


Drink7 commented 9 years ago

The GPU I use is a Tesla K40. I called nvidia-smi to see the workload and got the output in this picture: [image: nvidia-smi output]

Because GPU-Util shows 0%, I think the executable is not running correctly on the GPU.

AlexVaq commented 9 years ago

Which part of QUDA are you trying to use exactly? Did you enable VERBOSE output?

Drink7 commented 9 years ago

The application ks_imp_rhmc seems to use the HISQ fermion force and the gauge tools. I built QUDA with the configure file named configure.milc.titan from QUDA's directory. That configure file does not enable VERBOSE, so I'll add it and build again.

AlexVaq commented 9 years ago

It might give you some info, but I don’t guarantee anything. To tell you the truth, I’m not familiar at all with the MILC code. However, I contributed to the gauge tools... Which parts of the gauge tools are you using exactly?

Drink7 commented 9 years ago

In the MILC code, the README for the ks_imp_rhmc application says that the measurements include plaquette..., so I added the flag to the configure file. But I'm not sure whether it will work with the executable that ks_imp_rhmc generates.

mathiaswagner commented 9 years ago

Which version of MILC and QUDA do you use?

Can you share your MILC Makefile and QUDA make.inc ?

mathiaswagner commented 9 years ago

The NUMA affinity message is from QUDA. It is a known issue that might affect performance on some systems.

Drink7 commented 9 years ago

I use the latest version of MILC, 7.7.11, and version 0.7.1 of QUDA.

Here is my MILC Makefile: http://codepad.org/2EaY2rJp
QUDA make.inc: http://codepad.org/9cG6fxmQ

mathiaswagner commented 9 years ago

Thanks. I will try to have a look later.

Did you try to run one of the test input files in the ks_imp_rhmc/test directory? Which binary exactly did you use in the ks_imp_rhmc directory? su3_rhmc_hisq?

mathiaswagner commented 9 years ago

From my first look in your Makefile:

You only offload the gauge force to the GPU. Everything else is kept on the CPU. That should explain why your GPU is idle most of the time. I assume the code is running on the GPU; it is just that only the gauge force runs there.

Things you can check to verify it is running on the GPU:

computeGaugeForceQuda Total time = 6.55167 secs
download     = 3.397669 secs (  51.9%), with       12 calls at 2.831391e+05 us per call
upload     = 1.160581 secs (  17.7%), with        6 calls at 1.934302e+05 us per call
init     = 0.042459 secs ( 0.648%), with       12 calls at 3.538250e+03 us per call
compute     = 1.926511 secs (  29.4%), with        6 calls at 3.210852e+05 us per call
free     = 0.020827 secs ( 0.318%), with        6 calls at 3.471167e+03 us per call
constant     = 0.003388 secs (0.0517%), with       12 calls at 2.823333e+02 us per call
total accounted       = 6.551435 secs (   100%)
total missing         = 0.000236 secs (0.0036%)
QUDA_MILC_INTERFACE: qudaGaugeForce (called)
QUDA_MILC_INTERFACE: qudaGaugeForce (return)
GFTIME:   time = 1.507170e+00 (Symanzik1_QUDA) mflops = 1.064487e+05
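
To check a log for a profile like the one above, a small script can pull out the per-category timings and show how much of the time goes to transfers rather than compute. This is a minimal sketch that assumes the line layout shown in this thread; the names `parse_profile` and `transfer_fraction` are hypothetical, and treating `download` and `upload` as the host/device copy categories is an assumption based on their names.

```python
import re

# Matches per-category lines such as:
#   download     = 3.397669 secs (  51.9%), with       12 calls at 2.831391e+05 us per call
# Summary lines ("total accounted = ...") have two words before "=" and do not match.
PROFILE_LINE = re.compile(
    r"^\s*(\w+)\s*=\s*([0-9.eE+-]+)\s+secs\s*\(\s*[0-9.eE+-]+%\),\s*with\s+(\d+)\s+calls"
)

def parse_profile(text):
    """Return {category: (seconds, calls)} for each per-category profile line."""
    result = {}
    for line in text.splitlines():
        m = PROFILE_LINE.match(line)
        if m:
            name, secs, calls = m.groups()
            result[name] = (float(secs), int(calls))
    return result

def transfer_fraction(profile):
    """Fraction of the summed category time spent in download + upload."""
    total = sum(secs for secs, _ in profile.values())
    copies = sum(profile.get(k, (0.0, 0))[0] for k in ("download", "upload"))
    return copies / total if total else 0.0

# The profile quoted in this thread, used as sample input.
SAMPLE = """\
computeGaugeForceQuda Total time = 6.55167 secs
download     = 3.397669 secs (  51.9%), with       12 calls at 2.831391e+05 us per call
upload     = 1.160581 secs (  17.7%), with        6 calls at 1.934302e+05 us per call
init     = 0.042459 secs ( 0.648%), with       12 calls at 3.538250e+03 us per call
compute     = 1.926511 secs (  29.4%), with        6 calls at 3.210852e+05 us per call
free     = 0.020827 secs ( 0.318%), with        6 calls at 3.471167e+03 us per call
constant     = 0.003388 secs (0.0517%), with       12 calls at 2.823333e+02 us per call
total accounted       = 6.551435 secs (   100%)
"""

profile = parse_profile(SAMPLE)
print(f"transfer share: {transfer_fraction(profile):.1%}")
```

If a block like this appears in your output at all, the gauge force was executed through QUDA; a large transfer share is consistent with the GPU looking mostly idle in nvidia-smi.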

You might want to try to put the inversions also on the GPU by using WANT_FN_CG_GPU = true in the MILC Makefile.

If you still have trouble, feel free to share your output file. To reduce its length you can change line 216 in the MILC Makefile to

CGPU += -DSET_QUDA_VERBOSE # -DSET_QUDA_SUMMARIZE

Drink7 commented 9 years ago

Yes, I ran the test input file in the ks_imp_rhmc/test directory earlier, using the executable su3_rhmd_hisq in double precision. I then called nvidia-smi, and it showed the GPU information above.

So should I set all the QUDA options to true, or just change this one? WANT_FN_CG_GPU = true

mathiaswagner commented 9 years ago

Well, your nvidia-smi output shows that the GPU is used. But with only the gauge force on the GPU the utilization is probably pretty low. That is what you see. How long does the execution take and what does QUDA print for computeGaugeForceQuda at the end of the run?

If you want to put the inversion on the GPU the WANT_FN_CG_GPU = true is sufficient but you may also set everything to true. Just give it a try.
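
Before rebuilding, it can help to list which offload switches are actually active in the Makefile, because a commented-out line is easy to overlook. The sketch below is only illustrative: it looks for any uncommented `WANT... = value` assignment and uses just the flag names mentioned in this thread (`WANTQUDA`, `WANT_GF_GPU`, `WANT_FN_CG_GPU`); the function name `quda_flags` is made up, and the real MILC Makefile may define further switches.

```python
import re

# Uncommented assignments such as "WANT_GF_GPU = true".
# Lines starting with "#" do not match, so commented-out options are ignored.
FLAG_LINE = re.compile(r"^\s*(WANT\w*)\s*=\s*(\w*)")

def quda_flags(makefile_text):
    """Return {flag_name: bool} for every active WANT* assignment (hypothetical helper)."""
    flags = {}
    for line in makefile_text.splitlines():
        m = FLAG_LINE.match(line)
        if m:
            flags[m.group(1)] = m.group(2).lower() == "true"
    return flags

# Sample mirroring the settings described at the top of this thread.
SAMPLE_MAKEFILE = """\
WANTQUDA = true
WANT_GF_GPU = true
#WANT_FN_CG_GPU = true
"""

flags = quda_flags(SAMPLE_MAKEFILE)
offloaded = sorted(name for name, on in flags.items() if on and name.endswith("_GPU"))
print("active GPU offload switches:", offloaded)
```

To check a real build, read the file instead of the sample, for example `quda_flags(open("Makefile").read())`.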

detar commented 9 years ago

For the ks_imp_rhmc applications, you will need the full suite of HISQ evolution modules.

Perhaps the following example Makefile for ks_imp_rhmc would help

http://www.physics.utah.edu/~detar/milc/Makefile-Drink7

This is for a somewhat later version of the MILC code than 7.7.11, but the QUDA macros should still be OK.


Carleton DeTar Department of Physics and Astronomy University of Utah

Drink7 commented 9 years ago

OK. I'll set those options to true and see how they change the GPU performance, and I'll relink QUDA with MILC later. I forgot to save the execution output to a log file, so I'll run the test input again and check the information you mentioned.

Thank you very much for your help!

stevengottlieb commented 9 years ago

Are you part of the student cluster competition? If so, you should use the MILC tarball prepared for the competition.


Drink7 commented 9 years ago

Yes, I ran into some problems building QUDA with MILC and was trying to ask for help. Do you mean all the applications in MILC 7.7.11 (like ks_imp_dyn, pure_gauge and others) or just ks_imp_rhmc?

mathiaswagner commented 9 years ago

If this is part of the student cluster competition I would prefer to take the further support away from the QUDA bug tracker. I think QUDA performs as expected.

@stevengottlieb, @detar: Do you provide the support for the student cluster competition?

detar commented 9 years ago

Could you please introduce yourself?

Are you part of the student cluster competition?


stevengottlieb commented 9 years ago

Please use the google group set up for the student cluster competition, not the github developers list. I agree with Mathias Wagner that this discussion belongs elsewhere.

Read the instructions on the competition webpage for MILC that were recently updated. There is a specific tarball for the competition that has a restricted set of code. There are also more test cases.

I will no longer respond to github posts on this issue and will encourage others to do the same.


Drink7 commented 9 years ago

I understand that I should use the Google group to ask for help, not GitHub. I'm sorry, and I'll close this issue later.

mathiaswagner commented 9 years ago

Thanks for moving that to the right place.

@stevengottlieb, @detar: If anything QUDA-related comes up during the cluster competition, please feed it back here. Also, if you want some of the QUDA developers to look into issues popping up in the student competition from time to time, let us know.