lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

NUMA binding #25

Closed gshi closed 12 years ago

gshi commented 13 years ago

QUDA should provide a convenient way to bind each process to the CPU cores that are NUMA-local to its GPU. Here are my thoughts so far:

*) We can add a separate C program to generate a NUMA mapping file. I already have such a program, and we can copy or modify it for this purpose. Maybe we can add a utility/ or tools/ directory for it.

*) We can compile the tool and run it to generate the NUMA mapping file in "make tune". A recompile should then automatically compile the NUMA info into the executable.

*) QUDA should still work correctly without the previous step.

gshi commented 13 years ago

OK, I have added NUMA support, along with checks in the configure.ac script that detect whether the NUMA header file is available and compile the code accordingly. At the moment, NUMA binding is applied only when requested explicitly via qudaSetNumaConfig(). In the MILC interface I have:

#ifdef NUMA_CONFIG_FILE

qudaSetNumaConfig(NUMA_CONFIG_FILE);

#endif

where NUMA_CONFIG_FILE is set at compile time (-DNUMA_CONFIG_FILE=....). If the NUMA file is set but is not readable, or its entries are badly formatted, a warning is issued and the program continues.

The NUMA config file can be generated using tools/gpu_affinity_test.

I consider this issue done unless someone has suggestions.

gshi commented 12 years ago

In recent kernels, one can read /proc files to find the affinity CPU cores, so there is no need to read in a config file. However, this does not work with old kernels: on the machines I have tested so far, the affinity info is available in /proc with kernel 2.6.32 but not with 2.6.18. I am thinking of removing all the old NUMA code and putting in this new code. On old machines that do not expose the affinity info in /proc, I will print a warning message. What do you guys think?

gshi commented 12 years ago

According to my search of the kernel source code, "cpulistaffinity" was added in 2.6.26, so that is the minimum kernel version required.
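For illustration, here is a minimal sketch of how a kernel-style CPU list (for example "0-5,12-17") could be read from a file and turned into a cpu_set_t. It is not the code that went into QUDA: the function names parse_cpulist and read_cpulist_file are hypothetical, and the file path is left to the caller because the thread does not give the exact location of the affinity file.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse a CPU list such as "0-5,12-17" into a cpu_set_t.
 * Returns 0 on success, -1 if the list is malformed or empty. */
static int parse_cpulist(const char *list, cpu_set_t *set)
{
    assert(list != NULL && set != NULL);
    CPU_ZERO(set);
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long lo = strtol(p, &end, 10);
        if (end == p || lo < 0) return -1;          /* no digits, or negative */
        long hi = lo;
        if (*end == '-') {                          /* range "lo-hi" */
            p = end + 1;
            hi = strtol(p, &end, 10);
            if (end == p || hi < lo) return -1;
        }
        if (hi >= CPU_SETSIZE) return -1;           /* does not fit in cpu_set_t */
        for (long c = lo; c <= hi; c++) CPU_SET((int)c, set);
        p = end;
        if (*p == ',') p++;                         /* move to the next entry */
        else if (*p != '\0' && *p != '\n') return -1;
    }
    return CPU_COUNT(set) > 0 ? 0 : -1;
}

/* Hypothetical helper: read the first line of `path` and parse it.
 * Returns -1 if the file is missing (e.g. on an old kernel) or malformed,
 * so the caller can print a warning and carry on without binding. */
static int read_cpulist_file(const char *path, cpu_set_t *set)
{
    char buf[4096];
    FILE *fp = fopen(path, "r");
    if (fp == NULL) return -1;
    char *ok = fgets(buf, sizeof buf, fp);
    fclose(fp);
    return ok != NULL ? parse_cpulist(buf, set) : -1;
}
```

A caller would treat a -1 return as "no affinity info available", which matches the plan of warning and continuing on kernels that lack the file.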

maddyscientist commented 12 years ago

If memory serves, Ron had an idea on how to fix this, as I saw that it didn't always seem to work (I still got very variable performance).

gshi commented 12 years ago

I like this new way better because it figures out the affinity automatically, and over time it should work on all systems.

maddyscientist commented 12 years ago

Guochun, I completely agree. Remove the old code, and use the built-in kernel facility for this.

gshi commented 12 years ago

Code pushed in caacb2ab105239aee9de87fd18e6f84064df2e7b and d5364bfe4bf56c1aa8787380f6b426861b5703d0

By default, QUDA will try to find the affinity CPU cores for the GPU device in use and bind to those cores. This can be disabled with --disable-numa-affinity in the test programs or by calling disableNumaAffinityQuda() in C code.
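The opt-out behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not QUDA's implementation: the flag numa_affinity_enabled and the functions disable_numa_affinity and bind_to_cpus are hypothetical names, and the only real API used is the Linux sched_setaffinity call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical module-level switch: binding is on unless turned off. */
static bool numa_affinity_enabled = true;

/* Hypothetical analogue of an opt-out call made before initialization. */
void disable_numa_affinity(void)
{
    numa_affinity_enabled = false;
}

/* Bind the calling process to `cpus`.
 * Returns 1 if the binding was applied, 0 if binding is disabled,
 * and -1 (after printing a warning) if the kernel rejected the mask. */
int bind_to_cpus(const cpu_set_t *cpus)
{
    assert(cpus != NULL);
    if (!numa_affinity_enabled) return 0;
    if (sched_setaffinity(0, sizeof(cpu_set_t), cpus) != 0) {
        perror("warning: sched_setaffinity failed, continuing unbound");
        return -1;
    }
    return 1;
}
```

Failure is reported as a warning rather than a fatal error, so a machine without usable affinity information still runs, just without the binding.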

maddyscientist commented 12 years ago

With the latest numa code in place I still see variable performance on my dual socket test machine. This was the case before as well. Any comments?