lattice / quda

QUDA is a library for performing calculations in lattice QCD on GPUs.
https://lattice.github.io/quda

Reorganize tests #454

Open mathiaswagner opened 8 years ago

mathiaswagner commented 8 years ago

When building QUDA, the tests directory uses a lot of disk space. The issue is that all test executables are statically linked against the QUDA library, so the disk usage in tests is approximately <size of quda> x <number of tests>. We could either use a shared library or combine all tests into a single executable. Since we plan to make more use of Google Test, and its framework provides an easy way to link multiple tests into one binary and execute only selected tests, the single executable is my preferred solution. For convenience, we can provide some wrapper scripts so that the tests can still be called with the current command lines. Thoughts?
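To illustrate the single-executable idea: Google Test selects tests in a combined binary through its `--gtest_filter` option. The sketch below imitates that mechanism without depending on Google Test, so it is only an assumption about how a combined test binary could be organized, not QUDA's actual test layout. The test names `dslash` and `invert` and all helper names are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

namespace runner_sketch {

using TestFn = std::function<bool()>;

// Function-local static, so the registry exists before any registrar uses it.
std::map<std::string, TestFn> &registry() {
  static std::map<std::string, TestFn> tests;
  return tests;
}

// Each test registers itself under a name when the binary starts up.
struct Registrar {
  Registrar(const std::string &name, TestFn fn) {
    assert(fn != nullptr);  // a registered test must be callable
    registry()[name] = std::move(fn);
  }
};

// Hypothetical stand-ins for what are separate test executables today.
static Registrar dslash_test("dslash", [] { return 1 + 1 == 2; });
static Registrar invert_test("invert", [] { return 2 * 3 == 6; });

// Runs only the named tests and returns the number of failures.
// A name that was never registered counts as a failure.
int run_selected(const std::vector<std::string> &names) {
  int failures = 0;
  for (const auto &name : names) {
    auto it = registry().find(name);
    if (it == registry().end() || !it->second()) ++failures;
  }
  return failures;
}

}  // namespace runner_sketch
```

With this layout the library is linked once, and a wrapper script per test would only need to pass the corresponding name on to the combined binary.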

mathiaswagner commented 8 years ago

For now it looks like building a shared library just for the tests is probably not a good idea. A shared library requires position-independent code, which might affect performance (not tested, though); otherwise we would need to build twice, once for the static and once for the shared library. That does not seem desirable.

viktordick commented 7 years ago

This problem not only affects the tests but also any code that uses the QUDA library and creates multiple executables. The performance should not suffer noticeably since changing from static to shared libraries only affects the calling of the QUDA routines and not the performance of the routines themselves (see https://gcc.gnu.org/ml/gcc/2004-06/msg01956.html).

Anyhow, in Bielefeld we used QUDA as a shared library, but only with the 5.0 version. I have now been trying to get 8.0 to work as a shared library but have been unable to do so. Since the use of autotools is discouraged, I tried to change the CMakeLists.txt file accordingly (which seemed simple enough, simply replacing STATIC by SHARED in {lib,tests}/CMakeLists.txt), but even though libquda.so is created just fine, any linking against this library (either for the tests or for other programs) yields undefined references to some functions (for example, covDev).

Can anyone help me to see what I missed? I would rather not use a static library because of the large executables (especially since every compilation result in our use case is cached, making the cache folder grow very fast).

mathiaswagner commented 7 years ago

@viktordick The issue just never had high enough priority. But as you have already started working on it, do you mind creating a branch (we use feature/name_of_feature as convention, see: https://github.com/lattice/quda/wiki/QUDA-Development-model ) from current develop with the things you tried?

cpviolator commented 7 years ago

The covDev error is probably a small bug in the code; it is resolved here: https://github.com/lattice/quda/issues/521

I've been playing a lot with CMake recently, as we use QUDA with several dependencies and needed to edit the CMakeLists.txt file to accommodate them. Can you attach an error log for your build?

viktordick commented 7 years ago

The errors I get are as follows:

[  1%] Linking CXX shared library libquda.so
[ 83%] Built target quda
[ 85%] Built target quda_test
[ 86%] Linking CXX executable deflation_test
../lib/libquda.so: undefined reference to `quda::covDev(quda::cudaColorSpinorField*, quda::cudaGaugeField&, quda::cudaColorSpinorField const*, int, int, quda::TimeProfile&)'
../lib/libquda.so: undefined reference to `quda::computeLongLinkCuda(void*, void*, void const*, void const*, double, QudaReconstructType_s, QudaPrecision_s, dim3, quda::llfat_kernel_param_s)'
../lib/libquda.so: undefined reference to `quda::computeGenStapleFieldParityKernel_ex(void*, void*, void const*, void const*, void*, void*, void const*, void const*, int, int, int, double, QudaReconstructType_s, QudaPrecision_s, quda::llfat_kernel_param_s)'
../lib/libquda.so: undefined reference to `quda::covdev::initConstants(quda::cudaGaugeField&, quda::TimeProfile&)'
../lib/libquda.so: undefined reference to `quda::siteComputeGenStapleParityKernel(void*, void*, void const*, void const*, void*, void*, int, int, double, QudaReconstructType_s, QudaPrecision_s, dim3, quda::llfat_kernel_param_s, CUstream_st**)'
../lib/libquda.so: undefined reference to `quda::siteComputeGenStapleParityKernel_ex(void*, void*, void const*, void const*, void*, void*, int, int, double, QudaReconstructType_s, QudaPrecision_s, quda::llfat_kernel_param_s)'
../lib/libquda.so: undefined reference to `quda::computeGenStapleFieldParityKernel(void*, void*, void const*, void const*, void*, void*, void const*, void const*, int, int, int, double, QudaReconstructType_s, QudaPrecision_s, dim3, quda::llfat_kernel_param_s, CUstream_st**)'
collect2: error: ld returned 1 exit status
make[2]: *** [tests/deflation_test] Error 1
make[1]: *** [tests/CMakeFiles/deflation_test.dir/all] Error 2
make: *** [all] Error 2

I will push the feature branch with the changes I tried once I figure out how to properly configure the remote repository for pushing.

mathiaswagner commented 7 years ago

OK. I did a little digging in the code and I think I found the issue.

Depending on the configuration, some parts of QUDA are not built, in order to reduce compile time. In the files llfat_quda.cu and covDev.cu, the corresponding #ifdefs remove the whole function definitions (instead of just leaving an empty function body).

These are the undefined references you saw. I guess this should be easy to fix by modifying the two files so that they generate empty bodies.
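A minimal sketch of the difference between the two guard styles, assuming a hypothetical feature macro `HYPOTHETICAL_BUILD_COVDEV` and a made-up function `scale_field` (neither is a QUDA identifier). A static link only pulls in the object files an executable actually needs, which is presumably why the missing definitions went unnoticed until the library was built as a shared object.

```cpp
#include <cassert>

// Leave HYPOTHETICAL_BUILD_COVDEV undefined to mimic a build with the
// feature switched off.

namespace stub_sketch {

// Style that loses the symbol: when the macro is undefined, no definition is
// emitted at all, so any remaining caller ends up with an undefined reference.
//
//   #ifdef HYPOTHETICAL_BUILD_COVDEV
//   void scale_field(double *out, const double *in, int n) { /* work */ }
//   #endif

// Style that keeps the symbol: the function is always defined and only its
// body is guarded, so a disabled build still provides an (empty) definition.
void scale_field(double *out, const double *in, int n) {
  assert(n >= 0);
#ifdef HYPOTHETICAL_BUILD_COVDEV
  for (int i = 0; i < n; ++i) out[i] = 2.0 * in[i];  // stand-in for real work
#else
  (void)out;  // silence unused-parameter warnings in the stub
  (void)in;
  (void)n;
#endif
}

}  // namespace stub_sketch
```

In a disabled build the stub does nothing and leaves its output untouched, so a real library might prefer to report an error from the stub instead of returning silently.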

mathiaswagner commented 7 years ago

All the shared library discussion should now go to #524.