icl-utk-edu / heffte

BSD 3-Clause "New" or "Revised" License
20 stars 15 forks source link

run all tests sequentially #13

Closed mkstoyanov closed 1 year ago

mkstoyanov commented 1 year ago
mkstoyanov commented 1 year ago

@G-Ragghianti let's see if this fixes the GPU memory problem.

If this fixes it, I don't want to merge this. Instead I want to fix the CTEST_PARALLEL_LEVEL variable in the container.

G-Ragghianti commented 1 year ago

It looks like it passed. Can we do it by calling cmake with "-DCTEST_PARALLEL_LEVEL=1"? I can add this to the CI script, but I think it would maybe make sense to implement it in the CMakeLists.txt just like you did. Otherwise, other people will have the same problem and will have to discover on their own to set this. It doesn't seem like anyone would have a problem with the ctest being serial by default. Thoughts?

mkstoyanov commented 1 year ago

CTEST_PARALLEL_LEVEL is an environmental variable, i.e., it has to be set with export CTEST_PARALLEL_LEVEL=1 or unset CTEST_PARALLEL_LEVEL in bash, CMake will have no effect.

All tests default to parallel and even in the case of heFFTe, it would probably be OK if we run together tests with 2 and 4 ranks, the problem will happen when we get 12 ranks together. The issue is also restricted to the GPU usage, the CPU tests don't have that problem.

I really want to know whether the variable is being set by you or by spack, the latter is a big issue and I would have to complain somewhere.

I can add a variable Heffte_SEQUENTIAL_TESTING and have the workaround that way.

G-Ragghianti commented 1 year ago

Nothing of mine is setting CTEST_PARALLEL_LEVEL and there is only two references to it in spack which don't appear to have an effect of setting the value (see spack/lib/spack/spack/build_systems/cmake.py). Since spack is running "make test" to run the tests (instead of ctest directly), maybe it is just caused by the inherent parallelism of make?

mkstoyanov commented 1 year ago

I have tested make test -j and that doesn't result in parallel tests. You have to do either make test ARGS=-j4 or ctest -j4 or set the environment variable. Spack is purposely forcing the variable instead of respecting the general system setup.

This is bad behavior, but not the first for spack

mkstoyanov commented 1 year ago

So I added the option Heffte_SEQUENTIAL_TESTING but I want to have it off by default, so the test will fail now.

We should simply add -DHeffte_SEQUENTIAL_TESTING=ON to spack.

How do you want to add this to the spack package: