project-asgard / asgard


MPI tests using > 15 GB memory #626

Open quantumsteve opened 1 year ago

quantumsteve commented 1 year ago

Describe the bug

To Reproduce

asgard-unit-mpi-gxx and asgard-unit-mpi-gxx-scalapack currently run out of memory on the CI machines. I increased the memory limit from 15GB to 144GB as a (temporary?) workaround; I suspect either something is over-allocating memory, or a test should be scaled down or moved to a different label that runs less often.

Expected behavior

MPI tests pass on a container with 15GB RAM.

Additional context

Reproduced locally with `docker run -m 15000m -it cpu /bin/bash`.

quantumsteve commented 1 year ago

Found while troubleshooting #624.

quantumsteve commented 1 year ago

The failing tests are both continuity_6:

https://github.com/project-asgard/asgard/blob/028b1426ed9d5a876bf79cdd3dfabb7b41f69f70/src/distribution_tests.cpp#L647
https://github.com/project-asgard/asgard/blob/028b1426ed9d5a876bf79cdd3dfabb7b41f69f70/src/time_advance_tests.cpp#L678

mkstoyanov commented 11 months ago

I did a few tests; I think 32GB will be enough, but that is still too much for CI.

For some reason, when we create a 6D problem, we use an unreasonable amount of memory for something. I wonder if this is an issue with the hash-map taking too much space. If I'm right, this will be an issue across the board; it's just that this test runs 4 copies of the problem (since it uses 4 MPI ranks), and all of our workstations have far more memory than the 15GB CI limit.
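A rough back-of-envelope may help bound where the memory could be going. The sketch below is not based on asgard internals; the dimension, level, and degree are assumptions loosely chosen to resemble a continuity_6-style run, and it only counts hierarchical sparse-grid elements and one state vector per rank:

```python
from math import comb

def sparse_grid_elements(dim, level):
    # Count multi-indices l in N0^dim with |l|_1 <= level, weighting each
    # by the ~2^|l|_1 cells it contributes to the hierarchical grid.
    return sum(comb(s + dim - 1, dim - 1) * 2**s for s in range(level + 1))

dim, level, degree = 6, 6, 2              # assumed settings, not from the test
elems = sparse_grid_elements(dim, level)  # entries a per-element hash map holds
dof = elems * (degree + 1) ** dim         # (degree+1)^dim basis functions each
ranks = 4                                 # the failing tests use 4 MPI ranks
vec_gb = dof * 8 / 2**30                  # one double-precision state vector
print(f"{elems} elements, {dof} dof, {vec_gb:.2f} GB per vector, "
      f"{ranks * vec_gb:.2f} GB for {ranks} replicated vectors")
```

If these assumptions are anywhere near the real settings, the element table itself holds only tens of thousands of keys and four replicated state vectors stay under 1GB, so the blow-up would have to come from some structure that multiplies the element or dof count up much further.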