uwsampa / grappa

Grappa: scaling irregular applications on commodity clusters
grappa.io
BSD 3-Clause "New" or "Revised" License

problem with shared memory allocation #148

Open mixcmc opened 10 years ago

mixcmc commented 10 years ago

Hi Grappa team!

I built Grappa and tried to run the hello_world demo, but shared memory allocation failed:

E0227 19:52:38.147748 11703 LocaleSharedMemory.cpp:188] Allocation of 532480 bytes with alignment 4096 failed with 776080 free, 104857600 total.
*** Aborted at 1393516358 (unix time) try "date -d @1393516358" if you are using GNU date ***
PC: @ 0x7fe42592746d google::DumpStackTrace()
    @ 0x452dd0 Grappa::impl::failure_function()
    @ 0x45751c Grappa::impl::LocaleSharedMemory::allocate_aligned()
    @ 0x47ffe9 Grappa::impl::coro_spawn()
    @ 0x4801d1 Grappa::impl::worker_spawn()
    @ 0x485dd2 Grappa::impl::TaskingScheduler::createWorkers()
    @ 0x444a2a main
    @ 0x3dea221b45 (unknown)
    @ 0x4494d1 (unknown)

(To check the shmmax value passed to the configure script, I made LocaleSharedMemory::allocate_aligned also print FLAGS_locale_shared_size when an allocation fails.)

I tried different values for sysctl kernel.shmmax and kernel.shmall, and for the --shmmax option of the configure script, but it still crashes. The only thing I managed to do was make the demo crash at an earlier point:

E0227 19:58:40.296942 11841 LocaleSharedMemory.cpp:188] Allocation of 16777216 bytes with alignment 8 failed with 102176 free, 1024000 total.
*** Aborted at 1393516720 (unix time) try "date -d @1393516720" if you are using GNU date ***
PC: @ 0x7f7a4e8f846d google::DumpStackTrace()
    @ 0x452dd0 Grappa::impl::failure_function()
    @ 0x45751c Grappa::impl::LocaleSharedMemory::allocate_aligned()
    @ 0x480d33 Grappa::impl::TaskManager::activate()
    @ 0x453097 Grappa_activate()
    @ 0x4448d5 main
    @ 0x3dea221b45 (unknown)
    @ 0x4494d1 (unknown)

BTW, 16777216 bytes is half of what kernel.shmmax was when I configured and built Grappa. Why does Grappa take shmmax as a configure parameter and then try to allocate half of kernel.shmmax?

I ran the demo with mpirun (there is no Slurm on that machine):

mpirun -n 1 --ppn 1 applications/demos/hello_world.exe 1

Do you have any suggestions on getting rid of shared memory allocation failure?

Thanks, Mikhail.

bholt commented 10 years ago

Hi Mikhail. Congrats on getting it to build. :)

The SysV shared memory bit is an artifact of our current, pretty terrible, way of communicating between cores on the same machine. We currently run a separate process per core, and any memory that has to be reachable by other cores must be allocated out of this SysV shared memory (in Grappa we call it "locale shared memory"). Unfortunately, we also have to know exactly how much memory is available ahead of time. This messiness will go away with a new aggregator design we're working on, based on MPI communication rather than GASNet, but that shouldn't be expected for another couple of months.

There are a couple ways you may be able to fix this issue. First: if you have a bunch more physical memory available, you could reconfigure how much memory your OS sets aside for SysV shared memory. Our cluster is configured to allocate about half to SysV shared memory because of this. @nelsonje can maybe explain more how to configure this.
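Before changing anything, it can help to see what the kernel currently allows. A minimal sketch, assuming a Linux host where the limits are exposed under /proc/sys/kernel; the function name read_shm_limits and its optional directory argument are made up for illustration and are not part of Grappa:

```shell
# read_shm_limits [DIR]
# Print the SysV shared memory limits found under DIR.
# DIR defaults to /proc/sys/kernel; the argument exists only so the
# function can be tried against a scratch directory (an assumption
# of this sketch, not a system convention).
read_shm_limits() {
    dir="${1:-/proc/sys/kernel}"
    shmmax=$(cat "$dir/shmmax")   # largest single segment, in bytes
    shmall=$(cat "$dir/shmall")   # system-wide total, in pages
    echo "shmmax_bytes=$shmmax shmall_pages=$shmall"
}

# On a Linux host, show the live values (skipped where /proc is absent).
if [ -r /proc/sys/kernel/shmmax ]; then
    read_shm_limits
fi
```

Note the different units: kernel.shmmax is a byte count, while kernel.shmall counts pages.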

Another, shorter-term workaround is to convince Grappa to allocate less so that it fits into the memory you have. It looks to me like your crash right now happens when Grappa is allocating stacks for "worker" threads (out of "locale shared memory"). The default is to allocate 512 workers per core, each with a stack of 2^19 bytes. You could try telling it to allocate fewer workers, each with a smaller stack (though I wouldn't go below 2^16). If it does make it to the point of allocating all the space it needs, you can specify verbose-logging level 1 (--v=1) to have it print a listing of how much space each component of Grappa is using. (RDMA is currently not being computed, and I'm not sure how much space it takes; global heap and shared_message_pool should self-adjust based on how much SysV shared memory is available.) For example:

$ bin/grappa_srun -n2 -- applications/demo/hello_world.exe --num_starting_workers=8 --stack_size=$((1<<17)) --v=1
# (some output elided)
0: -------------------------
0: Shared memory breakdown:
0:   global heap: 6442450944 (6 GB)
0:   stacks: 1048576 (0.000976562 GB)
0:   rdma_aggregator: ??
0:   shared_message_pool: 3210739712 (2.99023 GB)
0:   free:  6421715040 (5.98069 GB)
0: -------------------------

Though, of course, this will perform terribly on real programs because it won't have nearly enough workers to tolerate any kind of latency. But it may get you up and running.
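The stack budget discussed above is easy to check by hand. A minimal sketch, assuming stacks of exactly 2^N bytes per worker and ignoring any per-stack padding (the 532480-byte allocation in the first log is 2^19 plus 8192 bytes, so the real cost is slightly higher); the helper name stack_bytes is hypothetical:

```shell
# stack_bytes WORKERS LOG2_STACK
# Total bytes needed for WORKERS stacks of 2^LOG2_STACK bytes each.
stack_bytes() {
    echo $(( $1 * (1 << $2) ))
}

default=$(stack_bytes 512 19)   # defaults quoted in the thread
reduced=$(stack_bytes 8 17)     # values from the grappa_srun example
total=104857600                 # "total" from the first error log

echo "default stacks: $default bytes"
echo "reduced stacks: $reduced bytes"
if [ "$default" -gt "$total" ]; then
    echo "default stacks do not fit in $total bytes"
fi
```

With the defaults, one core's stacks alone need 268435456 bytes (256 MiB), which is more than the 104857600 bytes reported as the total in the first crash. The reduced settings need 1048576 bytes, which matches the stacks line in the breakdown above.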

nelsonje commented 10 years ago

@bholt is right: Grappa currently uses a separate process per core and keeps data in a shared memory segment that all the processes attach to. This segment needs to be as large as possible. If kernel.shmmax is really 33554432, that's going to be too small: it leaves us only 32 MB of shared data across all the cores on the machine.

How much memory does your machine have, Mikhail? Ideally, kernel.shmmax would be set to at least half of that. On our 24GB machines, we have kernel.shmmax = 12884901888.
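As a rough sketch of that sizing rule: kernel.shmmax is expressed in bytes and kernel.shmall in pages, so both can be derived from the physical memory size. The helper name shm_settings and the 4096-byte page size are assumptions for illustration; `getconf PAGE_SIZE` reports the real page size on a given machine.

```shell
# shm_settings MEM_BYTES PAGE_SIZE
# Print sysctl.conf-style lines that give half of MEM_BYTES
# to SysV shared memory.
shm_settings() {
    half=$(( $1 / 2 ))
    echo "kernel.shmmax = $half"
    echo "kernel.shmall = $(( half / $2 ))"
}

# A 24 GiB machine with 4096-byte pages (page size assumed).
shm_settings $(( 24 * 1024 * 1024 * 1024 )) 4096

# To apply on a live system (needs root), something like:
#   sudo sysctl -w kernel.shmmax=12884901888
#   sudo sysctl -w kernel.shmall=3145728
```

For 24 GiB this reproduces the kernel.shmmax = 12884901888 value quoted above. Since the original report says the shmmax value is also passed to Grappa's configure script, Grappa probably needs to be reconfigured and rebuilt after the kernel setting changes.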