pmodels / armci-mpi

An implementation of ARMCI using MPI one-sided communication (RMA)
https://wiki.mpich.org/armci-mpi/index.php/Main_Page

use slab allocation #16

Open jeffhammond opened 6 years ago

jeffhammond commented 6 years ago

Instead of allocating a window for every call to ARMCI_Malloc, allocate a single slab sized according to the limit passed to GA_Initialize_ltd and suballocate from it. This would obviate the need for any GMR lookups, which helps latency.
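A minimal sketch of the suballocation step, assuming a simple bump allocator over one preallocated buffer. The names `slab_t`, `slab_init`, and `slab_alloc` are hypothetical and do not exist in ARMCI-MPI. The sketch covers only the local bookkeeping: it has no free list, and it assumes the slab base is itself suitably aligned and that the window creation and any exchange of offsets between processes happen elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical slab descriptor; not an ARMCI-MPI type. */
typedef struct {
    unsigned char *base;  /* start of the slab (would be the window's local memory) */
    size_t capacity;      /* total bytes, e.g. the limit given to ARMCI_Set_shm_limit */
    size_t used;          /* bytes handed out so far */
} slab_t;

static void slab_init(slab_t *s, void *base, size_t capacity)
{
    s->base = (unsigned char *)base;
    s->capacity = capacity;
    s->used = 0;
}

/* Carve `bytes` out of the slab at an offset that is a multiple of `align`
 * (which must be a power of two). Returns NULL when the request does not
 * fit, leaving the slab unchanged so the caller can fall back to another
 * allocation path. */
static void *slab_alloc(slab_t *s, size_t bytes, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    /* Round the current offset up to the requested alignment. */
    size_t start = (s->used + align - 1) & ~(align - 1);

    if (start > s->capacity || bytes > s->capacity - start)
        return NULL;

    s->used = start + bytes;
    return s->base + start;
}
```

Returning NULL on overflow, rather than aborting, leaves room for a fallback path when a request exceeds the slab.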

A more general approach would be to use a slab of some fixed capacity and then allocate a separate window for each allocation that does not fit in it. Thus, the GMR lookup would start with a simple range check (fast) and only hit the O(n) linked-list traversal when the address falls outside the slab, i.e. only once the slab capacity has been exceeded.
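The two-tier lookup could be sketched roughly as follows. All names here (`region_t`, `lookup_t`, `lookup`, the `LOOKUP_*` codes) are hypothetical, and the sketch only considers local addresses; a real lookup would also have to account for the target process, since base addresses generally differ between processes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one allocation made outside the slab. */
typedef struct region {
    uintptr_t base;
    size_t size;
    struct region *next;
} region_t;

/* Hypothetical lookup state: one slab plus a list of overflow regions. */
typedef struct {
    uintptr_t slab_base;
    size_t slab_size;
    region_t *overflow;  /* per-call allocations beyond the slab capacity */
} lookup_t;

enum { LOOKUP_MISS = 0, LOOKUP_SLAB = 1, LOOKUP_OVERFLOW = 2 };

/* Classify `ptr`. The slab is tested first with a constant-time range
 * check; the linked list is walked only when that check fails. On an
 * overflow hit, *hit points at the matching region; otherwise it is NULL. */
static int lookup(const lookup_t *t, const void *ptr, const region_t **hit)
{
    assert(t != NULL && hit != NULL);

    uintptr_t p = (uintptr_t)ptr;
    *hit = NULL;

    /* Fast path: is the address inside the slab? */
    if (p >= t->slab_base && p - t->slab_base < t->slab_size)
        return LOOKUP_SLAB;

    /* Slow path: O(n) traversal of the overflow regions. */
    for (const region_t *r = t->overflow; r != NULL; r = r->next) {
        if (p >= r->base && p - r->base < r->size) {
            *hit = r;
            return LOOKUP_OVERFLOW;
        }
    }
    return LOOKUP_MISS;
}
```

As long as everything fits in the slab, the overflow list stays empty and every lookup ends at the first comparison.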

The key to implementing this is ARMCI_Set_shm_limit, which tells ARMCI the upper bound on how much storage will be allocated:

Global Arrays global/src/base.c line 374:

        if(GA_memory_limited) ARMCI_Set_shm_limit(GA_total_memory);
        if (_ga_initialize_c) {
            if (_ga_initialize_args) {
                ARMCI_Init_args(_ga_argc, _ga_argv);
            }
            else {
                ARMCI_Init();
            }
        }

NWChem always uses ga_initialize_ltd and requires the user to specify the GA allocation size, so no changes to NWChem are required to activate this.

Migrated from https://github.com/jeffhammond/armci-mpi/issues/21