dimakuv opened this issue 9 months ago
There are two memory allocators in Gramine: MEMMGR and SLAB.
Both allocators rely on the following shared logic:
- `system_malloc()` and `system_free()` are macros that are defined inside LibOS and in PAL.
- `SYSTEM_LOCK()` and `SYSTEM_UNLOCK()` are macros that are defined inside LibOS and in PAL.
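For illustration, the including component supplies these hooks by defining the macros before including the allocator header. A hedged sketch (the lock variable name `g_allocator_lock` is illustrative, not Gramine's actual one):

```c
/* Hedged sketch of the shared hooks expected by memmgr.h/slabmgr.h; the
 * right-hand sides map onto the including component's own primitives. */
static struct libos_lock g_allocator_lock;  /* illustrative name */

#define system_malloc(size)     __system_malloc(size)      /* page-sized backend alloc */
#define system_free(addr, size) __system_free(addr, size)  /* page-sized backend free  */
#define SYSTEM_LOCK()           lock(&g_allocator_lock)    /* component-global lock    */
#define SYSTEM_UNLOCK()         unlock(&g_allocator_lock)
```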
MEMMGR is used to allocate specific objects in specific subsystems. It is currently used only in LibOS.
Each subsystem of LibOS that uses MEMMGR specifies its own (global to the subsystem) lock. Thus, MEMMGR object allocs/frees in the same subsystem are synchronized on this lock, but object allocs/frees in different subsystems can run in parallel.
The current users of MEMMGR:
- `struct libos_vma`, protected by `struct libos_lock vma_mgr_lock`
- `struct libos_mount`, protected by `struct libos_lock g_mount_mgr_lock`
- `struct libos_dentry`, protected by `struct libos_lock dcache_mgr_lock`
- `struct libos_handle`, protected by `struct libos_lock handle_mgr_lock`
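To make the pattern concrete, here is a hedged sketch of one MEMMGR user (patterned on the VMA subsystem; `alloc_vma()`, the init details, and the `size` value are illustrative, while `get_mem_obj_from_mgr_enlarge()` is the actual function discussed below):

```c
/* Hedged sketch of a MEMMGR user; initialization of the manager and the
 * lock is omitted, and names/sizes are assumptions, not verified code. */
#define OBJ_TYPE struct libos_vma  /* the object type this manager serves */
#include "memmgr.h"                /* SYSTEM_LOCK() maps to vma_mgr_lock here */

static MEM_MGR vma_mgr;

static struct libos_vma* alloc_vma(void) {
    /* returns a free slot; `size` asks to enlarge the backing pool if no
     * free slot is available (see the naming complaint below) */
    return get_mem_obj_from_mgr_enlarge(vma_mgr, /*size=*/4096);
}
```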
Every managed object is wrapped into a `MEM_OBJ_TYPE` struct. This struct doesn't have additional fields (if not built with ASan), so it's the most compact representation possible. When the object is "freed" and its underlying memory is moved to the free list, `MEM_OBJ_TYPE`'s list field is used instead.
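Conceptually, the wrapping looks like the following simplified, self-contained sketch (not the exact definition in `common/include/memmgr.h`):

```c
/* Simplified sketch: a slot's memory is either the live object or a
 * free-list link, never both at once, so the wrapper adds no overhead. */
typedef struct { long fields[4]; } OBJ_TYPE;  /* stand-in for e.g. struct libos_vma */

typedef union mem_obj {
    OBJ_TYPE obj;              /* valid while the slot is in use */
    union mem_obj* next_free;  /* valid while the slot sits on the free list */
} MEM_OBJ_TYPE;
```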
Design and implementation are very simple:
- The MEMMGR memory managers are never "reset", shrunk, or deleted. Thus, if LibOS allocated a lot of MEMMGR objects initially and then freed them all, this MEMMGR memory is leaked. This should be a very rare and unimportant case, though.
- Backend-memory (page-sized) allocation happens via `__system_malloc()`, declared here: https://github.com/gramineproject/gramine/blob/master/libos/src/libos_malloc.c
- Backend-memory (page-sized) deallocation, as mentioned above, doesn't really happen. But if it did, it would go via `__system_free()`, also declared here: https://github.com/gramineproject/gramine/blob/master/libos/src/libos_malloc.c
- Objects in migrated memory (in the child) leak because their "slots" are never re-used: https://github.com/gramineproject/gramine/blob/e740728548ef52615cffdb64f573a998abdfa61f/common/include/memmgr.h#L248-L251
- Names are bad, in particular `get_mem_obj_from_mgr_enlarge(MEM_MGR mgr, size_t size)`: the `size` here actually means "by how many bytes to increase the pool of available memory if there is no free memory in areas and no free slots in the free list".
- Something is fishy with the `size` arguments in functions. These args are in bytes (at least that's what the callers assume), but the function implementations seem to treat the argument as a count. TODO: verify and fix this; we may have a memory leak here. An illustration of the suspected mismatch follows below.
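A hypothetical illustration of how such a bytes-vs-count confusion would play out (none of this is Gramine's code):

```c
/* Hypothetical illustration of a bytes-vs-count confusion. */
#include <stdio.h>

#define OBJ_SIZE 64  /* illustrative per-object slot size */

/* what the callers assume: `size` is in bytes, convert to slots */
static size_t slots_assuming_bytes(size_t size) { return size / OBJ_SIZE; }

/* what the implementation may actually do: use `size` as a slot count */
static size_t slots_assuming_count(size_t size) { return size; }

int main(void) {
    size_t size = 8192;  /* caller passes bytes */
    printf("caller expects %zu slots, implementation creates %zu slots\n",
           slots_assuming_bytes(size), slots_assuming_count(size));
    /* a pool 64x larger than intended is never shrunk by MEMMGR, so the
     * excess would effectively be leaked */
    return 0;
}
```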
SLAB is the generic backend for `malloc` and `free` in all other subsystems. It is used both in LibOS and in PAL.
When any (random-size) object needs to be allocated/freed in LibOS or in PAL, the traditional `malloc()` and `free()` calls are used. They are wrappers around `slab_alloc()` and `slab_free()`.
Backend-memory (page-sized) allocation and deallocation is implemented via:
- `__system_malloc()` and `__system_free()` in https://github.com/gramineproject/gramine/blob/master/libos/src/libos_malloc.c
- `system_mem_alloc()` and `system_mem_free()` in https://github.com/gramineproject/gramine/blob/master/pal/src/slab.c

There is a single global slab manager and corresponding lock for LibOS, and similarly a single global slab manager and corresponding lock for PAL.
NOTE: We have a `struct libos_lock` in LibOS. This lock is implemented via `PalEventWait()` and `PalEventSet()`, which do have a fast path, but the slow path results in `ocall_futex()`, which is super-expensive in SGX. Maybe we could replace it with a spinlock? Technically, most of the time we'll hit the slab allocator's cache, which is a fast operation, so there seems to be no real need for `ocall_futex()` in this case.
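For reference, the suggestion above amounts to something like this minimal test-and-test-and-set spinlock (an illustrative sketch, independent of Gramine's actual lock implementations):

```c
/* Minimal test-and-test-and-set spinlock sketch (illustrative only). */
#include <stdatomic.h>

typedef struct {
    atomic_int locked;  /* 0 = free, 1 = held */
} sketch_spinlock_t;

#define SKETCH_SPINLOCK_INIT { 0 }

static inline void sketch_spin_lock(sketch_spinlock_t* l) {
    for (;;) {
        /* attempt to take the lock; acquire pairs with the release below */
        if (!atomic_exchange_explicit(&l->locked, 1, memory_order_acquire))
            return;
        /* spin on plain loads so we don't bounce the cache line with writes */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;  /* a PAUSE/yield hint would go here on x86 */
    }
}

static inline void sketch_spin_unlock(sketch_spinlock_t* l) {
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```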
The design and implementation are based on the MEMMGR allocator for the common case, with a trivial fallback for the large-object case:
Deallocation happens similarly to the allocation description above: the deallocation path consults the object's `level` value. If `level == -1`, it means that the object's size is greater than the max allowed in SLAB, so it was allocated via backend-memory allocation, and thus it must be deallocated via backend-memory free: https://github.com/gramineproject/gramine/blob/f35d8e034c1d9bf7b6e2da9125b64a25b622b1d9/common/include/slabmgr.h#L402-L412
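To make the `level` mechanism concrete, here is a small self-contained sketch (all names and sizes hypothetical; the real logic lives in `common/include/slabmgr.h`):

```c
/* Self-contained sketch of a per-level slab with a large-object fallback. */
#include <stdlib.h>

#define SLAB_LEVELS 6
static const size_t g_level_sizes[SLAB_LEVELS] = {16, 32, 64, 128, 256, 512};

typedef struct {
    int level;     /* slab level of this object, or -1 for a large object */
    char pad[12];  /* keep the payload 16-byte aligned (sketch simplification) */
} obj_header_t;

static void* sketch_malloc(size_t size) {
    int level = -1;
    for (int i = 0; i < SLAB_LEVELS; i++)
        if (size <= g_level_sizes[i]) { level = i; break; }

    /* the real allocator pops a slot from the per-level free list; this
     * sketch forwards everything to the system allocator instead */
    size_t payload = (level == -1) ? size : g_level_sizes[level];
    obj_header_t* hdr = malloc(sizeof(*hdr) + payload);
    if (!hdr)
        return NULL;
    hdr->level = level;
    return hdr + 1;
}

static void sketch_free(void* obj) {
    if (!obj)
        return;
    obj_header_t* hdr = (obj_header_t*)obj - 1;
    if (hdr->level == -1) {
        free(hdr);  /* large object: goes straight back to the backend */
        return;
    }
    free(hdr);  /* real code pushes the slot back onto its level's free list */
}
```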
There is no `realloc()` implementation. If SLAB had `realloc()`, many places in LibOS and PAL could benefit from this function; currently they implement such realloc in ad-hoc ways, combining `malloc`, `memcpy` and `free` (a sketch of this pattern follows below).

`SLAB_CANARY` is defined for LibOS, but not for PAL.
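For reference, the ad-hoc pattern mentioned above typically looks like this (an illustrative sketch; it assumes the caller tracks the old size, since SLAB does not):

```c
/* Illustrative ad-hoc "realloc" built from malloc + memcpy + free. */
#include <stdlib.h>
#include <string.h>

static void* adhoc_realloc(void* old, size_t old_size, size_t new_size) {
    void* new_buf = malloc(new_size);
    if (!new_buf)
        return NULL;  /* old buffer stays valid on failure */
    memcpy(new_buf, old, old_size < new_size ? old_size : new_size);
    free(old);
    return new_buf;
}
```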
Description of the feature
Memory allocators in Gramine were written ~15 years ago. It's time to re-evaluate them and propose alternatives.
I want to start a discussion on this topic. In particular, the next posts will contain information on:
- the MEMMGR allocator, used to allocate specific objects in specific subsystems (currently only in LibOS);
- the SLAB allocator, the generic backend for `malloc` and `free` in all other subsystems (both in LibOS and in PAL).

Why Gramine should implement it?
We see more and more performance bottlenecks in memory/object allocation inside Gramine itself. They all stem from the fact that our memory allocators are old, ad-hoc, not scalable, and not CPU- and cache-friendly; also our allocators do not perform object caching.
Just a couple of recent examples:
For possible alternatives (in Rust), see this comment: https://github.com/gramineproject/gramine/pull/1723#pullrequestreview-1864216238