General purpose memory / allocator

Has an info header that stores the size of the allocated object and a pointer to the ~~previous and the~~ next allocated object (or allocator memory start ~~or end if there is no other object~~)
Try to use a ~~double~~ linked list approach in free memory to keep track of the gaps. Have a look at the implementation of the memory pool. This might change the structure of the header needed for allocated objects (see bullet point above).
This should be the default allocator (probably slowest)

Memory pool / pool allocator

Each element has a fixed size - memory positions are direct access compatible
The number of elements to store is passed instead of the byte size
Keeps track of gaps and fills them ~~first~~
~~Only numbers of first elements in a gap are stored and the number of following empty spots~~
~~Merge gaps regularly if possible. For example: every 100 allocs sort the gap list and merge gaps that are connected.~~
The gaps are stored by using a linked list in the free memory as described in the book Game Engine Architecture. Therefore pointers to the first and the last free element are needed. Each free element points to the next free element. The last element holds a nullptr to mark the end. Because allocated elements are taken from the front and deallocated elements are appended at the back, memory usage follows a FIFO ordering.
Use it only for some common small sizes, e.g. 32, 64, 128 and 256 bytes
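
The free list described above can be sketched as follows. This is only an illustration under assumptions: `PoolSketch` and its member names are hypothetical and not the GDL interface, thread safety and alignment are left out, and the element size is assumed to be at least the size of a pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// Hypothetical fixed-size pool; all names are illustrative, not the GDL API.
class PoolSketch
{
    std::unique_ptr<unsigned char[]> mMemory;
    unsigned char* mFirstFree;
    unsigned char* mLastFree;

    // Each free slot stores the address of the next free slot in its first bytes.
    static unsigned char* ReadNext(const unsigned char* slot)
    {
        unsigned char* next;
        std::memcpy(&next, slot, sizeof(next));
        return next;
    }

    static void WriteNext(unsigned char* slot, unsigned char* next)
    {
        std::memcpy(slot, &next, sizeof(next));
    }

public:
    PoolSketch(std::size_t elementSize, std::size_t numElements)
        : mMemory{new unsigned char[elementSize * numElements]}
        , mFirstFree{mMemory.get()}
        , mLastFree{mMemory.get() + elementSize * (numElements - 1)}
    {
        assert(elementSize >= sizeof(void*) && numElements > 0);
        // Chain all slots; the last one holds nullptr to mark the end.
        for (std::size_t i = 0; i + 1 < numElements; ++i)
            WriteNext(mMemory.get() + i * elementSize, mMemory.get() + (i + 1) * elementSize);
        WriteNext(mLastFree, nullptr);
    }

    // Takes the slot at the front of the free list.
    void* Allocate()
    {
        if (mFirstFree == nullptr)
            return nullptr; // pool exhausted
        unsigned char* slot = mFirstFree;
        mFirstFree = ReadNext(slot);
        if (mFirstFree == nullptr)
            mLastFree = nullptr;
        return slot;
    }

    // Appends the slot at the back of the free list (FIFO reuse).
    void Deallocate(void* address)
    {
        unsigned char* slot = static_cast<unsigned char*>(address);
        WriteNext(slot, nullptr);
        if (mLastFree == nullptr)
            mFirstFree = slot;
        else
            WriteNext(mLastFree, slot);
        mLastFree = slot;
    }
};
```

With a pool of three elements, freeing the second and then the first element makes the next two allocations return them in exactly that order.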

Memory stack / stack allocator

Only stores a pointer after the last element
Counts number of allocations and deallocations
Resets the pointer to the start if allocs == deallocs
The memoryStackDeallocationMark class stores the current number of allocations and the current memory pointer during construction. When its destructor is called, it checks whether the current number of allocations equals the stored number. If so, it sets the current memory pointer back to the stored position. If not, an assertion or a write to the error output might be the solution, since exceptions thrown from a dtor are bad practice and trigger compiler warnings. A solution would be to provide two functions that set back the stack and that can also be called by the user: a throwing version and a non-throwing one. The non-throwing one is called in the dtor. Only the first call to either function has any effect. This class only makes sense with thread private stacks, since allocations from other threads could invalidate the stored position.
Fastest, but inflexible
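
A minimal sketch of the stack and its deallocation mark, assuming a single thread and no alignment handling. The names `StackSketch`, `TryReset` and `Reset` are made up for this example; only `CreateDeallocationMark` is taken from the ToDo list below.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>

// Hypothetical single-threaded memory stack; not the GDL implementation.
class StackSketch
{
public:
    class DeallocationMark
    {
        friend class StackSketch;
        StackSketch* mStack;
        std::size_t mOffset;
        std::size_t mOpenAllocations;

        // Only the stack itself can create a mark.
        explicit DeallocationMark(StackSketch& stack) noexcept
            : mStack{&stack}, mOffset{stack.mOffset}, mOpenAllocations{stack.mOpenAllocations} {}

    public:
        DeallocationMark(DeallocationMark&& other) noexcept
            : mStack{other.mStack}, mOffset{other.mOffset}, mOpenAllocations{other.mOpenAllocations}
        {
            other.mStack = nullptr;
        }
        DeallocationMark(const DeallocationMark&) = delete;
        DeallocationMark& operator=(const DeallocationMark&) = delete;

        // Non-throwing version: returns true only if the stack was set back.
        bool TryReset() noexcept
        {
            if (mStack == nullptr)
                return false; // a previous call already consumed the mark
            StackSketch* stack = mStack;
            mStack = nullptr;
            if (stack->mOpenAllocations != mOpenAllocations)
                return false; // allocations made after the mark are still alive
            stack->mOffset = mOffset;
            return true;
        }

        // Throwing version for explicit calls by the user.
        void Reset()
        {
            if (mStack != nullptr && !TryReset())
                throw std::runtime_error("Allocations made after the mark are still alive");
        }

        ~DeallocationMark() { TryReset(); }
    };

    explicit StackSketch(std::size_t size) : mMemory{new unsigned char[size]}, mSize{size} {}

    void* Allocate(std::size_t size)
    {
        if (mOffset + size > mSize)
            return nullptr;
        void* address = mMemory.get() + mOffset;
        mOffset += size;
        ++mOpenAllocations;
        return address;
    }

    // Only counts; the pointer is set back once every allocation is returned.
    void Deallocate(void*) noexcept
    {
        assert(mOpenAllocations > 0);
        if (--mOpenAllocations == 0)
            mOffset = 0;
    }

    DeallocationMark CreateDeallocationMark() noexcept { return DeallocationMark{*this}; }
    std::size_t UsedBytes() const noexcept { return mOffset; }

private:
    std::unique_ptr<unsigned char[]> mMemory;
    std::size_t mSize;
    std::size_t mOffset = 0;
    std::size_t mOpenAllocations = 0;
};
```

A mark created at the start of a scope gives back all temporary memory of that scope when it is destroyed, provided every allocation made inside the scope was deallocated before.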

Memory Manager Structure

Without any modification the memory manager always uses a general purpose allocator with default size.
Stack and pool allocators can be added before the first allocation
The first allocation or a direct command to the memory manager will call new to get one big chunk of memory that covers the needs of all allocators. Then the allocators are initialized and get their corresponding memory block from the memory manager
Alternative: Memory Manager must be initialized with a public function. If it is not initialized it should use the standard allocator. This way there would be no need for template versions of classes that do dynamic memory allocation internally or for functions that use container classes that can take an allocator in their interface. All memory allocation is then done via the memory manager which becomes more or less the std::allocator (except for the small detour through the memory manager) if not initialized.
Each allocator calls a specific set of MemoryManager functions to get a pointer to the needed Functions. For example GetStackAllocationFunction and GetStackDeallocationFunction. They return another MemoryManager function which is just a pass through to the selected memory model (MemoryStack, MemoryPool, GeneralPurposeMemory). This pointer is stored as a static member in the allocator (or maybe in the getter). This way one needs to check only once for each allocator which function set should be used. For example a stack allocator can use the std::allocator or the general purpose memory as a backup if no memory stack was initialized.
Alternative: The distinction between the custom memory model and the std::allocator can also be made in the allocator itself. The allocator asks the memory manager whether it is initialized. The IsInitialized routine itself is protected by a mutex and takes a bool (default false) which, if true, prevents the memory manager from being initialized afterwards. This way the status of the memory manager during the first allocation decides whether the std::allocator or a custom memory model is used. The whole concept would make it possible to use a base class for all custom memory models and to store a pointer to the memory model's allocation function instead of a pointer to a MemoryManager pass-through function. This avoids additional code and an unnecessary function call per allocation. One still needs the getters for the functions, since a pointer or reference to the memory model itself would expose the initialize and deinitialize functions.
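
The one-time selection of the function set could look roughly like the sketch below. Everything in it is an assumption made for illustration: `MemoryManagerSketch`, `FunctionSet`, `GetFunctionSet` and the four free functions are hypothetical names, and `std::malloc` merely stands in for a real memory model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>

// Hypothetical stand-in for the memory manager; not the GDL class.
class MemoryManagerSketch
{
    std::mutex mMutex;
    bool mInitialized = false;
    bool mFinal = false;

public:
    static MemoryManagerSketch& Instance()
    {
        static MemoryManagerSketch instance;
        return instance;
    }

    // Fails once an allocator has already fixed the state.
    bool Initialize()
    {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mFinal)
            return false;
        mInitialized = true;
        return true;
    }

    // Passing true makes the current state final.
    bool IsInitialized(bool makeFinal = false)
    {
        std::lock_guard<std::mutex> lock(mMutex);
        if (makeFinal)
            mFinal = true;
        return mInitialized;
    }
};

struct FunctionSet
{
    void* (*allocate)(std::size_t);
    void (*deallocate)(void*);
};

// Stand-ins for a custom memory model (assumption: malloc/free as placeholder).
inline void* CustomAllocate(std::size_t size)
{
    assert(MemoryManagerSketch::Instance().IsInitialized());
    return std::malloc(size);
}
inline void CustomDeallocate(void* address) { std::free(address); }

// Fallback that behaves like the std::allocator.
inline void* FallbackAllocate(std::size_t size) { return ::operator new(size); }
inline void FallbackDeallocate(void* address) { ::operator delete(address); }

// The check runs only once; afterwards the stored pointers are used directly.
inline const FunctionSet& GetFunctionSet()
{
    static const FunctionSet functions = MemoryManagerSketch::Instance().IsInitialized(true)
                                             ? FunctionSet{&CustomAllocate, &CustomDeallocate}
                                             : FunctionSet{&FallbackAllocate, &FallbackDeallocate};
    return functions;
}
```

Because the static local is initialized exactly once, the allocation and deallocation functions can never come from two different sets, which is what forbids mixing the std::allocator with a custom memory model.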

Remarks

Typedefs of all container classes which are changed by defines might not do the trick. If you build the lib without the definition but use it in some application, the linked libraries still use new as allocator.
All GDL classes might need their own allocator for their storages (not sure about that)
Current best option: Make all classes templates with an allocator as template argument. Specialize the class in the cpp for each allocator so that the implementations are there for each allocator chosen by the application's user. The application can then typedef the chosen template to avoid explicitly typing the template argument when using the class.
UPDATE Have a look at the Alternative in the memory manager section

Class internals

For dynamic temporary objects, there should be a thread private stack allocator that takes care of the memory allocation.

Testing

Thread safety: Create multiple containers of any type on multiple threads and fill them with 1..N via single inserts. Then iterate over each container and sum up its elements. Compare the result with the Gaussian sum formula (N^2+N)/2. If they don't match, there probably is a concurrency problem.
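
A possible shape for such a test, written as a sketch. The function name `SumMatchesOnAllThreads` is hypothetical, and `std::allocator` is only the default here; the allocator under test would be passed as template argument.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical test helper: every thread fills its own container with 1..n
// and compares the sum of the elements with (n^2 + n) / 2.
template <typename Allocator = std::allocator<std::size_t>>
bool SumMatchesOnAllThreads(std::size_t numThreads, std::size_t n)
{
    assert(numThreads > 0);
    // One result slot per thread (char instead of bool, because
    // std::vector<bool> is not safe for concurrent writes to different elements).
    std::vector<char> matches(numThreads, 0);
    std::vector<std::thread> threads;

    for (std::size_t t = 0; t < numThreads; ++t)
        threads.emplace_back([&matches, t, n] {
            std::vector<std::size_t, Allocator> values;
            for (std::size_t i = 1; i <= n; ++i)
                values.push_back(i); // single inserts force repeated allocations
            const std::size_t sum = std::accumulate(values.begin(), values.end(), std::size_t{0});
            matches[t] = (sum == (n * n + n) / 2) ? 1 : 0;
        });

    for (std::thread& thread : threads)
        thread.join();

    return std::all_of(matches.begin(), matches.end(), [](char m) { return m != 0; });
}
```

A passing run does not prove thread safety, so the test is best repeated with many threads and a large N.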

Weblinks

ToDo

[x] Update/Write memory stack
[x] Make memory stack thread safe
[x] Update/Write stack allocator
[x] Use explicit initialization and deinitialization for memory stack
[x] Remove assertion that memory stack is deinitialized in dtor - The assertion will always trigger if an uncaught exception is thrown (not enough memory etc.) and therefore hide the true origin of the program error.
[x] Add aligned allocation to memory stack
[x] Update/Write thread private memory stack
[x] Make thread private memory stack thread safe (throw on thread id mismatch)
[x] Use explicit initialization and deinitialization for thread private memory stack
[x] Remove assertion that thread private memory stack is deinitialized in dtor - The assertion will always trigger if an uncaught exception is thrown (not enough memory etc.) and therefore hide the true origin of the program error.
[x] Add aligned allocation to thread private memory stack
[x] Update/Write thread private stack allocator
[x] Add memoryStackDeallocationMark class - Class can only be generated by memory stack and is returned via function: DeallocationMark CreateDeallocationMark()
[x] Use memoryStackDeallocationMark class with thread private memory stack
[x] Write memory pool
[x] Write pool allocator
[x] Move the code which uses the general purpose memory as backup from the memory manager to the pool allocator. Provide an extra private function for the initialization of the static variable and let the memory manager return a nullptr if no fitting memory pool was found.
[x] Make memory pool thread safe
[x] Use explicit initialization and deinitialization for memory pool
[x] Add alignment to memory pool allocation function - it just checks if the alignment is compatible with the allocator's global alignment
[x] Remove assertion that memory pool is deinitialized in dtor - The assertion will always trigger if an uncaught exception is thrown (not enough memory etc.) and therefore hide the true origin of the program error.
[x] Add alignment option during construction to memory pool
[x] Check if element size of pool allocator is a multiple of the desired alignment. Everything else would be a waste of memory
[x] Write general purpose memory
[x] Make general purpose memory thread safe
[x] Use explicit initialization and deinitialization for general purpose memory
[x] Add aligned allocation to general purpose memory
[x] Write general purpose allocator
[x] ~~Write dynamic allocator, which is allowed to select the source of memory (basically pool or general purpose depending on size)~~ This would result in the same allocator as the pool allocator, since it backups to the general purpose memory if no fitting memory pool is found.
[x] ~~Use std::align_val_t as type for alignment values in every implemented memory system~~ not ideal for this lib and the implemented memory systems
[x] Update memory manager
[x] Make memory manager thread safe
[x] Use explicit initialization and deinitialization of memory manager
[x] ~~Assertion that memory manager is deinitialized in dtor~~ assertion hides thrown exceptions
[x] Uninitialized memory manager uses std::allocator - forbid mixture - if the std::allocator was used it will always be used. Otherwise the deallocation will get problematic.
[x] Create a memory model (HeapMemory) which just uses new and delete and serves as the fallback option in all allocators. This way the branch can be removed. Check this for some inspiration -> Allocation/deallocation methods are just moved to the memory model as far as possible.
[x] Add GDL versions of STL containers (typedef with alternative allocator). Don't use them as a "golden hammer". Only data that varies a lot and is time critical should use them. Otherwise it gets overly complicated for too little gain. Datatypes that only exist during initialization and are small don't need to be put into custom allocated memory.
[x] Add ifdef FORCE_STD_ALLOCATOR -> This should replace all custom allocator types with a simple typedef using std::allocator. (Maybe make it an option in CMake - better start a list of definitions which affect the library).
[x] Add Travis Build setup with FORCE_STD_ALLOCATOR enabled
[x] Create function to check the number of allocations and deallocations in test-files, which takes a ref to an instance of the GlobalNewAllocationCounter and the expected number of allocations and deallocations. If FORCE_STD_ALLOCATOR is defined, it simply does not perform the BOOST_CHECK calls
[x] Add typedefs for strings etc.
[x] Introduce literals for Kb, Mb, Gb etc.
[x] move is_power_of_2 function from memoryPool.cpp to its own .h/.cpp or a fitting collection of functions
[x] write custom cast functions that cast numerical values to strings using snprintf. This way custom allocators can be used.
[x] move the writeAddressToMemory etc. functions in generalPurposeMemory and the equivalent into a separate header
[x] read this and check if there is something interesting
[x] Move static variable of deallocation and allocation functions of allocator to a member function so that only one static variable is needed.
[x] Replace mutex of memory manager with shared mutex or use second shared mutex only for thread private memory
[x] ~~Think of using bad alloc exception instead of the GDL::Exception. In this case the exception class should also use the GDL string. Maybe add a macro STD_EXCEPTION~~ Exception is derived from std::runtime_error. Can't change the string type
[x] Think of using Spinlocks in the memory systems, because they are more light weight than a mutex and the access times are rather short. Implementation examples: https://stackoverflow.com/questions/26583433/c11-implementation-of-spinlock-using-atomic
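
For the spinlock item, a minimal sketch based on std::atomic_flag is shown below. `SpinLockSketch` and `CountWithSpinLock` are hypothetical names. The lock yields while waiting, which is a design choice of this example and not necessarily what the library does.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock; it satisfies the Lockable requirements and can
// therefore be used with std::lock_guard like a mutex.
class SpinLockSketch
{
    std::atomic_flag mFlag = ATOMIC_FLAG_INIT;

public:
    void lock() noexcept
    {
        // Spin until the flag was previously clear; yield to be fair to other threads.
        while (mFlag.test_and_set(std::memory_order_acquire))
            std::this_thread::yield();
    }

    bool try_lock() noexcept { return !mFlag.test_and_set(std::memory_order_acquire); }

    void unlock() noexcept { mFlag.clear(std::memory_order_release); }
};

// Small demonstration: several threads increment one counter under the lock.
inline long CountWithSpinLock(int numThreads, int incrementsPerThread)
{
    assert(numThreads > 0 && incrementsPerThread >= 0);
    SpinLockSketch spinLock;
    long counter = 0;
    std::vector<std::thread> threads;

    for (int t = 0; t < numThreads; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < incrementsPerThread; ++i)
            {
                std::lock_guard<SpinLockSketch> guard(spinLock);
                ++counter;
            }
        });

    for (std::thread& thread : threads)
        thread.join();
    return counter;
}
```

Whether this beats a mutex depends on contention and on how short the critical sections really are, so it should be checked with the benchmarks.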

Tests

[x] Use #ifdef to create multiple tests with different memory setups in the memory manager. This way there is no need for multiple test files. Just create multiple tests with the same file and different definitions
[x] allocator test "template" so that it is not necessary to write the same test for every allocator - probably use a typedef for the allocator that should be used
[x] memory stack general
[x] memory stack thread safety
[x] stack allocator
[x] thread private memory stack general
[x] thread private memory stack thread safety
[x] thread private stack allocator
[x] memory pool general
[x] memory pool thread safety
[x] pool allocator
[x] general purpose memory general
[x] general purpose memory thread safety
[x] general purpose allocator
[x] memory manager general
[x] uninitialized memory manager (std::allocator)
[x] memory manager thread safety
[x] ~~dynamic allocator~~ not needed, just use the pool allocator
[x] check if there are no hidden new calls with the global new counter in every test
[x] test custom cast functions
[x] test if alignment works correctly for all allocators
[x] Add thread safety test for the creation and destruction of thread private stacks in the memory manager test

Benchmark

Check whether using Google Benchmark for this test makes sense.

[x] ~~Measure std::allocator reference where new is able to use same memory blocks over and over again -> construction, allocation, destruction, repeat - this is the fastest possible new allocation~~ This can be done by setting the number of test runs in the current benchmark to 1 or higher
[x] Measure std::allocator reference where new always needs to find a fresh memory block -> construct, allocate, repeat -> destroy everything at the end of the test. This can be done by using an array of the container class that should be tested.
[x] Make a template version of each benchmark to test the behavior depending on the allocation size
[x] Benchmark stack allocator the same 2 ways as the std::allocator
[x] Benchmark pool allocator the same 2 ways as the std::allocator
[x] Benchmark general purpose allocator the same 2 ways as the std::allocator

General Purpose Memory

Implementation details

The general purpose memory can deliver memory of any desired size and alignment. Additionally, allocations and deallocations are not restricted to any specific order, unlike in some stack approaches.

There are some performance and memory costs involved to obtain this freedom:

Memory costs

The size of an occupied memory block is stored in memory in front of the returned address during allocation. This information is needed during deallocation. Additionally, the used alignment is stored in the byte preceding the returned memory pointer. Therefore the real size of each allocation is increased by the size of a size_t (occupied space information) and the chosen alignment value.

The size of a free memory block is stored at the beginning of each free memory block (in free memory). Additionally, the address of the next free memory block is stored. This makes it possible to traverse all free memory blocks as fast as possible and to find a fitting one during allocation. It requires each free memory block to be large enough to store a size_t and a `void*`. If the memory left over after an allocation is too small for that, it is added to the allocation and is therefore lost for future allocations until the allocated block is freed.

The free choice of allocation size and the unrestricted order of allocations and deallocations lead to fragmentation problems. Therefore the total memory should be larger than the theoretical maximum memory requirement of the program. Otherwise an allocation might fail, even though the total amount of memory would be enough.

Performance costs

During allocation and deallocation the linked list of free memory blocks needs to be updated if necessary. This adds some extra costs when compared to simpler memory models like memory pools or stacks. The involved write operations are more or less constant time, but during allocation and deallocation the linked list needs to be traversed. During allocation it is traversed to find a free block of proper size. During deallocation the previous and following free memory blocks need to be found in order to update the linked list and possibly merge the newly freed memory with them. The costs of this list traversal highly depend on the memory fragmentation.

Allocation function

Necessary steps:

Find free memory
Adjust free memory linked list
Align returned pointer
Write allocation size and alignment bytes in front of returned pointer
Return pointer

Adjust free memory linked list

There are the following branches that need to be distinguished:

Is found memory block first of free memory list? (FF)
Is found memory block last of free memory list? (LF)
Is enough memory left in found memory block to write another free memory header? (MF)

This makes 8 different cases in total. The consequences for each case are shown below. Prior to commit 082be6adf7b5eb60abd5a69bf8f5e249f9c38924 the implementation was exactly as in the list below. Afterwards it was simplified by reducing code duplications.

Results for all branch combinations:

FF = false && LF = false && MF = false

Prev memory block points to next free memory block

FF = false && LF = true && MF = false

Prev memory block points to nullptr
Last free element pointer = previous free memory pointer

FF = false && LF = false && MF = true

Write new free memory block
New free memory block points to next free memory block
Prev memory block points to new free memory block

FF = false && LF = true && MF = true

Write new free memory block
New free memory block points to nullptr
Prev memory block points to new free memory block
Last free element pointer = new free memory block

FF = true && LF = false && MF = false

First free element pointer = next free memory block

FF = true && LF = true && MF = false

First free element pointer = nullptr
Last free element pointer = nullptr

FF = true && LF = false && MF = true

Write new free memory block
New free memory block points to next free memory block
First free element pointer = new free memory block

FF = true && LF = true && MF = true

Write new free memory block
New free memory block points to nullptr
First free element pointer = new free memory block
Last free element pointer = new free memory block
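
The eight cases can be collapsed into a few branches, roughly as in the sketch below. It is not the code from the mentioned commit. `FreeBlock`, `FreeList` and `TakeFromFreeList` are hypothetical names, the search is a plain first fit, and the alignment of the returned pointer and the allocation header are left out. The requested size is assumed to be a multiple of the header alignment and at least as large as a header.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Header written at the start of every free memory block (hypothetical layout).
struct FreeBlock
{
    std::size_t size; // size of the whole free block in bytes
    FreeBlock* next;  // next free block or nullptr
};

struct FreeList
{
    FreeBlock* first = nullptr;
    FreeBlock* last = nullptr;
};

// Takes `size` bytes from the first free block that is large enough.
inline void* TakeFromFreeList(FreeList& list, std::size_t size)
{
    assert(size >= sizeof(FreeBlock) && size % alignof(FreeBlock) == 0);

    // Find free memory (first fit) and remember the previous block.
    FreeBlock* prev = nullptr;
    FreeBlock* found = list.first;
    while (found != nullptr && found->size < size)
    {
        prev = found;
        found = found->next;
    }
    if (found == nullptr)
        return nullptr;

    // Read the header before the memory is reused.
    FreeBlock* const next = found->next;
    const std::size_t left = found->size - size;

    // MF: write a new free block header if enough memory is left.
    // Otherwise the leftover bytes stay attached to the allocation.
    FreeBlock* replacement = next;
    if (left >= sizeof(FreeBlock))
    {
        void* rest = reinterpret_cast<unsigned char*>(found) + size;
        replacement = new (rest) FreeBlock{left, next};
    }

    // FF: either the list head or the previous block points to the replacement.
    if (prev == nullptr)
        list.first = replacement;
    else
        prev->next = replacement;

    // LF: the new block becomes the last one, else the previous block does.
    if (next == nullptr)
        list.last = (replacement != nullptr) ? replacement : prev;

    return found;
}
```

The three `if` statements correspond to the MF, FF and LF branches, so each of the eight combinations above is one path through this function.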

Deallocation function

Necessary steps:

Find enclosing free memory blocks
Adjust free memory linked list
Merge with previous and/or following memory block if possible
Write free memory block information if necessary

List adjustment, merging and free memory info

These three subtasks of the section title have some common branches and are therefore joined. There are the following branches that need to be distinguished:

new free block is new first free memory block (FF)
new free block is new last free memory block (LF)
new free block is mergeable with previous free memory block (PM)
new free block is mergeable with next free memory block (NM)

This makes 16 different cases in total. Not all of them need to be treated since the first free element can't have a previous free block etc. Excluding all unnecessary branch combinations leaves 9 different cases that need to be treated. The cases and their consequences are shown below. Prior to commit 151ebf2c7f82c5182f693272e0ce970f08424229 the implementation was exactly as in the list below. Afterwards it was simplified by reducing code duplications.

Results for all possible branch combinations:

FF = true && LF = true && PM = --- && NM = ---

Write free memory info
First free element pointer = new free memory block
Last free element pointer = new free memory block

FF = true && LF = false && PM = --- && NM = true

Read data from current next free memory block
Update next free memory block pointer to point to read position
Add read free memory size to freed block's size
Write free memory info with updated data
First free element pointer = new free memory block
if merged block was last free -> Last free element pointer = new free memory block

FF = true && LF = false && PM = --- && NM = false

Write free memory info
New free memory block points to next free memory block
First free element pointer = new free memory block

FF = false && LF = true && PM = true && NM = ---

Add current block's size to previous block's size

FF = false && LF = true && PM = false && NM = ---

Write free memory info
New free memory block points to nullptr
Prev memory block points to new free memory block
Last free element pointer = new free memory block

FF = false && LF = false && PM = true && NM = true

Read data from current next free memory block
Update next free memory block pointer to point to read position
Prev memory block points to updated next free memory block
Increase prev memory block's size by the size of the two merged blocks.
if one merged block was last free -> Last free element pointer = new free memory block

FF = false && LF = false && PM = false && NM = true

Read data from current next free memory block
Update next free memory block pointer to point to read position
Write free memory info
New free memory block points to updated next free memory block
Add size of merged block to current freed size
Prev memory block points to new free memory block
if merged block was last free -> Last free element pointer = new free memory block

FF = false && LF = false && PM = true && NM = false

Add current block's size to previous block's size

FF = false && LF = false && PM = false && NM = false

Write free memory info
New free memory block points to next free memory block
Prev memory block points to new free memory block
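
A compact way to cover the nine cases is sketched below. Again this is not the code from the mentioned commit. `FreeBlock`, `FreeList` and `ReturnToFreeList` are hypothetical names, the free list is assumed to be sorted by address, and the size of the freed block is passed in directly instead of being read from an allocation header.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>

// Header written at the start of every free memory block (hypothetical layout).
struct FreeBlock
{
    std::size_t size; // size of the whole free block in bytes
    FreeBlock* next;  // next free block or nullptr
};

struct FreeList
{
    FreeBlock* first = nullptr;
    FreeBlock* last = nullptr;
};

// Puts a freed block back into the address-sorted list and merges neighbours.
inline void ReturnToFreeList(FreeList& list, void* address, std::size_t size)
{
    assert(address != nullptr && size >= sizeof(FreeBlock));
    unsigned char* const begin = static_cast<unsigned char*>(address);

    // Find the enclosing free blocks.
    FreeBlock* prev = nullptr;
    FreeBlock* next = list.first;
    while (next != nullptr && std::less<const void*>{}(next, begin))
    {
        prev = next;
        next = next->next;
    }

    // NM: absorb the following free block if it starts right behind the freed one.
    FreeBlock* after = next;
    if (next != nullptr && begin + size == reinterpret_cast<unsigned char*>(next))
    {
        size += next->size;
        after = next->next;
    }

    FreeBlock* block = nullptr;
    if (prev != nullptr && reinterpret_cast<unsigned char*>(prev) + prev->size == begin)
    {
        // PM: the previous free block simply grows, no new header is needed.
        prev->size += size;
        prev->next = after;
        block = prev;
    }
    else
    {
        // Write free memory info and link it in (FF decides who points to it).
        block = new (address) FreeBlock{size, after};
        if (prev == nullptr)
            list.first = block;
        else
            prev->next = block;
    }

    // LF, or the merged block was the last free one.
    if (after == nullptr)
        list.last = block;
}
```

Some writes are redundant in individual cases (for example `prev->next` when only PM applies), which is the price for fewer branches.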

vhirtham / GDL

Memory management and custom allocators #4