Merge multiple mallocs into one large malloc

jimbxb commented 2 years ago

Another very simple optimisation that we could implement is to pull multiple heap allocations together, allocating one large structure, and using segments of the allocated memory for each of the separate allocations.

I'm not sure of the repercussions of this on the GC, as it would change the allocation pattern from lots of small allocations to less larger allocations.

This could also be extended to allocating all memory for a proc at the start of the call, pulling all allocations over every branch, using different offsets for each branch if necessary. If branches use drastically different amounts of memory, this would likely be less desirable, so some heuristic could be involved.

Paul Bone did a write up that may be useful for considerations here https://paul.bone.id.au/blog/2016/10/08/memory-fragmentation-in-boehmgc/

Originally posted by @jimbxb in https://github.com/pschachte/wybe/issues/352#issuecomment-1258777275

pschachte commented 2 years ago

We'll want to do some benchmarking of this to explore different heuristics for which allocations to merge. One interesting heuristic would be to merge allocations where the address of one allocation is stored in memory allocated by another, on the theory that when the outer one needs to be collected, the inner one will, too (though not necessarily vice versa).

jimbxb commented 2 years ago

Linked structures are likely a good candidate also, as having the structures allocated close by in memory could be good for cache performance.

PaulBone commented 2 years ago

In the general case merged allocations don't work with GCs because many GCs don't support interior pointers. That is if the GC knows it has allocated the address 0xABC0 but you have a pointer to 0xABC8, the GC might not recognise this as a pointer and might free your memory early. So you want to check if your GC supports interior pointers and what the limits are. But your specific case where A always points to B is safe if you know that A always has a longer-or-equal lifetime than B.

One optimisation some GCs support, usually those with bump-allocation for the nursery (or semispace). Is that a test-for-space can be separated from the actual allocation. If an allocation looks like this:


if (bump_pointer + size_to_allocate >= limit) {
   do_gc();
}
new_object = bump_pointer;
bump_pointer += size_to_allocate;

And this code would be inlined on every allocation site, then when there are multiple allocations you can instead put:

if (bump_pointer + total_size >= limit) {
   do_gc();
}
new_object_1 = bump_pointer;
bump_pointer += size_1;
new_object_2 = bump_pointer;
bump_pointer += size_2;
new_object_3 = bump_pointer;
bump_pointer += size_3;

There must be no calls to other functions that could allocate between the test and last allocation.

Each of these allocations is still a separate heap object so you don't save memory or other execution costs, only the test is amortised. But on the other hand the object's lifetimes can be completely unrelated.

PaulBone commented 2 years ago

I checked with @jimbxb, Since Wybe uses BoehmGC the "if they have related lifetimes and one always points to the other" optimisation is available, so is some level of interior pointer support, but if you want full support it may be an extra compile time option and cause the GC to retain more memory due to having more false-positive pointers.

pschachte / wybe

Merge multiple mallocs into one large malloc #357