Get rid of the fancy self-alignment and just do a linear backward walk over the chunks to find the one a handle belongs to. Theoretically slower, but in practice most of the handles that need to find their chunk will be in the last chunk, so the walk usually ends on the first step.
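A minimal sketch of that backward walk, assuming chunks are linked newest to oldest through a `prev` pointer and a handle is a pointer into a chunk's slot array. `Chunk`, `CHUNK_SLOTS` and `find_chunk` are hypothetical names for illustration, not the existing ones:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SLOTS 64 /* hypothetical chunk capacity */

typedef struct Chunk {
    struct Chunk *prev;       /* older chunk, NULL for the first one */
    void *slots[CHUNK_SLOTS]; /* tracked heap pointers */
} Chunk;

/* Walk from the newest chunk towards the oldest and return the chunk
 * whose slot array contains `handle`, or NULL if no chunk owns it. */
static Chunk *find_chunk(Chunk *last, void **handle) {
    assert(handle != NULL);
    uintptr_t h = (uintptr_t)handle;
    for (Chunk *c = last; c != NULL; c = c->prev) {
        uintptr_t lo = (uintptr_t)c->slots;
        uintptr_t hi = (uintptr_t)(c->slots + CHUNK_SLOTS);
        if (h >= lo && h < hi) {
            return c; /* common case: hit on the first iteration */
        }
    }
    return NULL;
}
```

The cost is linear in the number of chunks, but no alignment requirement is placed on chunk allocation.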
Go one step further and get rid of the handle area entirely. Now that the bytecode is flattened, we need far fewer tracked heap pointers at once. Maybe we can keep a bounded set of heap tracking slots and keep it alive for the whole lifetime of the runtime?
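One way such a bounded set could look, as a rough sketch only. The bound `MAX_TRACKED`, the `TrackedSlots` type and the mark/release stack discipline are all assumptions made for illustration:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TRACKED 32 /* hypothetical bound on live tracked pointers */

/* One fixed-size table, allocated once and kept for the whole runtime,
 * so nothing is created or freed per handle area. */
typedef struct {
    void *slots[MAX_TRACKED];
    size_t used;
} TrackedSlots;

/* Store `ptr` in the next free slot and return the slot's address,
 * or NULL when the bound has been reached. */
static void **track(TrackedSlots *t, void *ptr) {
    if (t->used == MAX_TRACKED) {
        return NULL;
    }
    t->slots[t->used] = ptr;
    return &t->slots[t->used++];
}

/* Drop every slot taken since `mark` (a previously saved `used` value). */
static void release_to(TrackedSlots *t, size_t mark) {
    assert(mark <= t->used);
    t->used = mark;
}
```

A caller would save `used` before a sequence of operations and call `release_to` afterwards. Whether a fixed bound is enough depends on how many pointers really need to be live at once.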
As of 5652664, we're leaking up to 1 MB (quite a significant amount) each time a handle area is created. There are various ways we can solve this: