Object Finalizers - Githubissues

ysbaddaden commented 6 years ago

Crystal classes can have a finalize method that must be called when a class instance object is no longer referenced and will be collected.

Immix doesn't care about collecting objects. We merely forget about objects by reallocating in unmarked lines, overwriting any previous object allocation. In order to finalize methods we must thus iterate all objects in the HEAP, verify if it has registered a finalizer, then call the finalizer.

A difficulty is that a finalizer may depend on other objects. For example finalizing object C then finalizing object B may crash the program or behave wrongly if B references and relies on C in its finalize method. We thus need some sort of tree graph to determine in which order to call finalizer methods: you may only finalize an object if no other object with a finalizer referenced it, or if they have all been finalized already (i.e. no other object depends on the object to finalize).

One last issue is that a finalizer may try to allocate memory, for example build & print a string for debug purposes... Since finalizers shall be called before recycling blocks, this would lead to always grow the HEAP (empty recycled list, empty free list) or merely hang forever. I believe the correct solution would be to prevent allocations in finalizer methods, since they should only be used to free ressources (external pointers, closing file descriptors, ...).

ysbaddaden commented 6 years ago

In Crystal stdlib finalizers are mostly used to reclaim unmanaged memory for C bindings, which means we don't need access to external references. Thought the efficacy is debatable, see the wikipedia article on finalizers, this is good:

they don't depend on external references to managed memory;
they don't block the loop for too long (they merely call a C function to free an allocation);
they don't depend on the crystal environment (no IO, no exceptions);
it will eventually free some unmanaged memory, avoiding leaks.

There is one complex occurrence thought: IO#finalize. It delegates to the abstract #close method, thus depends on uncontrolled implementations. For example IO::Buffered#close will flush any pending data to write, then delegate to yet another abstract #unbuffered_close method. For example IO::FileDescriptor#unbuffered_close will try to close(2) the fd, and possibly raise an Errno exception if the external call failed.

An apparently harmless finalizer (close unless closed) turns out to be bad and leads to:

run arbitrary user code (possibly accessing references to managed memory);
depend on a libevent loop to be accessible;
allocate memory (e.g. the Exception object);
raise an exception.

I'm not sure how to handle this.

The optimizer in me screams:

screw finalizers, iterate unreachable objects before sweeping and call finalizers on objects as they are found; document that order of execution is undefined; fix IO#finalize to call #finalize_impl; which may in turn call #system_finalize that will merely close(2) the fd. Use compiler visitors to forbid raising exceptions within a finalizer context, to forbid accessing references to external managed memory (i.e. only allow accesses to primitives, structs and unmanaged libc pointers), maybe even forbid to call GC.malloc. Yeah, that's a lot to forbid.

The security aware dude in me whispers:

mark unreachable objects with finalizers as if they were reachable (recursively), build a reference tree of those objects (beware: circular references) or follow bdw's topologically aware finalization; then delegate to the crystal world (e.g. after GC_collect is called) the actual execution of finalizers, so they're run after collection, in an somewhat expected execution context, that may catch exceptions and log them; where GC_malloc is available, etc. Eventually nullify the finalizer callback and let the next collection reclaim memory.

But still fix IO#finalize to simpler code, maybe even restrict the finalizer to IO::FileDescriptor and Socket, since the finalizer was initially implemented to avoid leaking file descriptors.

ysbaddaden commented 6 years ago

If we mark unrechable objects with finalizers as if they were reachable, then all their references will be marked and won't be collected this time. We can then merely mark the objects for finalization, and let GC.collect (Crystal) execute the finalizers after GC_collect (C) is done, trapping and logging exceptions raised in finalizers (shouting a warning message to STDERR), and eventually unregister the finalizer before resuming the program. Such objects will then be collected on the next collection.

We'd have the following guarantees:

all references are alive;
finalizers are run once;
the execution context is known (crystal thread, event loop is accessible, can allocate memory);

We'd still miss one guarantee:

finalizers will be executed in whatever order.

I don't believe the order of execution to be a real problem. Since finalizers are executed once, a simple dispose of an external allocation will only be freed once (no double free segfaults), as long as objects don't directly call #finalize directly, in which case finalizers must be aware to double execution.

Objects with more complex finalizers must be resilient to double execution of cleanup operations, which would be the same without the order of execution. For example A references B and both have finalizers to call B#close, then B#close must be resilient to multiple calls, or finalizers must check for B#closed? before trying to call B#close.

That sounds acceptable to me, but maybe I'm wrong, and there are some aspects I didn't think about? I'd love to hear about them!

asterite commented 6 years ago

I think we can do several things:

let finalizers in the std be simple (no flushing pending buffers, you must be sure to do that yourself)
remove the finalize method: it's just to easy to write and do whatever logic you want there, but:
add a way to register a finalizer to an object, similar to how it's done in Ruby and Go. This is a bit more obscure and hard to use, but it can be documented there, in that method, that the method shouldn't allocate, shouldn't raise and be very simple, otherwise it's undefined behavior

asterite commented 6 years ago

(the last point already exists, but is undocumented, and it's how finalize works)

RX14 commented 6 years ago

If we make rules that are easy to break (i.e. don't allocate, don't use IO), then we should have a way to enforce them.

ysbaddaden commented 6 years ago

I'm trying to weight optimizations against language safety.

A blacklist in finalize is possible, but there is always the risk that we miss something, or that something changes and it should have been forbidden but forgot to. That's reminding me of the whitelists of signal and thread unsafe C functions :disappointed:

The top solution (speed+safety) would be to not implement finalizers... but that's barely acceptable. It would require to always manually dispose of wrapped C allocations (good for reclaiming, bad for developer happiness).

I believe the solution I propose above is acceptable in both simpleness and usages. It would delay the memory reclamation (not perfect), but allow to achieve whatever we want (such as returning an unreachable object to a pool); it could be run concurrently after a GC sweep; and more importantly, be safe (no segfault in production because I inadvertently malloc'ed in a finalizer).

We could improve the stdlib in parallel. For example restrict them to strict cleanup; but also to introduce dispose methods & still have an eventual finalizer to avoid leaks; maybe even a way to unregister a finalizer. For example, HTTP::Server controls the lifetime of an OpenSSL::SSL::Socket, and could dispose early of the C allocation, just like File.open(&block) closes the fd.

Last but not least, we should document that finalize isn't a destructor and:

may never be called, and will most probably be called late;
may hinder GC performance, so they should be short;
should be about avoiding leaks (e.g. memory, file descriptors);
shouldn't free/close dependent references, they should have their own finalizers.

asterite commented 6 years ago

@ysbaddaden Oh, sorry, I didn't read that you had a solution for this in https://github.com/ysbaddaden/gc/issues/2#issuecomment-358277874 , I think that's better than my lazy proposal :-P

ysbaddaden commented 6 years ago

I implemented a simplified version in 34b878b and 47bcd30.

I don't collect objects to finalize later, but instead run finalizers of unreachable objects between the mark and sweep phases. This has the advantage that all objects are still reachable, we don't delay the memory collection (good), and don't have to mark/map/iterate objects to collect (just iterate the HEAP once more), while still being called from a Crystal context; finalizers are run in the collector fiber, thus from a crystal thread, and any assumed context should be accessible. As long as the GC is compiled with unwind tables, even raising should be possible (untested), thought the collector fiber will assume this is a bug and abort.

This has the disadvantage that finalizers may call GC_malloc which may reenter GC_collect, but that should be mitigated (i.e. memory will grow). I believe that allocating memory in a finalizer is bad (i.e. a bug); we could maybe detect it and print a nasty warning to stderr.

ysbaddaden commented 6 years ago

I fixed some corner cases (4b823c53cbebe57fd3f7a97e23c7ebec5d20f629 and bed169ac57151dc2ed2151c0fc55ddff32e5f76c), and a bug with large object finalization (6ed836cd4ec162bc8c981dd7025aa3241beaaa9d).

Looking at the crystal compiler, for example, there aren't much finalizers registered, but with the current implementation, finalizers take a void * pointer for each allocation, pushing the metadata overhead from 16-bytes to 24-bytes (+33%) on aarch64/x86_64 (8 to 12-bytes on arm/x86), for each and every allocated object, which ain't nice (we already waste 12-bytes for small objects). Finalizers also require to scan the whole HEAP when finalizing.

Maybe registering objects with a finalizer into a hash map structure could help:

instead of iterating all the HEAP, we'd iterate the hash map, and look whether each object is reachable or not;
instead of registering finalizers into each object metadata, we'd register them in the hash map instead.

ysbaddaden commented 6 years ago

Now using a hashmap.

ysbaddaden / gc

Object Finalizers #2