Weakref/finalizer support

MostAwesomeDude commented 8 years ago

We have many options for weakref primitives. However, I would like to only write code for one such primitive, so let's figure out which one is best.

Safety and Availability

This functionality cannot be in the safe scope. No way. This is dangerous. The timing and order of operations inherent to finalization permits a one-way covert side channel of information into any code which receives weakrefs; the code producing the weakref'd objects can covertly leak information by intentionally triggering GCs in a certain order or with a certain timing pattern.

Requirements

Each of these use cases needs to be constructible.

Finalizers

We can add a finalizer to an object such that the finalizer will receive .run/0 on some turn subsequent to the object being GC'd, here called victim:

def reactor():
    traceln(`Object has died`)
object victim {}

This can either be by losing the original reference and redirecting all accesses through some sort of proxy:

def proxy := finalizerHelper.register(victim, reactor)

Or by installing a finalizer directly onto the object:

finalizerHelper.install(victim, reactor)

Primitives

Our options, in no particular order:

Ephemerons
Weak references
Weak proxies
Non-enumerable (mutable) weak key maps
Finalizer reactors

dckc commented 8 years ago

What options do you see? What are the relevant precedents.

Some searching turns up...

http://erights.org/javadoc/org/erights/e/elib/tables/WeakKeyMap.html

zarutian commented 8 years ago

Hmm... uses of WeakKeyMaps as far as I can tell:

Any kind of 'seen' tables used by -- Membrane implementations (to prevent multiple wrapping of same object by the same membrane) -- Serialization surgeons to keep track of already serialized objects and their mapping to names internal to the graph depiction
Far ref comms system to keep track of local proxies of far refs spanning vats and to send the remote vat signal that this vat is no longer intrested in the object refered by those refs.

And there is also: http://erights.org/javadoc/org/erights/e/elib/vat/WeakPtr.html whose reactor is effectively an finalizer. But it has side channel issues for confined code for signals going into the confinement.

One can probably be implemented via the other but in the case of WeakPtr being implemented via WeakKeyMap it involves an interval timer compering an WeakKeyMap and a list of values to see which keys were gc'ed.

dckc commented 8 years ago

Dangerous? @erights is this true?

I see makeWeakRef discussed as a source of non-determinism in https://github.com/FUDCo/proposal-frozen-realms, but in Nov 2011, about the time I wrote http://www.madmode.com/2011/11/capability-security-in-e-coffescript.html , I recall Mark using WeakMap for sealing / unsealing. I think it's in some of his talks. Yes... on the "Money as “factorial” of secure coding" slide of http://soft.vub.ac.be/events/mobicrant_talks/talk1_ocaps_js.pdf

erights commented 8 years ago

Weak references are a source of non-determinism that is actually worse than most regarding side and covert channels. Thus, they need to be closely held. See https://github.com/tc39/proposal-weakrefs/blob/master/specs/weakrefs.md#portability-and-observable-garbage-collection by @dtribble for a good discussion in a JavaScript context, which raises some issue not raised by E.

JavaScript WeakMap on the other hand present no observable non-determinism and no new side or covert channels. When I initially proposed them I called them EphemeronTables. This name was awful so I was to quick to agree to rename them WeakMaps. The reuse of the term "Weak" has misled many people into thinking that WeakMaps and WeakRefs are closer than they are.

The E WeakKeyMap happened before I understood why we need true EphemeronTables/WeakMaps. WeakMaps are needed for good membrane support, and they are a better rights amplification primitive than sealer/unsealer pairs, as shown on that Money-as-factorial slide.

More later...

dckc commented 8 years ago

So @MostAwesomeDude, is there any reason to go beyond WeakMaps? What use cases motivate going further?

For that matter, what about doing nothing at all? For requirements I see "Each of these use cases needs to be constructible." but I don't see any use cases. "Finalizers" is a mechanism, not a use case. A use case starts with something like "Fred is trying to build a secure distributed conference management system..."

erights commented 8 years ago

WeakRefs are necessary to do distributed objects with reasonable garbage collection. WeakMaps are necessary for good gc across local membranes. Neither mechanism subsumes the other.

Sorry to speak in brief riddles. No time right now for more. Hopefully wednesday....

MostAwesomeDude commented 8 years ago

That is a riddle. Well, we can implement two kinds of objects for this.

interface WeakMap:
    to put(k, v) :Void
    to get(k)
    to fetch(k, ej)
    to contains(k) :Bool
    to removeKey(k)
    to clear()

def _makeWeakMap() :WeakMap:
    ...

I'm not super-attached to these verbs, but they are the same verbs that FlexMaps have, except for .clear/0 which is curiously missing.

Any resolved object is allowed as a key; anything is allowed as a value.

Implementation details are not especially interesting. We can directly wrap RPython's rpython.rlib.rweakref.RWeakKeyDictionary and not expose any of the unsafe functionality. Note that we do not use a reference-counting GC, so it really can be surprisingly indeterminate as to when the values will actually get reaped.

interface WeakRef:
    to run(ej)

def _makeWeakRef(obj) :WeakRef:
    ...

Not sure whether this is a reasonable interface. To avoid a null problem, an ejector is fired instead if the reference has died.

This is the tricky thing to implement. We need to come up with some clever way of interconnecting vats and weakrefs so that weakrefs won't be reaped mid-turn.

Here's the infamous mint with this WeakMap proposal:

def makeMint():
    def amp := _makeWeakMap()
    return def mint(var balance :Int):
        object purse:
            to getBalance() :Int:
                return balance
            to makePurse():
                return mint(0)
            to deposit(amount :Int, src):
                amp[src](amount)
                balance += amount
        def decr(amount :Int):
            balance -= amount
        amp[purse] := decr
        return purse

erights commented 8 years ago

Sorry for continuing to speak in riddles for now. But doing so leaves a record of what I need to clarify when I have time, so...

It would be ok for FlexMap to have a .clear() method.

It would be bad for WeakMap to have a .clear() method. This breaks an independence property useful for security as well as inhibiting a useful implementation technique: the transposed representation. (After much lobbying, I killer the .clear() method proposed for JavaScript WeakMaps on these grounds.)

FlexMaps should indeed take any equality-comparable values as keys, which in E are all resolved first class values. By contrast, WeakMaps should only take (in E terminology) resolved selfish objects as keys, since only these have fresh unforgeable identity.

From the name, I would suspect that RPython's WeakKeyDictionary is based only on weak collection, not full ephemeron collection, and so will cause leaks in membrane scenarios that should work. But I don't actually know the semantics of this RPython thing.

See @dtribble 's https://github.com/tc39/proposal-weakrefs/blob/master/specs/weakrefs.md#portability-and-observable-garbage-collection for a cheap and easy technique to avoid visible collection of weak refs during a turn. The cost is almost nothing.

erights commented 8 years ago

E did indeed use only WeakKeyTables (under some name) based on weak references, which were weaker than ephemerons. As a result, E membranes had unpluggable leaks I did not notice until my later JS work.

dtribble commented 8 years ago

Also, the Javascript WeakRef proposal does describe some of the use cases that cannot be achieved using weak maps of various sorts. Many of them include invoking cleanup behavior after objects have been collected.

dckc commented 8 years ago

I appreciate the thoughts as they occur to you, @erights and @dtribble; none of the above is too much of a riddle to make sense. Perfection is the enemy of the good and all that...

MostAwesomeDude commented 8 years ago

Yeah, getting all the GC semantics lined up and fast is going to be a serious undertaking that might not make sense until we do a self-hosting native Monte compiler.

We discussed weakrefs as slots today. The idea sounds good in theory but I have no idea how painful it might be in practice. Also @zarutian pointed out that this would rapidly lead to a desire for mutable weak slots, which are trickier in implementation.

dtribble commented 8 years ago

FWIW, for when you start looking at WeakRef mechanisms, the semantics and implementation approach for weakrefs in the JavaScript proposal were strongly influenced by design work for WeakRefs in Joule, E, and Midori (a high-performance OS in which processes were communicating event loops).

MostAwesomeDude commented 3 years ago

Coming back to leave a breadcrumb: A reasonable Horton implementation requires this. We should probably investigate adding ephemerons to RPython, since it's been years and we haven't moved much.

monte-language / typhon