tc39 / proposal-symbols-as-weakmap-keys

Permit Symbols as keys in WeakMaps, entries in WeakSets and WeakRefs, and registered in FinalizationRegistries
http://tc39.es/proposal-symbols-as-weakmap-keys
MIT License
90 stars 7 forks source link

Use of registered symbols would allow observation that a `WeakMap` entry has been freed #21

Closed brad4d closed 2 years ago

brad4d commented 2 years ago

I've just had a discussion with WH to get a clear picture of his objection to allowing symbols created with Symbol.for() as WeakMap keys.

To summarize, This is the use-case that must be disallowed.

  1. Symbol.for('name') gets used as a WeakMap key.
  2. Later, no references to Symbol.for('name') exist other than the WeakMap key and the (virtual) GlobalSymbolRegistry entry, which don't count. So, both of these are effectively freed.
  3. Still later Symbol.for('name') is called again and the result used to attempt lookup in the same WeakMap, finding nothing.

The key misunderstanding many of us have had revolves around the assumption that the GlobalSymbolRegistry entry defined in the spec must keep the Symbol.for() keys alive forever, making them effectively the same as well-known symbols that are stored on global objects. This is incorrect.

In our conversation WH explained it like this.

The VM table you're referring to is spec convenience fiction and an implementation need not implement it that way.

There are many such spec fictions. Consider a program that sits in a tight loop creating objects and not storing them anywhere. Before we had weak maps, we had no notion of object destruction, so per the spec you'd quickly exhaust memory with objects. But a realistic implementation doesn't need to do it that way and can run such a loop indefinitely. Similarly, the VM table is a spec fiction that ensures that when you call Symbol.for twice with the same string, you get the same symbol. Now imagine a program sitting in a tight loop creating consecutive unique strings "0000000000", "0000000001", "0000000002", etc., calling Symbol.for on them, but not storing the symbols anywhere. To preserve the user-visible Symbol.for semantics, an implementation need only to keep track of live symbols, so it's possible for an implementation to run such a loop indefinitely without exhausting memory. That table of all Symbol.for symbols is spec fiction.

So, I believe this proposal & its spec text need to be updated to disallow use of registered symbols as WeakMap keys.

ljharb commented 2 years ago

The proposal will not advance with that restriction; either all symbols or none can be WeakMap keys.

I also don't see the issue here. A registered symbol can, in general, be "freed" when it's unobservable. However, by putting it in a WeakMap, it can never be freed until that WeakMap is - a registered symbol that is weakly held remains "live" as long as the thing holding it is live. Why is this a problem?

ljharb commented 2 years ago

the GlobalSymbolRegistry entry defined in the spec must keep the Symbol.for() keys alive forever, making them effectively the same as well-known symbols that are stored on global objects. This is incorrect.

In other words, if you s/alive forever/observably alive forever, it seems correct to me, and allowing them to be WeakMap keys doesn't change this - it just adds a new way for them their liveness to be observable.

brad4d commented 2 years ago

I also don't see the issue here. A registered symbol can, in general, be "freed" when it's unobservable. However, by putting it in a WeakMap, it can never be freed until that WeakMap is - a registered symbol that is weakly held remains "live" as long as the thing holding it is live. Why is this a problem?

The point of a WeakMap is things used as keys in it are not supposed to count toward keeping them alive. Isn't it a contradiction, then, for a symbol to be kept alive because it is used as a WeakMap key?

mhofman commented 2 years ago

The proposal will not advance with that restriction; either all symbols or none can be WeakMap keys.

This is a pretty strong stance. I would hope there was still a way to arrive at some kind of consensus here.

The key misunderstanding many of us have had revolves around the assumption that the GlobalSymbolRegistry entry defined in the spec must keep the Symbol.for() keys alive forever

As @ljharb mentions, the key bit is not that GlobalSymbolRegistry exists in the spec, rather that the registered symbol is observed through a WeakMap. As I explained in https://github.com/tc39/proposal-symbols-as-weakmap-keys/issues/19#issuecomment-848368824, the implication is that a registered symbol cannot be collected for as long as anything observing its liveness is itself live. Some believe that this could be a common mistake by programs that would not test for the type of symbols if not prevented by the language, and would result in preventable memory leaks, similar if we allowed string or numbers to be used as weakmap keys.

I personally see no reason to allow forgeable values as weak keys/targets, besides allowing lazy programs to shoot themselves in the foot.

ljharb commented 2 years ago

@brad4d no, because a registry symbol can never observably be freed, ever, already. The spec fiction is there to ensure that. In other words, it would only be in an implementation that attempted to walk the tightrope between freeing them, and keeping that unobservable, that this new difficulty would emerge.

mhofman commented 2 years ago

Btw, there is another reason why registered symbols being disallowed as weak keys would make sense: implementation complexity.

Since GlobalSymbolRegistry is a spec fiction, an implementation can currently trivially implement registered symbols as a sort of composite value wrapping the underlying string or number description, and compare those symbols by value instead of by unique identity like non-registered symbols. Symbol.for then simply forges a new value from the description, and Symbol.keyFor simply unwraps the underlying description value.

If allowing registered symbols as WeakMap keys, such an implementation would likely be forced to change their implementation and required to generate a unique identity for registered symbols, akin to forcing ropes.

brad4d commented 2 years ago

Thanks @mhofman for that succinct example.

This is in fact the point WH was trying to make, I think. Implementations like that one are supposed to be possible for efficiency reasons.

They truly put symbols created with Symbol.for on the level with numbers, strings, booleans, etc. You can never really "clean up" these values, because the exact same value could always be constructed again later. This is why they are disallowed as WeakMap keys.

Allowing non-registered Symbols is OK because they can only ever be created once, so they can be safely cleaned up when no non-WeakMap-key references to them remain.

ljharb commented 2 years ago

I don't understand why such an implementation would need to change everything, as opposed to making a special carve-out for the subset of registered symbols that are weakly held.

ljharb commented 2 years ago

Separately, do any such implementations exist that could speak to these difficulties? If not, then if the spec was designed to allow a kind of implementation, but in practice there's no concrete evidence that there's any benefit to doing so, why preserve it?

mhofman commented 2 years ago

I don't understand why such an implementation would need to change everything, as opposed to making a special carve-out for the subset of registered symbols that are weakly held.

By special carve-out, I suppose you mean branch on the type of symbol when creating a WeakRef or WeakMap entry? That feels like significant complexity added everywhere registered symbols could be used.

Separately, do any such implementations exist that could speak to these difficulties?

I'm not sure, but @erights mentioned that's how he would have expected engines to implement registered symbols.

ljharb commented 2 years ago

@mhofman if we went with your suggestion of differentiating symbols between registered/well-known and "other", they'd already have to do that - so that complexity seems to be acceptable.

mhofman commented 2 years ago

They would only have to do that when adding the entry to the weakmap / creating the weak ref, and throw a TypeError. Here you're talking about having a parallel Map / Ref implementation for registered symbols keys/targets.

jridgewell commented 2 years ago

The proposal will not advance with that restriction; either all symbols or none can be WeakMap keys

I think disallowing registered symbols is a small wart that allows for a much more valuable use case to work, and so we shouldn't block just because of that. Once implementers get actual experience with how complex symbol GC becomes, we can try reopening to allow registered symbols later on.

ljharb commented 2 years ago

@jridgewell that's an opinion i don't share.

acutmore commented 2 years ago

That feels like significant complexity added everywhere registered symbols could be used.

Additional methods that would need* to branch for registered symbols if they were allowed:

The branch being to switch the logic to the separate internal Strong{Set,Map} used for registered symbols.

Not being a member of a JS engine implementation team, I will leave the measure of added complexity to others.

* In contrast: If registered symbols are not allowed, the above methods would not necessarily need to branch specifically for registered symbols. The existing code path for handling symbols could be used as the result would be the same. Though implementations may still add code to specially handle registered symbols as an optimisation (early return).

** EDIT: an implementation could store the registered symbols in the usual weakMap's internal structure. But also store hold registered-symbol strongly to avoid any optimisation that would otherwise collect it. This means that the has and get methods would not need to branch for registered-symbols and could look them up in the same way as other values.

https://gist.github.com/acutmore/8f16a8d4cf4b7b54c5956bd6eab05cfd

The following methods would have to check if a symbol is a registered symbols either way. - WeakSet.prototype.add - WeakMap.prototype.set - FinalizationRegistry.prototype.register - WeakRef constructor And I don't think `FinalizationRegistry.prototype.unregister` would need to be changed for either approach. If registered symbols are allowed, then it's likely `FinalizationRegistry.prototype.register` would silently ignore the registration so there would be nothing to unregister. And if registered symbols are not allowed to be registered, then again there would be nothing to unregister.
acutmore commented 2 years ago

Isn't it a contradiction, then, for a symbol to be kept alive because it is used as a WeakMap key

Perhaps the developer experience (in the case of allowing registered symbols) could be improved with clear warnings? For example on MDN and maybe even at runtime with implementations printing a console.warn. To increase the probability that a JS developer would know that putting a registered symbol in a WeakMap could actually increase memory usage due to it disabling a memory saving optimisation that some engines employ.

( I'm not advocating a position, but trying to help find common ground )

acutmore commented 2 years ago

Separately, do any such implementations exist that could speak to these difficulties @ljharb

Yes, Firefox (SpiderMonkey) collects symbols created by Symbol.for if they are no longer observable. (At least from my testing just now in Firefox nightly)

ljharb commented 2 years ago

I'd love to hear a FF contributor's take on the difficulty of knowing that registered Symbols that are weakly held remain observable while the weak container is.

mhofman commented 2 years ago

I think the question was also, do any implementation currently implement registered symbols as a wrapped description value without giving them a unique identity (aka doesn't use any kind of map / hash table, and doesn't implement the spec fiction of the global registry).

ljharb commented 2 years ago

That was the first half :-) the second half is speaking to the difficulties.

mhofman commented 2 years ago

Taking a step back however, the implementation complexity, while important, shouldn't be the driving factor.

I still remain confused by the motivation to allow forgeable values in weak collections. What kind of program would want to legitimately do this? How are registered symbols, or tuples/records not containing symbols, any different from forgeable primitives like string and number? If a program does have a use case for forgeable values, why can't they implement the logic themselves combining a Map and a WeakMap ?

ljharb commented 2 years ago

One use case is that I want a maximally weak collection. I don't particularly care if any of the values are weakly or strongly held - i just want to weakly hold them when possible. The more things I can put in the WeakMap, the simpler my wrapper class can be, and the more things i can delegate to the implementation.

You're totally right that I can implement this already by wrapping a Map and a WeakMap - however, this means borrowed prototype methods fail (or, if i'm clever, one set of borrowed methods fails and the other works, depending on the key type), and this also means that even non-registered non-well-known Symbols will be strongly held.

In other words, I value the additional optimistic weakness that "all symbols can be weakly held" provides, and since GC is already nondeterministic, (short of specific implementation knowledge) nobody can rely on "anything that can be weakly held will be collected", so there's nothing lost by this approach.

Concerns about memory leaks are inevitable because the language spec already does not force collection. The language only discusses observability - and observability is all one can rely on. Collectibility is an emergent property an implementation may choose to provide, but it is not one that language practitioners may safely ever rely on.

mhofman commented 2 years ago

One use case is that I want a maximally weak collection. I don't particularly care if any of the values are weakly or strongly held - i just want to weakly hold them when possible

Does this apply to string and numbers as well? If not, what makes symbols different?

Also, what purpose would such a collection have in a program?

and this also means that even non-registered non-well-known Symbols will be strongly held.

How so? Why can't you put them in the WeakMap section of the abstraction? I do agree a try/catch is cumbersome and why I think we should offer a built-in predicate to test "weak-hold-ability".

borrowed prototype methods fail

As a userland abstraction, why would it be necessary? Aka, why can't you use the prototype methods of your new abstraction.

Concerns about memory leaks are inevitable because the language spec already does not force collection. The language only discusses observability - and observability is all one can rely on. Collectibility is an emergent property an implementation may choose to provide, but it is not one that language practitioners may safely ever rely on.

I think we should be realistic, and build APIs for real world programs and machine. By your reasoning, WeakMap and WeakRef have no place in the language in the first place.

Sure, a program should not rely on garbage collection, but without garbage collection, programs would very likely run out of memory, because unlike what the spec pretends, machines do have a finite amount of memory.

Memory leaks are a very real issue in programs, and they're usually pretty obscure. While I wouldn't consider adding a forgeable value to a weak collection an obscure source of leak, I'm not sure the average developer would be able to realize the problem. Preventing the obvious leak and forcing them to find a workaround if that's the behavior they actually need is what I'm after here.

acutmore commented 2 years ago

One use case is that I want a maximally weak collection. I don't particularly care if any of the values are weakly or strongly held - i just want to weakly hold them when possible.

I checked a few other languages (below) to see if they offered such a collection in their standard library and they did not seem to.

ljharb commented 2 years ago

By my reasoning, WeakMap, WeakSet, and WeakRef's place in the language is to be optimistically weak - meaning, not providing guarantees, just conveying intentions to the engine.

I'm not sure the average developer would be able to realize the problem

The average developer won't be using either registered symbols or any Weak thing, both of which are incredibly obscure.

I checked a few other languages

To me, that's a potential argument against any symbols being allowed; it's not an argument against registered symbols being allowed.

nicolo-ribaudo commented 2 years ago

I think this proposal is unrelated to "optimistically weak" collections: I agree that it would be nice to have it in the language (I had to implement it by myself multiple times), but it's not just about registered symbols.

An OptimisticallyWeakMap (or just relaxing WeakMap's restrictions) should probably be presented as a separate proposal (either competing with this one, or as an extension), because it asks for much more than what this is asking (e.g. allowing numbers, strings, ...) and can have concerns that are different from the ones raised for this proposal (for example, "unexpected memory leaks" does not hold if it's a class with a new explicit name).

mhofman commented 2 years ago

By my reasoning, WeakMap, WeakSet, and WeakRef's place in the language is to be optimistically weak - meaning, not providing guarantees, just conveying intentions to the engine.

I think what you're looking for is a new type of collection, which would accept any primitive, including strings and numbers. I don't see why registered symbols should be held to a different standard.

To me, that's a potential argument against any symbols being allowed; it's not an argument against registered symbols being allowed.

But do other languages have a notion of symbols though? Aka non-object values which may be unique and unforgeable? (genuine question)

ljharb commented 2 years ago

I think we're overindexing on my particular use case.

Regardless, I think creating a bifurcation in the category of "Symbols" is an unacceptably confusing thing to add to the language, and I remain unswayed by arguments pertaining to collectibility, which is something the language does not guarantee.

mhofman commented 2 years ago

The bifurcation needs to happen somewhere. A program which wants to be responsible and not leak memory needs to be able to know what it can put in a WeakMap without causing a leak.

Requiring the program to use heuristics like calling Symbol.keyFor is not future proof. It also would be extremely onerous when combined with records and tuples: imagine if a program had to walk every value held in those, and test if it's a symbol, but not a registered symbol.

I maintain that we need a predicate offered by the language to perform this test. That predicate needs to implement the same check that I'm advocating should prevent the addition to weak collection.

acutmore commented 2 years ago

The use cases for this proposal are for patterns where unique symbols are created and then stored in a WeakMap. Not for taking symbols from an unknown source.

In what use cases do we think this concern about registered symbols would come up in practice?

As someone who has been in the stressful position of debugging a slow memory leak that is happening on clients where it’s not possible to get heap snapshots - I am sympathetic to an API that could catch a leak sooner. That said I’ve not come across code that creates an ever growing set of registered symbols. Code I’ve seen uses Symbol.for to create a fixed number of symbols with static strings, usually to ensure compatibility across separate packages.

michaelficarra commented 2 years ago

anything should be allowed as a WeakMap key

elmo

fstirlitz commented 2 years ago

The use cases for this proposal are for patterns where unique symbols are created and then stored in a WeakMap. Not for taking symbols from an unknown source.

Why not then design a less orthogonal API, tailored specifically to that use case?

Introduce WeakMap.prototype.store that mints a new symbol, uses it as a key to store the given object, and returns that symbol. Using a symbol as a key in WeakMap.prototype.set would stay an error, unless the symbol is already present in the map. Analogously, introduce WeakSet.prototype.addSymbol as well.

acutmore commented 2 years ago

Hi @fstirlitz,

this has come up in some other threads (I’ll try and find links when not on my phone).

The issue that was raised with an API like that is that it would either be cross-realm and break boundaries like ShadowRealm or it would be a per-realm store, and having per-realm state was seen as an issue as there is not currently global functions that have per-realm behavior.

EDIT: realised this is not the same as the other suggestions which had the functions on a global factory not on the instance.

caridy commented 2 years ago

@acutmore @rricard when covering the update form yesterday during the SES meeting today, something new popped with respect to this change. Basically, it is going to be very hard now to polyfill well-known symbols, considering that they are, in principle, very similar to registered symbols. I don't know if that was discussed before, but worth discussing it.

ljharb commented 2 years ago

I'm not sure why it would; none of the polyfills I know of use the symbol registry, since having Symbol.keyFor(Symbol.whatever) return "not undefined" would be a much bigger problem than having a polyfilled well-known-symbol be single-realm.

mhofman commented 2 years ago

I filed https://github.com/tc39/proposal-symbols-as-weakmap-keys/issues/27 to discuss this