tc39 / ecma262

Status, process, and documents for ECMA-262
https://tc39.es/ecma262/
Other
15.03k stars 1.28k forks source link

Symbols can't be used as WeakMap keys #1194

Closed devsnek closed 1 year ago

devsnek commented 6 years ago

Symbols cannot be used as WeakMap keys as they are primitives. However unlike other primitives they are unforgable and unique, so they should be completely safe for use as WeakMap keys.

ljharb commented 6 years ago

Global symbols (Symbol.for, Symbol.keyFor) are not unique, and as primitives, can't ever be collected - I assume that since these kinds of symbols can't be WeakMap keys, and because it would be confusing for some symbols to be usable as WeakMap keys but not others, that it makes the most sense to disallow all symbols as WeakMap keys?

erights commented 6 years ago

Hi Jordan, yes, I went down the same chain of reasoning.

On Tue, May 15, 2018 at 11:59 AM, Jordan Harband notifications@github.com wrote:

Global symbols (Symbol.for, Symbol.keyFor) are not unique, and as primitives, can't ever be collected - I assume that since these kinds of symbols can't be WeakMap keys, and because it would be confusing for some symbols to be usable as WeakMap keys but not others, that it makes the most sense to disallow all symbols as WeakMap keys?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/tc39/ecma262/issues/1194#issuecomment-389276760, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQtzAoxX7S4Mmw_fFAStvveTlzCvXcLks5tyyWJgaJpZM4UADYc .

-- Cheers, --MarkM

devsnek commented 6 years ago

i would lean towards allowing global symbols since they aren't the only things that could prevent collection. any value used as a weakmap key can be attached places such that it won't be collected. beyond that, Symbol() symbols should definitely be allowed.

tabatkins commented 6 years ago

I don't see a significant difference between a registry-Symbol and an object that's stored on the global. Both will never be collected, and thus will prevent a weak value from being collected.

But if we really do think this is a footgun, then we can just disallow registry-Symbols from being used (those for whom Symbol.keyFor returns a non-undefined value). The registry can't be manipulated by the user; Symbols are inserted into it by the UA immediately upon creation and never removed, so "in the registry" is effectively a constant quality of the Symbol itself, and can be relied on.

It just seems silly that {} can be a key but Symbol() can't, given that they can serve similar purposes.

ljharb commented 6 years ago

You can always prevent collection; the difference is that with objects you can always allow collection by dropping all refs to the object - or by having the realm itself collected.

With global symbols - which are cross-realm - it would prevent ever collecting it.

bmeck commented 6 years ago

@ljharb If a WeakMap is reaped it does not prevent collecting values with eternal keys since that weak reference to the value is removed, even if the key itself is strongly held still.

bmeck commented 6 years ago

@erights On a tagent that might come back to this, is there any reason having keys mismatch between Weak collections and WeakRef might be problematic? I know for WeakRef it would never fire a finalizer for a Symbol from the SymbolRegistry if it were allowed as a key (but it would strongly keep the finalizer/holdings alive I think?).

gibson042 commented 6 years ago

With global symbols - which are cross-realm - it would prevent ever collecting it.

Isn't the same true of the outermost global object?

ljharb commented 6 years ago

@gibson042 if the realm is collected, the global object for it could also be (assuming it was a key in a WeakMap from a different realm)

gibson042 commented 6 years ago

Right, but I was referring to the global object of the outermost realm (though I suppose such a statement reads as vacuously true). Still, the set of well-known Symbols shared across realms is necessarily bounded and small, and their uncollectability shouldn't be a concern. Or am I missing something?

bmeck commented 6 years ago

I'm not sure this talk about keys really matters if the WeakMap being reaped allows what it is holding be reaped? Even if I create a map using:

let map = new WeakMap();
map.set(Symbol.iterator, BIG_OBJECT); // Symbol.iterator is shared between all realms as well
map = null;

BIG_OBJECT can still be collected since the map can be collected.

loganfsmyth commented 6 years ago

One thing that comes to mind here is that doesn't seem to be polyfillable without leaking. Are there any concerns around that?

bmeck commented 6 years ago

@loganfsmyth isn't that a concern with any Weak collection polyfill?

devsnek commented 6 years ago

@bmeck traditionally the polyfills assign a "hidden" property on the key that holds the value. that can't be done with a Symbol.

benjamn commented 6 years ago

In a recent project, I resorted to implementing a class called UniversalWeakMap, which maintains a WeakMap and a Map instance property, called this._weakMap and this._strongMap, and simply stores any non-reference keys (which can't be stored in this._weakMap) in the this._strongMap instead, so that it can expose a uniform get/set interface that works for any kind of key. Of course the non-reference keys are strongly held (including Symbols, unfortunately), but the whole UniversalWeakMap object could potentially be garbage collected.

Note: for this particular project, I didn't need to match the shared WeakMap/Map interface exactly, which is why my UniversalWeakMap doesn't have methods like has and delete, though they would be easy to implement.

I mention this example as evidence that it would be genuinely useful to have fewer restrictions on what you can put in a WeakMap, as long as you don't mind the drawbacks (in particular, you don't care about iterability).

To the specific question of whether Symbol keys should be allowed in WeakMaps, that would certainly be more convenient than the status quo. However, I would go a step further, and recommend that any kind of key should be allowed in a WeakMap. None of the keys (weak or strong) should be iterable, obviously. Those keys that can be garbage collected should disappear from the WeakMap whenever they become unreachable, and those keys that can't be collected (either because they're a kind of value that can never be collected, or they just never happen to become unreachable) should simply remain in the WeakMap until they are explicitly removed by the program.

In either case, a native WeakMap implementation will not keep any keys from being collected that could otherwise be collected, so I don't see any potential for confusion or memory leaks—in a native implementation, at least, which is why this would need to be standardized rather than just polyfilled.

devsnek commented 6 years ago

forgable values can't be done because we want to prevent observing gc

benjamn commented 6 years ago

@devsnek Forgeable Symbol values by definition can't be collected, since they have to be === if you forge them again later, so there's nothing to observe?

ljharb commented 6 years ago

Observing GC isn’t a concern; WeakRefs are coming, either via JS or via WASM, so it will be observable.

benjamn commented 6 years ago

Especially once we have WeakRefs and thus observable GC (but even right now), I honestly don't see the benefit (for developer expectations or memory usage or any other reason) for refusing to store non-reference keys in a WeakMap.

erights commented 6 years ago

Observable gc absolutely is a concern. That's why we separate weakref from weakmap and put weakref on the System object.

However, I agree it is not relevant to this thread. Non ref keys would never be collected so there's nothing too observe.

On Thu, May 24, 2018, 6:56 PM Jordan Harband notifications@github.com wrote:

Observing GC isn’t a concern; WeakRefs are coming, either via JS or via WASM, so it will be observable.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/tc39/ecma262/issues/1194#issuecomment-391890792, or mute the thread https://github.com/notifications/unsubscribe-auth/AAQtzGEaQDgObbLfCJFlAQLTAkw0PtKVks5t1zqLgaJpZM4UADYc .

ljharb commented 6 years ago

@erights thanks for clarifying.

In that case, can anyone elaborate on why WeakMaps can't accept all values as keys? (@erights, @allenwb?)

bakkot commented 6 years ago

Personally, I expect that if I put a thing in a WeakMap then the corresponding value can someday be GC'd. For example, I have tons of code which does stuff like

const cache = new WeakMap;
function process(data) {
  if (cache.has(data)) {
    return cache.get(data);
  }
  const res = expensiveAlgorithm(data);
  cache.set(data, res);
  return res;
}

with the expectation that this caching is acceptable because the caller can always choose to drop data if they want res to stop taking up memory.

I really don't think it's a good idea to mix strong and weak holding in the same data structure. Yes, I know that sometimes the global object ends up being effectively strongly held, but that's an extremely minor and extremely edge-y edge-case.

woess commented 6 years ago

Sorry to sidetrack, but what do you mean by (un)forgeable values?

devsnek commented 6 years ago

@woess forgability refers to the ability to create a value that has the identity of another value without having access to the original value. two examples of this are numbers and strings. i can create 1 and have it be identical to this other 1 over here, without having access to the original 1. as with strings i can create "hello" in one place and "hello" in another place and they are identical.

on the other hand we have things like objects and symbols which are unforgable. if i have Symbol() in some place and Symbol() in another place they will not be identical. The only way to have identity is to grab that original symbol. The same goes with objects ({} === {} is false)

erights commented 6 years ago

@devsnek Good explanation of unforgeable. However, only unnamed symbols --- those created by the Symbol() expression you showed --- are unforgeable. Named symbols --- those created by Symbol.for(str) --- are not unforgeable, creating the dilemma at issue in this thread.

bmeck commented 6 years ago

@erights I don't think that is true, the GlobalSymbolRegistry is returning the exact same Symbol rather than creating new primitive values in https://tc39.github.io/ecma262/#sec-symbol.for . Those things are reachable but not forgeable. I cannot recreate them without access to that registry.

erights commented 6 years ago

Access to the registry is not deniable. Given that everyone has implicit access to the registry, they can obtain access to any named symbol given only knowledge of the string. IOW, access to a named symbol is "knowledge limited" rather than "access limited".

We were very careful to design the semantics of the registry so that it would not be a global communications channel. Given the way the semantics of the registry are stated, this safety property is hard to see. A better way to describe its semantics is that a named symbol is a value, without identity, that wraps a string. All the equality comparison operators, given two named symbols, judges them to be equal iff their wrapped strings are equal. This account is not observably different and need not hypothesize any registry or any other form of shared state. In this account, it is obvious there is no global communications channel.

bmeck commented 6 years ago

@erights https://tc39.github.io/ecma262/#sec-samevaluenonnumber appears to compare that they are the same value, not based upon any internal string that I can tell. It could be implemented as potentially multiple values being checked by some internal string that appear to act as a single value, but is not what the spec appears to be saying. Do you think this a bug in the specification?

ljharb commented 6 years ago

@erights it seems like you could deny access to the registry by replacing it with one that wasn’t truly using the global registry, thus ensuring that global symbols created in the restricted realm weren’t available to other realms?

gibson042 commented 6 years ago

Please don't special-case registered symbols. It is already possible to add keys to a WeakMap that will not be collected from it, whether because such collection is strictly impossible (e.g., the outermost global object or the WeakMap itself) or because the keys happen to remain reachable (e.g., built-in or user-defined globals). Considering consistency, complexity, and cognitive burden, the criteria pretty much has to be based on type and supports only two possibilities—1) current behavior of requiring keys to be non-primitive, or 2) the proposed behavior of requiring keys to be of a type for which SameValueNonNumber compares by reference (matching user experience of strict equality comparison). It's not worth adding symbols if they come with a global registry exception.

And if symbols are permitted as WeakMap keys, they should be allowed in WeakSets as well.

woess commented 6 years ago

I don't think it's a good idea to allow any values shared across realms be used as WeakMap keys.

I'd also like to know if there's any use case that would justify allowing (unique) symbols. So far this part of the question has not been addressed at all.

axefrog commented 6 years ago

Just to throw my hat in the ring, I'm currently building a system that uses unique symbols as property keys in order to avoid naming conflicts, as I have a lot of code generated internally at runtime in response to arbitrary changes to ontological context. When calculating hashes of different data structures, symbols are a challenge to hash because though it is easy to associate them with a numeric value, if a symbol's associated ontology is no longer in use, there's no way for me ensure that a cached hash for that symbol is also cleaned up. You can box a symbol within an object of course, but that won't work as a WeakMap key because I have to keep a reference to the object somewhere in order to use it for lookup purposes.

In a nutshell, allowing symbols to be used as WeakMap keys provides a way to associate them with metadata that can be dropped when the symbol reference is orphaned.

With respect to the argument "...but global symbols will never be cleaned up" - this is true anyway for objects isn't it? For example:

weakmap.set(Date, 'This will never be cleaned up');
weakmap.set(document, '...nor will this');
weakmap.set(WeakMap, '...or this');
ljharb commented 6 years ago

Date = null; WeakMap = null; weakmap.constructor = null; might be enough to clean up two of those, but it’s certainly true that there’s objects you wouldn’t be able to get rid of. weakmap.set(weakmap, true) presumably never would either.

erights commented 6 years ago

weakmap.set(weakmap, true) presumably never would either.

Why not?

ljharb commented 6 years ago

@erights ah, i guess it would, scratch that.

gibson042 commented 6 years ago

How would a weakmap.set(weakmap, true) item ever be collected from the map, as opposed to with the map? Likewise for use of %ObjectPrototype% or %FunctionPrototype% as keys, or the outermost global object.

devsnek commented 6 years ago

I believe the concern here is that realms can be collected but the well known symbols never will be. I don't understand the concern but also not very familiar with implementations.

erights commented 6 years ago

I believe the concern here is that realms can be collected but the well known symbols never will be.

yes.

I don't understand the concern but also not very familiar with implementations.

It's not an implementation issue but a semantic one.

A key can be unobservably collected from a WeakMap only if the key can never again be looked up, because that key can never again exist. A well known or registered symbol might always exist again, and therefore might be looked up at some point in the future.

It is true that a fresh unregistered symbol can be known to never exist again, which is why we could consider allowing only these as keys in a WeakMap. However, since these are otherwise so similar to well known or registered symbols, we decided against it.

Thus, we have the simple rule that

Object(k) === k iff k can be a key in a WeakMap

erights commented 6 years ago

How would a weakmap.set(weakmap, true) item ever be collected from the map, as opposed to with the map?

Correct. So long as weakmap itself is reachable, it will not be collected as a key from itself. If weakmap is unreachable, then it can be collected as a key from itself, but this is not in any sense an observable statement.

Likewise for use of %ObjectPrototype% or %FunctionPrototype% as keys, or the outermost global object.

None of these are necessarily reachable from weakmap.

allenwb commented 6 years ago

There is also an implementation considerations: Disallowing symbols as WeakMap keys increases the design alternatives space for JS engines and garbage collectors.

The existence of weak maps has a direct impact pact upon the a garbage collection and if not very carefully designed can have a significantly negative performance impact. Allowing symbols as WeakMap keys could, at the very least, complicate such designs and have an impact on other seemingly unrelated parts of an an engine design.

Some possible impacts: Some implementations might decide it makes sense to encode symbols as immediate values within tagged object references or use some common base representation for strings and symbols that is different from that used for objects. These approaches might simplify and hence speedup property lookup. But, there needs to be affordances in the GC that helps it detect when values used as weak keys no longer have any references. A good GC design would probably only want those affordances to apply to values that are actually used as weak keys as they typically have space and/or time overhead that you would like to avoid for the vast majority of values that are never used as weak keys. If the affordances require some per value state this might precluded using immediate values for symbol values (or flipping things, use of immediate symbol values might preclude the use of certain GC affordances).

To be more concrete. Allowing symbols keys in WeakMap might precluded (or significantly complicate) a design from both having immediate symbol values and using an inverted representation of weak maps.

erights commented 6 years ago

There is also an implementation considerations: Disallowing symbols as WeakMap keys increases the design alternatives space for JS engines and garbage collectors.

Thanks @allenwb I stand corrected.

But implementations should keep in mind the expense of this technique. There are an infinite number of potential fresh unregistered symbol identities. With only an immediate representation, each one would need to be a unique bit pattern, say with a large counter. When this counter reaches its limit, the agent would need to be preemptively terminated, just as it must for OOM, even if there's plenty of free memory at that time. By avoiding heap allocation, in the limit, such termination becomes inevitable.

OTOH, one can imagine a separate immediate-symbol-representation-compactor that runs in this emergency. But this makes the scheme overall more complicated, not less.

axefrog commented 6 years ago

I would think that a feature with potential value should generally never be blocked by implementation challenges except where the feature is fundamentally at odds with the goals described by the specification.

As an alternative approach, would there be any value in introducing some sort of immutable object reference/handle as a property of the symbol primitive? Something like this:

const wm = new WeakMap();
const a = Symbol('A');
wm.set(a.ref, { /* metadata */ });

typeof a.ref === 'object'; // true
delete a.ref; // no-op or error
a.ref = {}; // no-op or error
a.ref.foo = 'bar'; // no-op or error? ... or permitted?

// Use case:
for (let p in Object.getOwnPropertySymbols(target)) {
  const metadata = wm.get(p.ref);
  if (metadata !== void 0) {
    // ... reflect
  }
}

What other approaches could allow metadata to be discoverable from a symbol reference, and to be garbage collected when the symbol reference is orphaned?

allenwb commented 6 years ago

@erights well, immediately encoded symbols was just one example of how disallowing Symbol weak map keys increases the space implementation design alternatives.

I'm not sure whether limiting the Symbol space via immediate encoding would be problem. I haven't worked through the numbers. NaN encoding would give you 52 bits of range. 64-bit tagged object references should give you 60 or more bits of range. The ES specific doesn't guarantee that programs won't run time resource exhaustion situations.

allenwb commented 6 years ago

I would think that a feature with potential value should generally never be blocked by implementation challenges except where the feature is fundamentally at odds with the goals described by the specification.

I would say the exact opposite. Every time a new features is proposed its impact on current and future implementation needs to be considered and the speculative value of the feature needs to be traded-off against the speculative impact upon implementations. This isn't just about ease of implementation, I'm a strong believer that engine implementors should be earning their salaries. But, sometimes seemingly trivial features have significant non-local impacts on implementation level design that lead to significant performance impacts today or in the future.

axefrog commented 6 years ago

@allenwb Fair points to be sure, and I absolutely agree - to rephrase though, what I meant was that I believe it is important to exhaust potential approaches to a challenge as much as possible. Often there are good solutions that are not immediately apparent, but often not arrived at when it matters, due to ceasing exploration of the problem prematurely.

axefrog commented 6 years ago

Note: Just looking at this proposal, which perhaps mitigates the metadata issue...? https://github.com/michaelficarra/proposal-first-class-protocols

tabatkins commented 6 years ago

But implementations should keep in mind the expense of this technique. There are an infinite number of potential fresh unregistered symbol identities. With only an immediate representation, each one would need to be a unique bit pattern, say with a large counter. When this counter reaches its limit, the agent would need to be preemptively terminated, just as it must for OOM, even if there's plenty of free memory at that time. By avoiding heap allocation, in the limit, such termination becomes inevitable.

I don't understand how this is a concern for Symbols, but not for other random objects. As far as WeakMaps are concerned, there's no difference between let key = {}; wm.set(key, val); and let key = new Symbol(); wm.set(key, val);. Is this what you mean by "avoiding heap allocation"?

erights commented 6 years ago

@tabatkins I am responding to @allenwb 's suggested possibility of an immediate-only (heapless) representation of symbols. All things which currently are potential weakmap keys (objects and functions) are heap allocated, so within the domain of the garbage collector to recycle their representation, while faithfully maintaining the illusion that there are an infinite number of potential fresh object identities. As long as Symbols are heap allocated, this isn't a problem for them either.

Although it is a problem in theory for an immediate-only representation of infinitely fresh things, @allenwb is right that an internal representation using increasing values from a 128 bit counter is effectively infinite for all engineering purposes.

trusktr commented 5 years ago

I recently had the desire to use Symbols are WeakMap keys. I wanted to create a branding mechanism for my custom-made classes, so that in the case of code like

import Mixin from 'lowclass/Mixin'
import Class from 'lowclass'

const FooBrand = Symbol('FooBrand') // <--- THIS

export default Mixin(Base => {
  return Class('Foo').extends(Base, ({Protected, Private}) => ({
    // ... public props and methods ...

    protected: {
      // ... protected props and methods ...
    },

    private: {
      // ... private props and methods ...
    },
  }, FooBrand) // <--- PASSED IN HERE
})

where Protected and Private are helpers for accessing protected or private members in the sense that Private(this).foo is similar to wm.get(this).foo, that I could allow for Private and Protected access to work across instances made from the same class where the class is generated multiple times from differing invocations of the mixin.

Basically, passing the FooBrand should turn on "position privates" instead of "lexically scoped privates" (those concepts described in https://github.com/tc39/proposal-class-fields/issues/60).

Inside my Class() implementation, I wanted to use the passed-in symbols as WeakMap keys, but discovered that I can't, so I've switched to doing the following, which isn't as clean though it'll work just fine:

const FooBrand = { brand: 'Foo' } // <--- THIS

// ...

  }, FooBrand) // <--- PASSED IN HERE

It makes semantic sense to be able to use Symbol('FooBrand') instead of {brand: 'Foo'} in my example. When we see {brand: 'Foo'} in the code, it raises questions like "What does the brand option do? What values can it have?" etc. Although the Symbol version also raises its own questions, I believe those questions are a subset of the ones raised by the object version, and Symbol is more clear that something more "meta" is being achieved with the class definition, without unused data hanging around.

I'm aware that if I do something like weakmap.set(Math, LARGE_OBJECT) that LARGE_OBJECT probably won't be collected, because even if possible, let's face the fact that no one is likely to delete global references to Math or other similar global objects.

I'm okay with (especially unregistered) Symbols as WeakMap keys in order to use my own symbols as keys.

I prefer to let all symbols be treated the same (all usable as WeakMap keys). A developer needs to know that passing a Symbol.for() as key to a WeakMap means the value won't be collected; this is obvious. In general, someone who makes use of utilities like well-known symbols or registered symbols is very likely to be the type of person that will know that using those as WeakMap keys isn't a good idea.

guybedford commented 5 years ago

Do implementations not GC symbols themselves when they are no longer referenecable? If Symbols participate equally in the GC I see no reason why they shouldn't participate equally in the WeakMap. But if they do not, then that would seem an adequate argument to me.