tc39 / proposal-record-tuple

ECMAScript proposal for the Record and Tuple value types. | Stage 2: it will change!
https://tc39.es/proposal-record-tuple/
2.48k stars 62 forks source link

Boxes: usable as weakmap keys #233

Closed rricard closed 2 years ago

rricard commented 3 years ago

Follow on to #197

We would like to make this change so this ticket will track completion

rricard commented 3 years ago

@littledan - can we verify if we still want to do this?

Jack-Works commented 3 years ago

Does it mean:

const x = new WeakMap()
x.set(Box(Object), 1)
x.get(Object) === 1
nicolo-ribaudo commented 3 years ago

No, Box(Object) and Object would have different identities.

const x = new WeakMap()
x.set(Box(Object), 1)
x.set(Object, 2);
x.get(Object) === 2
z.get(Box(Object)) === 1
ljharb commented 3 years ago

WeakMap would throw presumably, on any of the unboxed objects being passed in?

Jack-Works commented 3 years ago

Why Box has it's own identity? Does it mean Box(Object) !== Box(Object)?

acutmore commented 3 years ago

Why Box has it's own identity? Does it mean Box(Object) !== Box(Object)?

Box(x) === Box(x) (when x === x)

EDIT: The current Box equality spec uses SameValueZero so Box(NaN) === Box(NaN); // true

ljharb commented 3 years ago

@acutmore just curious; Box(NaN) === Box(NaN)?

bakkot commented 3 years ago

If you have

let x = {};
Box(x) === Box(x); // true

then you can't make Box usable as a WeakMap key. (Because then putting one in a WeakMap would prevent it from ever being GC'd.)

mhofman commented 3 years ago

@bakkot could you clarify why not? If the program can no longer observe x, it can no longer make a box for x. Which means there is nothing preventing the implementation from collecting x and Box(x) in weak collections.

acutmore commented 3 years ago

@acutmore just curious; Box(NaN) === Box(NaN)?

Box(x) === Box(x) // (when x === x)

I edited just after I posted to account for that but wasn't quick enough!

EDIT: The current Box equality spec uses SameValueZero so Box(NaN) === Box(NaN); // true

bakkot commented 3 years ago

If the program can no longer observe x, it can no longer make a box for x.

But if it can observe x then it can't ever clean up a Box(x) used in a WeakMap. (Or Box(Box(x)), etc.)

ljharb commented 3 years ago

(can you make a Box of a primitive, such that Box(Box(x)) could ever happen?)

mhofman commented 3 years ago

But if it can observe x then it can't ever clean up a Box(x) used in a WeakMap. (Or Box(Box(x)), etc.)

I suppose Box(x) should only be allowed in a WeakMap if x would be allowed.

bakkot commented 3 years ago

I suppose Box(x) should only be allowed in a WeakMap if x would be allowed.

Even if x is just a regular object, holding a reference to x should not prevent values derived from x from being collected. By analogy, holding a reference to x does not prevent an engine from collecting [x].

mhofman commented 3 years ago

(can you make a Box of a primitive, such that Box(Box(x)) could ever happen?)

I'd argue that making a Box of a Box should not be allowed but simply return the existing Box.

Aka Box(Box(x)) === Box(x), the same way Object(x) === x when x is an Object.

That's probably a separate issue.

Jack-Works commented 3 years ago

Why Box has it's own identity? Does it mean Box(Object) !== Box(Object)?

Box(x) === Box(x) (when x === x)

So why Box has it's own identity (I think identity means Symbol() !== Symbol() and {} !== {})

mhofman commented 3 years ago

Even if x is just a regular object, holding a reference to x should not prevent values derived from x from being collected. By analogy, holding a reference to x does not prevent an engine from collecting [x].

The difference is that before Box, derived values had their own identity independent of the value they contain. E.g. [x] !== [x]. For Box where you can recreate a derived value which has the same identity, collecting the weak collection entry keyed on the box would be observable by the program.

But if it can observe x then it can't ever clean up a Box(x) used in a WeakMap.

To be extra clear, an implementation wouldn't be allowed to collect an entry with Box(x) as key while x is observable, but once x is no longer observable, the collection entry can be collected.

That behavior can actually be implemented in userland by having a custom weak collection which checks if the type of the key is a box, and instead operates on a secondary internal collection with the unboxed value used as key.

bakkot commented 3 years ago

For Box where you can recreate a derived value which has the same identity, collecting the weak collection entry keyed on the box would be observable by the program.

Right, exactly. Which is precisely why we shouldn't allow Boxes to be used as WeakMap keys.

mhofman commented 3 years ago

So why Box has it's own identity (I think identity means Symbol() !== Symbol() and {} !== {})

Box doesn't have its own identity, it derives its identity from its internal boxed value, like other primitives (except symbols). The difference with other primitives is that the boxed value may itself have an unforgeable identity (unregistered symbol or object).

nicolo-ribaudo commented 3 years ago

That behavior can actually be implemented in userland by having a custom weak collection which checks if the type of the key is a box, and instead operates on a secondary internal collection with the unboxed value used as key.

I wrote an implementation of a WeakMap supporting boxes even without having built-in support, for who is interested.

class WeakMapSupportingBoxes extends WeakMap {
  #weakmaps = Object.create(null);

  #unwrap(key) {
    let boxesCount = 0;
    while (typeof key === "box") {
      boxesCount++;
      key = key.unbox();
    }
    return { key, boxesCount };
  }

  has(k) {
    const { key, boxesCount } = this.#unwrap(k);
    return this.#weakmaps[boxesCount]?.has(key) ?? false;
  }

  get(k, v) {
    const { key, boxesCount } = this.#unwrap(k);
    return this.#weakmaps[boxesCount]?.get(key);
  }

  set(k, v) {
    const { key, boxesCount } = this.#unwrap(v);
    (this.#weakmaps[boxesCount] ??= new WeakMap).set(key, v);
    return this;
  }

  delete(k, v) {
    const { key, boxesCount } = this.#unwrap(v);
    return this.#weakmaps[boxesCount]?.delete(key) ?? false;
  }
}

I agree that Box(x) not being collectable when x is still reachable should not a blocking problem, since it's effectively the same as having two weakmaps both having x as a key.

littledan commented 3 years ago

In my mind, the interesting part of this idea is allowing Records and Tuples as WeakMap keys, if they include a Box with an object in it. This capability could generalize @bmeck's compound key proposal. A Record or Tuple would be considered live if all of the objects which it references via Boxes are live.

acutmore commented 3 years ago

In support of "the interesting part of this idea is allowing Records and Tuples as WeakMap keys, if they include a Box with an object in it.". A user-land WeakMap that supports this can be constructed (proof-of-concept). So it seems friendly to directly support it in the language level WeakMap, so existing code paths would work with compatible R&Ts.

bmeck commented 3 years ago

The one issue with this completely replacing compound keys is that you grant mutable access to the boxed components by using a Record/Tuple.

mhofman commented 3 years ago

@bmeck, could this be solved by placing a frozen "token" object inside the box, and using a side WeakMap table to associate the real object to the token? I know the ergonomics aren't good, but would it technically solve the use case?

bmeck commented 3 years ago

That just sounds like a normal side table approach to me. Which as I've stated is possible to make work but isn't trivial to code for once you do things like R->R mapping. Why not just use freshly minted Symbols which could then be re-used across tuples:

const lookupRefs = new WeakMap();
function lookupRefFor(v, table = lookupRefs) {
  // only change is here in Symbol vs Object
  if (!lookupRefs.has(v)) lookupRefs.set(v, Symbol());
  return lookupRefs.get(v);
}
function makeRecord(value_a, value_x_y_z, table = lookupRefs) {
  return #{
    a: lookupRefFor(value_a, table),
    x: {
      y: { z:  lookupRefFor(value_x_y_z, table) }
    }
  };
}

// the above kind of side table doesn't allow piecemeal replacement
// if the value of r.a === r.b they are not given unique boxes so in this scheme replacing r.a replaces r.b
// a slightly more flexible would give unique boxes per uniquely modifiable location. The problem can be shown:
let foo = {};
let bar = {};
let record1 = makeRecord(foo, foo);
let record2 = makeRecord(foo, bar);

// replacing the table entry of `foo` replaces all *references*
lookup.set(record1.a, 'I replaced record1.x.y.z and record2.a');

I don't think having an Object vs Symbol here is of significant difference personally and symbol has the nice effect of preventing some potential gotchas like private field attachment. I will state that I have done Object.freeze(Object.create(null)) in the past to round trip across iframes w/o leaking the Symbol.prototype but don't think it matters here. Either way, I do think it is possible but less trivial than people might make out without playing with the design pattern and especially w/ different forms of transforming involving the side tables.

mhofman commented 3 years ago

If we end up with Box as a primitive, I was thinking that Box(Object.freeze(Object.create(null))) would be a nice way to have something semantically equivalent to a Symbol() without requiring all Symbols as WeakMap keys, and the whole memory leak can of worms. As a new type, we can mandate that only R/T/B that contain a nested object be allowed in weak collections (as long as a built-in predicate exists to test this).

But I now realize I don't understand your use case. Do you have any pointers to a draft proposal or examples?

bmeck commented 3 years ago

@mhofman I don't have any simple/reduced cases in a proposal but I can make an example of what kind of things I do across Realms currently, notably the issue above is an example of how stating to just use a side table isn't really clear since there are 2 different but common side table use cases.

  1. creating identity based side tables for a privileged/unprivileged access to APIs
const privileged = {writeFile() {/*...*/}}
const unprivileged = {writeFile() {/*...*/}}
const privilegedRefs = new WeakMap();
const unprivilegedRefs = new WeakMap();

const logicalKey = Object.freeze(Object.create(null));
privilegedRefs.set(logicalKey, privileged);
unprivilegedRefs.set(logicalKey, unprivileged);

const record = #{
  api: logicalKey,
  deep: {nested: {api: logicalKey}}
};

runUnprivileged(record, privilegedRefs);
runPrivileged(record, unprivilegedRefs);

This kind of side table is done by replacing via identity, not location. It is good for avoiding accidental leakage of privileged APIs by guarding based upon identity of the capabilities. In the case above even though we only do 1 assignment to the table it covers both locations in the record.

  1. creating location based side tables for reflective programming
const onclick = (e) => {event.preventDefault();}
const instanceARefs = new WeakMap();
const instanceBRefs = new WeakMap();

const logicalKeyA = Object.freeze(Object.create(null));
const logicalKeyB = Object.freeze(Object.create(null));

instanceARefs.set(logicalKeyA, onclick);
instanceARefs.set(logicalKeyB, onclick);

instanceBRefs.set(logicalKeyA, onclick);
instanceBRefs.set(logicalKeyB, onclick);

const template = #{
  onclick: logicalKeyA,
  deep: {nested: {onclick: logicalKeyB}}
};

In this case we actually are using the R/T as a templating system where we aren't trying to guard against all kinds of access to powerful APIs but instead are only having the mutable parts of the R/T replaced. In particular, this means that we don't want to replace all locations containing a specific side table value but want to replace specific locations in the object graph even if other parts of the object graph point to the same side table value. This causes the lookup keys to be based upon location rather than identity.

This one gets particularly nasty if you nest R/T (some pseudocode here):

function makeElem({ children = #[],  onclick = () => {} } = {}) {
  // this could be done via several abstractions
  onclick = getLocationKeyFor(makeElem, 'onclick');
  return #{
    onclick,
    children
  };
}
makeElem(
  {
    onclick: () => { console.log(a); }
    children: #[ makeElem({ onclick: console.log(b); } ],
  }
)

So instead you have to nest side tables to deal w/ this as the intent is to emulate mutable instances with hardened parts of the object graph and now you basically are re-creating object semantics in user space. This can be useful since you can re-use the R/T without allocations for the hardened parts, but it isn't easy.

iwikal commented 2 years ago

Can this be closed now that https://github.com/tc39/proposal-record-tuple/pull/277 is merged?