Behavior of `Box(Box(x))`

mhofman commented 3 years ago

(can you make a Box of a primitive, such that Box(Box(x)) could ever happen?)

I'd argue that making a Box of a Box should not be allowed but simply return the existing Box.

Aka Box(Box(x)) === Box(x), the same way Object(x) === x when x is an Object.

Originally posted by @mhofman in https://github.com/tc39/proposal-record-tuple/issues/233#issuecomment-878810903

sjrd commented 3 years ago

It should be allowed to create a Box that actually contains a Box without flattening them. Otherwise, it becomes even more difficult to correctly write generic code. See the discussion starting at https://github.com/tc39/proposal-record-tuple/issues/206#issuecomment-866088128.

Assume that you can't create a Box(Box(x)) that is distinguishable from Box(x). If you write some generic code that manipulates elements of arbitrary types (like most containers do), and internally you want to put them in tuples and records. Currently you already have to either

put everything in Boxes (even primitives), which is easier to correctly write, but detrimental to performance, or
put only Objects in Boxes, which is more difficult to write correctly.

The former case becomes impossible because you can't put Boxes in Boxes anymore. So that leaves you with the more difficult option. But in that case what do you do when you receive Boxes:

you can't put them in a Box since you can't create a Box of Box anymore
but you also cannot not put them in a Box, since you won't be able to distinguish a Box(x) you received (and that must you must give back as an element of your container) from a Box(x) that you created to box the x that you received. When you see a Box(x) in your data structure, you won't be able to tell whether you need to expose the x or the Box(x) to the user of your data structure.

This means that you'll have to wrap the Boxes you receive in custom objects to be able to identify them, and then wrap those custom objects into another Box to put them in your tuples and records!

So it is very important to keep Box(Box(x)) distinct from Box(x). If you don't, writing generic code using tuples and records will become a nightmare.

nicolo-ribaudo commented 3 years ago

I agree with @sjrd: when you box something, Box(x).unbox() === x should always be true. Box should be a container for a single element without doing any special automatic transformations, similarly to how new WeakRef(x).deref() === x and [x][0] === x.

acutmore commented 3 years ago

similarly to how new WeakRef(x).deref() === x

I know you already know this @nicolo-ribaudo, but for completeness. That is not true for all x, WeakRef will throw for primitives.

While allowing any value in a Box is more ergenomic for code creating boxes for generic values. Complexity has been shifted to consumers of boxes. Consumers of boxes can't rely on Boxes always containing objects. For example, always being able to take a value from a box and store it in a WeakSet. Sometimes guarantees help because they reduce the number of cases code needs to handle. e.g. in somePromise.then(callback), the callback is guaranteed to be called at most one time and never before .then returns. So the author does not need to think about these cases.

sjrd commented 3 years ago

Sometimes guarantees help because they reduce the number of cases code needs to handle.

Yes, sometimes they do. And sometimes the additional guarantees at one point do not outweigh the cost of constraints at another point.

Consumers of boxes can't rely on Boxes always containing objects. For example, always being able to take a value from a box and store it in a WeakSet.

Do you have any realistic example of that scenario? WeakSets restrict their elements to being Objects for a very good, objective reason: they cannot provide their functionality for primitives because primitives do not have an identity. The constraint here is necessary to provide the essential functionality of WeakSet. The same cannot be said about Box. There is nothing in the essential functionality of a Box that requires it to be restricted to objects. Therefore, imposing those constraints is just additional burden.

mhofman commented 3 years ago

The problem is that Box doesn't just exist on its own, but will be used in conjunction with Record and Tuple.

The question at the core is: what does a box represent? We'll all agree that the motivation for box is to allow mutable data inside immutable structures. So as with most problems in computer science, we solve this by introducing a level of indirection.

The main point to realize is that a purely immutable structure is semantically different from an immutable structure with mutable objects contained within it. The former is identity-less and forgeable, it can be reconstructed by the program. The latter is unforgeable without access to the mutable objects, and thus combines the identity of the objects it contains.

Back to boxes, on its own, a idea of a container that creates one level of indirection could be applied to any value, but when applied to the problem at hand, box has a specific purpose: a level of indirection to allow mutable objects with identity in immutable structure. It also conveys a meaning: if a record/tuple has a box, it carries identity.

When a program handles a record or tuple, it may need to know what kind it is: forgeable or carrying identity. That program needs a way to discriminate between the 2. Asking a question "does this R/T directly contain a box" is a much simpler to reason about than "does this record/tuple recursively contains an object".

As for specific cases where this matters:

Iterating over a record/tuple to find the mutable exits shouldn't require recursively handling boxes
Membranes work by creating proxies for objects. If an object is contained within a record, it needs to be able to find them, and add the identity bearing record itself to a WeakMap. You can only add identity bearing records to weak collections in order to avoid memory leaks.

What I'd like to see is an example of program which is made a lot more complex by not allowing boxing primitives.

Pauan commented 3 years ago

@mhofman I would like to give a counter argument. There are various functional languages that have a split between mutable / immutable data, and they handle mutable data with something similar to Box:

Clojure uses ref
OCaml uses ref
SML uses ref
Haskell uses numerous different systems, including STRef
Rust uses various different types: Cell, RefCell, Mutex, and RwLock

Rust also has Box which is used solely for indirection, not mutability. It is the closest analogue to JS Box, and it supports Box<Box<T>>

All of these languages allow for nested references (i.e. Box(Box(x))). This simplifies the type system, it makes generic code possible, it makes the language more consistent, and it is sometimes practically useful.

Speaking more broadly, having a pointer which points to another pointer is a very common technique throughout C, C++, and Rust. There is nothing wrong with nested indirection, it is useful.

When a program handles a record or tuple, it may need to know what kind it is: forgeable or carrying identity. That program needs a way to discriminate between the 2. Asking a question "does this R/T directly contain a box" is a much simpler to reason about than "does this record/tuple recursively contains an object".

That sort of algorithm already needs to be recursive (because tuples / records can be infinitely nested), so I don't see how it is an extra burden to recurse into the Box, I think it should be about the same complexity.

Membranes work by creating proxies for objects. If an object is contained within a record, it needs to be able to find them, and add the identity bearing record itself to a WeakMap. You can only add identity bearing records to weak collections in order to avoid memory leaks.

Membranes already need to special-case primitives (because they can't be stored in a WeakMap), so I don't think it would add much complexity for them to special-case Box, but I'm open to being proven wrong with example code.

I also think that the WeakMap argument is... well, rather weak. It is true that WeakMap doesn't allow for primitives (for good reason), but there are a lot of uses of Box that don't involve WeakMap at all, so why should the limitations of WeakMap also apply to Box?

WeakMap disallows primitives for good reasons, but those reasons don't apply to Box (in fact they don't apply anywhere in JS outside of WeakMap, because WeakMap is special and unique).

I don't think that WeakMap is so all-encompassing and important that it should affect the decisions for unrelated APIs (which are useful outside of WeakMap).

What I'd like to see is an example of program which is made a lot more complex by not allowing boxing primitives.

All generic code will become more complex. All code which relies on a specific nesting depth becomes more complex. Even something as simple as let b = Box(x); ... b.unbox() becomes more complex, since you can no longer rely on a 1-to-1 relationship between box/unbox.

I think that's far worse than making WeakMap usage a little bit more complex, since WeakMap is niche and not used very often, and WeakMap is already inherently complex because it disallows primitives, that's just the cost you must accept for using WeakMap.

sjrd commented 3 years ago

What I'd like to see is an example of program which is made a lot more complex by not allowing boxing primitives.

There is already an example here: https://github.com/tc39/proposal-record-tuple/issues/206#issuecomment-866088128 The argument over there is to make boxing implicit, and that requiring it to be explicit makes writing this kind of code generically much more complicated. But it becomes even more complicated if you don't allow primitives (including Boxes) inside Boxes.

With implicit boxes (ideal):

class SnapshottableStack {
  constructor() {
    this._stack = null;
  }

  push(x) {
    this._stack = #[x, this._stack]; // x directly in the tuple
  }

  pop() {
    if (this._stack === null)
      throw new Error("empty stack");
    const result = this._stack[0]; // directly get the user value from the tuple
    this._stack = this._stack[1];
    return result;
  }

  snapshot() {
    return #{snapshot: this._stack};
  }

  restore(snapshot) {
    this._stack = snapshot.snapshot;
  }
}

With explicit Boxes, but at least primitives allowed (not ideal, but generic code can actually stay generic, if not optimal from a performance point of view):

class SnapshottableStack {
  ...

  push(x) {
    this._stack = #[Box(x), this._stack]; // Box in case it's an object
  }

  pop() {
    if (this._stack === null)
      throw new Error("empty stack");
    const result = this._stack[0].unbox(); // unbox back for the user value in all cases
    this._stack = this._stack[1];
    return result;
  }

  ...
}

When not allowing primitives (it becomes being quite bad, and this code is wrong, see below):

class SnapshottableStack {
  ...

  push(x) {
    // Box *only* if it is an object
    const box = typeof x === "object" || typeof x === "function" ? Box(x) : x;
    this._stack = #[Box(x), this._stack];
  }

  pop() {
    if (this._stack === null)
      throw new Error("empty stack");
    const resultBox = this._stack[0];
    this._stack = this._stack[1];
    // unbox in case it is a Box, to get back the user value
    const result = typeof resultBox === "box" ? resultBox.unbox() : resultBox;
    return result;
  }

  ...
}

But then the user of the above cannot use Boxes in that data structures themselves. So if you want parity, you have to make it so complicated that, at the end of the day, you've got to use an Object inside your Box to keep being completely generic:

class SnapshottableStack {
  ...

  push(x) {
    this._stack = #[Box({v: x}), this._stack]; // Box an Object with x inside
  }

  pop() {
    if (this._stack === null)
      throw new Error("empty stack");
    const result = this._stack[0].unbox().v; // unbox and extract the user value
    this._stack = this._stack[1];
    return result;
  }

  ...
}

which means you are actually completely surrendering the benefits of the immutable tuples.

And then, seriously, what's the point of the whole thing?

nicolo-ribaudo commented 3 years ago

@Pauan Those refs are not the equivalent of Box. For example, in OCaml (disclaimer - I used OCaml many years ago and I don't remember much about it; I had to read the docs to write this):

let x = ref 1;
let y = ref 1;
let eq = x == y; (* false *)
let () = x := 3; (* refs are mutable *)

let x = Box(1);
let y = Box(1);
let eq = x == y; // true
// cannot change the content of a box

They are more like plain JS objects:

let x = { ref: 1 };
let y = { ref: 1 };
let eq = x == y; // false
x.ref = 3;

Similarly, ref in Clojure and SML, and STRef in Haskell, are mutable. All those features are covered by https://github.com/rbuckton/proposal-refs, which is a completely different thing from Box.

nicolo-ribaudo commented 3 years ago

@sjrd SnapshottableStack could be written like this, and still have all the benefits of immutable types:

function toImmutable(value) {
  const isObject = Object(value) === value;
  return #{ isObject, value: isObject ? Box(value) : value };
}
function fromImmutable({ isObject, value }) {
  return isObject ? Box.unbox(value) : value;
}

class SnapshottableStack {
  ...

  push(x) {
    this._stack = #[toImmutable(x), this._stack];
  }

  pop() {
    if (this._stack === null)
      throw new Error("empty stack");
    const result = fromImmutable(this._stack[0]);
    this._stack = this._stack[1];
    return result;
  }

  ...
}

Btw, being able to rewrite this stack example that you posted a few months ago was what convinced me that primitives in boxes are not strictly necessary.

nicolo-ribaudo commented 3 years ago

@acutmore

similarly to how new WeakRef(x).deref() === x

I know you already know this @nicolo-ribaudo, but for completeness. That is not true for all x, WeakRef will throw for primitives.

But it's not false either :stuck_out_tongue:

To rephrase, Box(x).unbox() === x must always be true when it gives a result.

Pauan commented 3 years ago

@nicolo-ribaudo I was responding to @mhofman 's claim about a mutable container inside of an immutable object. Of course they aren't the same as JS Box, but they are rather close. And especially with Rust... Box::new(1) == Box::new(1) is true. Rust Box is very close to JS Box, and it allows nesting.

nicolo-ribaudo commented 3 years ago

Slightly off topic, but what is Box used for in Rust? (I now almost nothing about Rust)

Pauan commented 3 years ago

Box is used for indirection, it heap allocates the value and returns a pointer to it. You can read more about it here:

https://doc.rust-lang.org/stable/book/ch15-01-box.html

https://doc.rust-lang.org/std/boxed/index.html

In Rust all values are stack allocated by default, so you have to manually use Box in order to heap allocate something.

And yes, in Rust there are use cases for heap-allocating primitives like integers, and also use cases for nested Box. It's not common, but it does happen occasionally.

nicolo-ribaudo commented 3 years ago

I have moved the discussion about primitives in boxes to https://github.com/tc39/proposal-record-tuple/issues/258, since it was happening in multiple parallel threads.

Please let's keep this issue only about:

If boxes can contain primitives, what should Box(Box(x)) === Box(x)?
If boxes cannot contain primitives, should Box(Box(x)) throw or return Box(x)?

mhofman commented 3 years ago

If boxes can contain primitives, what should Box(Box(x)) === Box(x)?

If boxes can, then Box(Box(x)) !== Box(x), I agree and see no reason to make another box be a weird value.

If boxes cannot contain primitives, should Box(Box(x)) throw or return Box(x)?

IMO it should throw like other primitives.

ljharb commented 3 years ago

I agree - either it should be a generic container (whatever you put into it, you can take out, no exceptions) or it should throw on all primitives, which includes box itself.

Jack-Works commented 3 years ago

I agree - either it should be a generic container (whatever you put into it, you can take out, no exceptions) or it should throw on all primitives, which includes box itself.

+1. And I prefer the generic container.

rbuckton commented 2 years ago

@mhofman I would like to give a counter argument. There are various functional languages that have a split between mutable / immutable data, and they handle mutable data with something similar to Box:

Clojure uses ref

OCaml uses ref

SML uses ref

Haskell uses numerous different systems, including STRef

Rust uses various different types: Cell, RefCell, Mutex, and RwLock Rust also has Box which is used solely for indirection, not mutability. It is the closest analogue to JS Box, and it supports Box<Box<T>>

I'd like to note that I have a not-yet-presented proposal for a ref declaration/expression that I think shares some overlap with Box: https://github.com/rbuckton/proposal-refs. I've been meaning to discuss with the Record and Tuple champions about my ref proposal and whether there is potential overlap or crosscutting concerns. I'm planning to propose ref once the shared structs proposal is mature enough, as there is some potential synergy between shared structs, Atomics, and ref that adds additional motivating use cases to the ref proposal.

mhofman commented 2 years ago

The current thinking is to find a new name to make it obvious this is not a generic container (presented as ObjectPlaceholder at the last meeting), and to only allow objects in it. At that point a Box(Box(x)) is moot as primitives wouldn't be allowed.

I also believe there are significant differences with your ref, namely values in our case would be primitives, that there is no syntactic ability to create or get the content of an ObjectPlaceholder value, and that both operations require access to the ObjectPlaceholder global. This is to allow virtualization and keep enforcing expectations by some programs that primitives alone do not grant access to some authority (aka primitives don't contain hidden authority).

See https://github.com/tc39/proposal-record-tuple/pull/257 for some of the background.

ljharb commented 2 years ago

Note that I still think it's hugely valuable to make it a generic container, and am not convinced by arguments that it should be restricted to only meeting the motivating use case.

mhofman commented 2 years ago

I've demonstrated that generic containers can be built in userland, and if the goal is to have them as primitives, they can be combined with ObjectPlaceholder for that: https://es.discourse.group/t/tagged-records-and-variants/1058/9

ljharb commented 2 years ago

The primary value is the coordination point - which only exists if it's in the language. A generic container has been buildable in userland for a long time - it's an array :-) - but that has many usage models, while a standard single container builtin would only have the one (to be a container)

mhofman commented 2 years ago

But that's the thing. A global generic Box container would have many different usages, and would have meaning that differs depending on the application as well. If you want to ascribe meaning to something, that should be expressed explicitly. Plain objects or arrays don't allow you to do that, and neither would a new generic container. However that's what classes provide: a kind and recognizable instances of the kind.

The wrapper registry I linked to uses this concept of kind and recognizable values of the kind that classes have, and mixes it with the stable wrapping pattern that ObjectPlaceholder has, preserving wrapper identity for a given kind. It also has similar semantics that the wrapper can only be opened by having access to the kind registry.

These wrappers are objects, and with the help of ObjectPlaceholder can be used where primitives are by building tagged records, to identify the kind of wrapper that is in the placeholder.

nicolo-ribaudo commented 2 years ago

We removed boxes from the proposal, since it was stalled because of them. I'm closing this issue, but we'll keep track of it if we'll bring them up again as a follow on proposal.

tc39 / proposal-record-tuple

Behavior of `Box(Box(x))` #238