tc39 / proposal-record-tuple

ECMAScript proposal for the Record and Tuple value types. | Stage 2: it will change!
https://tc39.es/proposal-record-tuple/
2.5k stars 62 forks source link

Box: choosing between manual boxing and automatic boxing #206

Closed rricard closed 2 years ago

rricard commented 4 years ago

This issue is a direct follow-on of #200 and expands the following open question from the original issue:

Should you be required to "unbox" a Box inside a Record or Tuple, or should it be "automatic" in some way?

It also, for the sake of simplifying things, will not require you to create a Box, it will be created for you.

Essentially, we need to choose between two options:

Example edited by @nicolo-ribaudo for clarity - https://github.com/tc39/proposal-record-tuple/issues/206#issuecomment-724379577

I have the intuition that forcing manual and explicit boxing helps with making sure that nothing gets mutated unless we see the use of .unbox(). Those guarantees are interesting in many cases, especially when dealing with business logic in large codebases. This is something that we found could greatly reduce bugs at the boudary between immutable and mutable domains and that we think could be beneficial for the majority of users of the feature.

However, manual boxing is verbose and is cumbersome when used often. Automatic boxing would be terser.

I think that the ergonomic tradeoff of manual boxing is worth it to reduce potential bugs but I'm interested in discussing what you think about it!

ljharb commented 4 years ago

I'm confused; why would Box.contains(record) be true when record isn't a box, and fn is the Boxed thing? What is Box.contains supposed to do?

nicolo-ribaudo commented 4 years ago

The idea behind Box.contains is that it should check if the parameter is a record containing mutable state (draft spec: https://github.com/tc39/proposal-record-tuple/pull/197/files#diff-9845cd5af38a2d508fce723eb3df23019f4e2578b2a5b75e96e56db532b883f6R676). Maybe it the name was Record.containsBoxes(record) the example would be clearer?


As a downside I see in automatic boxes, is that #{ server: { port: 1234 } } === #{ server: { port: 1234 } } is false because the two { port: 1234 } objects are different, and while reading the code it's easy to miss that they aren't records.

ljharb commented 4 years ago

Yes, Record.containsBoxes makes much more sense to me as a name, thanks.

rricard commented 4 years ago

We actually discussed Record.containsBoxes as an alternative naming with @littledan. I am going to change it in the issue.

rricard commented 4 years ago

As a downside I see in automatic boxes, is that #{ server: { port: 1234 } } === #{ server: { port: 1234 } } is false because the two { port: 1234 } objects are different, and while reading the code it's easy to miss that they aren't records.

I can concur this is a very problematic issue and has been noted by many people I talked to about this issue.

ljharb commented 4 years ago

I thought that syntactically, #{ server: { port: 1234 } } would be an error? Even with automatic boxing I think this should be the case. With automatic boxing I'd expect something more like const obj = { port: 1234 }; #{ server: obj } to be necessary.

nicolo-ribaudo commented 4 years ago

Syntactically record properties can be any expression, currently (with explicit boxing) the only restriction is that they evaluate to a primitive.

ljharb commented 4 years ago

That seems like a footgun tho with explicit boxing - I doubt anyone would provide an object literal, and expect it to be coerced to a primitive.

littledan commented 4 years ago

I agree that this case is a significant ergonomic risk. We'd be depending on linters and type systems, rather than early errors, to catch these cases, as it would be too weird to allow arbitrary expressions but just not object literals in this space.

Note that, in a previous version of this proposal, we actually did interpret this as nested Records and Tuples, so I see how this could be confusing. At the same time, I'm not really persuaded by the expectation that things would be coerced to primitives: we don't really have that kind of coercion in the language in general. It will always be possible to claim various expectations, for any proposal, no matter what semantics we decide.

brad4d commented 3 years ago

I'm confused by the examples in the description.

  • Manual/explicit
    const record = { fn: Box(fn) };
    Box.contains(record); // => true
    record.fn(); // => TypeError
    record.fn.unbox()();
    typeof record.fn // => "box"
    typeof record.fn.unbox() // => "function"
    const config = #{ server: { port: 1234 } }; // => TypeError
  • Automatic/implicit
    const record = { fn };
    Record.containsBoxes(record); // => true
    record.fn();
    typeof record.fn // => "function"
    const config = #{ server: { port: 1234 } }; // => Ok
brad4d commented 3 years ago

If we go with explicit boxing, then we have the nice situation that you can tell from looking at a reference to the contents of a record when you cross the boundary from immutable to mutable.

// explicit
// all things referred to are also immutable things until we hit the call to `unbox()`
record.foo[3].x.y.unbox().z; // z is probably mutable 

// implicit
// without additional information the reader cannot tell which of these
// addressed values are immutable
record.foo[3].x.y.z;
ckknight commented 3 years ago

I dramatically prefer explicit over implicit boxing, as it will be much harder to shoot oneself in the foot with explicitness. It seems like it'd be far too easy to make unintentional mistakes unless protected by a suite of tools like linters and type checkers, which we can't expect the average JavaScript developer who is going to be using Record/Tuple/Box to necessarily have.

ghost commented 3 years ago

If you were to follow the implicit route, and deny explicit boxing, one would not be able to use a subclass to box values.

Ex: a paranoid script writer always wants to be able to unbox their boxes,

// box.mjs
const { deref } = Box.prototype;

export default class Box {
    deref = deref;
}

// main.mjs
import Box from "./box.mjs";

const tuple = #[
    new Box([ "string in an array in a box in a tuple" ])
];

tuple[0].deref(); // guaranteed to be the array
rricard commented 3 years ago

If you were to follow the implicit route, and deny explicit boxing, one would not be able to use a subclass to box values.

You likely will not be able to as I described in #212.

robbiespeed commented 3 years ago

I suggest explicit boxing, and implicit unboxing.

Explicit boxing has the benefit of reducing confusion of #{ foo: { bar: 1 } } where one might expect bar to be a record.

Explicit unboxing on the other hand reduces ergonomics, since the common use cases would always require unboxing.

In the case that a box would be desired, you could wrap the implicitly unboxed value in a box, and the underlying box identity would be the same.

Implicit unboxing would work well combined with #() explicit boxing syntax from #215

dead-claudia commented 3 years ago

I'm not sure how much the virtual DOM would gain from an explicit boxing model...

Coming as a framework implementor who's actually worked on a high-performance framework, I see only extra overhead. I see four options

  1. The simplest way is box the attributes wholesale. Then, we end up losing basically a large chunk of the gains we'd get out of using records for those, and when it comes to children and the raw vnode structure itself, we make assumptions about the vnode object itself and we necessarily can't just blindly diff the children. Also, the single box will likely add some mild memory overhead.
  2. Suppose we do the boxing ourselves. Now we're faced with a significant bit of overhead normalizing all the attributes and creating all the boxes ourselves, and that only increases allocation pressure and adds a bunch of hard-to-predict branching in a critical execution path as we're creating a box for every non-object we find.
  3. The user passes in a record, but then they have to box all their functions, and that gets boilerplatey in a hurry, making for a poor user experience. There's also the above allocation pressure, though branch prediction isn't a major concern.
  4. Suppose we provide a compiler step to do the above. While that usability negative is largely gone, most Mithril users prefer to avoid the compile step (and thus wouldn't benefit), and for those who would, it still doesn't avoid the allocation pressure concern.

Note: I would personally love some benchmarks here proving my perf concerns invalid for 3 and 4. But as it stands, my experience with engines suggests there's bound to be some level of overhead.

If we get the above Box.equalsIgnoreBoxes method, we could at least salvage the above overhead some by instead of modeling vnodes directly, modeling them indirectly as a {root, boxes: Box<Map<Path, object>>}, maintaining a path->value map of what would normally be boxes and creating references to them as well as adding a flag of "does this reference any objects in the boxes map" to each virtual child in the tree, and then reconstructing them where applicable. Then, we could genuinely avoid work if Box.equalsIgnoreBoxes(vnode.root, old.root) returns true, and just update event listeners and such rather than actually patching the DOM. We could do that for the children and for the attributes, and we could do that separately. Though keep in mind, we'd still have to diff them and patch them, so it'd be speeding up the happy path at a significant cost to the slow path of updating.

It's also worth noting we could reap similar gains just by having that kind of method, even without explicit boxing.

Also, how does this help implementations in a way that a single bit on the object itself (that they'd likely do even if you all do move forward with boxing) doesn't? I get the interest in managing mutation, but the virtual DOM example won't reap the benefits you think it will with boxes.

dead-claudia commented 3 years ago

@robbiespeed

Explicit boxing has the benefit of reducing confusion of #{ foo: { bar: 1 } } where one might expect bar to be a record.

That could be solved by, say, disallowing { and [ from starting any record/tuple value (thus forcing parentheses) or by requiring an explicit token or keyword (that could apply to variables as well). One does not necessarily need to incorporate boxes to mitigate that risk of confusion.

robbiespeed commented 3 years ago

That could be solved by, say, disallowing { and [ from starting any record/tuple value (thus forcing parentheses) or by requiring an explicit token or keyword (that could apply to variables as well). One does not necessarily need to incorporate boxes to mitigate that risk of confusion.

While true I do find it a little odd that both #{ foo: ({ bar: 1 }) } and #{ foo: barObj } would be allowed, but #{ foo: { bar: 1 } } would not. I can't think of anywhere else in the language that requires wrapping object expressions in parenthesis, but allows references without.

It's somewhat similar to the #() syntax, but with a special case for object references.

ljharb commented 3 years ago

@robbiespeed concise arrow function bodies:

() => ({}) // returns object
() => x // returns x
() => {} // returns undefined
robbiespeed commented 3 years ago

@ljharb Good catch, forgot about that case even though I use it regularly. I do think that's also a case that could have benefited from an explicit syntax like () =>> x.

One difference between the 2 would be array expressions since () => [] does not require parentheses.

sjrd commented 3 years ago

I would like to make a few arguments in favor of automatic boxing/unboxing, based on very concrete examples.

Disclaimer: I am the lead author and maintainer of Scala.js, the Scala to JavaScript compiler. This clearly has an impact on my point of view.

tl;dr Explicit boxing is bad for writing generic code, as well as to write statically typed code using Tuples and Records. Implicit boxes solve these problems, and have additional nice benefits in terms of performance and spec simplicity.

Writing generic code

Let's say I want to write a mutable stack data structure with O(1) snapshots and restores. A straightforward way to implement this structure is using an immutable linked list internally, using tuples:

class SnapshottableStack {
  constructor() {
    this._stack = null;
  }

  push(x) {
    this._stack = #[x, this._stack];
  }

  pop() {
    if (this._stack === null)
      throw new Error("empty stack");
    const result = this._stack[0];
    this._stack = this._stack[1];
    return result;
  }

  snapshot() {
    return #{snapshot: this._stack};
  }

  restore(snapshot) {
    this._stack = snapshot.snapshot;
  }
}

Why this example? Because it is the useful data structure requiring the least amount of code while allowing me to illustrate my point.

Like most data structures, this class is generic in its element type. It does not care what type of elements will be put in the stack. As a user, I can use a SnapshottableStack to store numbers, strings, tuples, records, etc.

Ah but wait! If I try to store Objects in a SnapshottableStack, things blow up! Why? Because it internally stores my Objects in tuples and records. As a user, I should not have to care about this.

With explicit boxing, there are two bad solutions:

With implicit boxing, everything works without any special code from the author or the user of the SnapshottableStack abstraction.

This is just one example, but the same problem and solution generalizes to any data structure or function that is generic in the elements it manipulates, and wants to use tuples and records under the hood. The requirement for explicit boxes breaks modularity and genericity, whereas implicit boxes preserves those important properties.

Typeability

The problem of genericity transfers to statically typed languages in terms of typeability. This applies for example to TypeScript or Scala.js.

I will start with the problem of typing Tuples and Records themselves in Scala.js. Scala.js for example has few "primitives" in terms of interoperability with JavaSript. Most types for interoperability with JavaScript are user-defined (sometimes in the standard library), even for JavaScript primitive types. For example, the language prescribes nothing about bigints. Instead, there is a definition for js.BigInt in the standard library. Similarly, there is a user-land definition for js.Array[A].

The way we would define JavaScript Tuples in Scala.js would be similar. It should look something like

@js.native
trait Tuple2[+A, +B] extends Tuple {
  @JSName("0") val _0: A = js.native
  @JSName("1") val _1: B = js.native
}

object Tuple2 {
  def apply[A, B](a: A, b: B): Tuple2[A, B] =
    js.Dynamic.global.Tuple.from(js.Array(a, b)).asInstanceOf[Tuple2[A, B]]
}

However, with explicit boxing, that would be incorrect. It erroneously allows to instantiate the type parameters A and B to Object types, for example with Tuple2[String, Date]. Such a type would be unsound, and crash at run-time. Instead, we need very complicated type bounds on the type parameters:

trait Tuple2[+A <: Double | String | Symbol | Boolean | ... | Tuple | Record | Box,
    +B <: Double | String | Symbol | Boolean | ... | Tuple | Record | Box]
    extends Tuple {
  ...
}

In practice, we would have a pre-defined type alias for this monstrous union type:

type TupleOrRecordElement = Double | String | Symbol | Boolean | ... | Tuple | Record | Box

trait Tuple2[+A <: TupleOrRecordElement, +B <: TupleOrRecordElement]
    extends Tuple {
  ...
}

which makes it somewhat palatable to write, but is still going to have an impact on compilation times and on the quality of error messages.

You could say: "That's a Scala.js-specific issue. It is not our problem. Tough luck for you folks." But the same problem applies for any statically typed language that targets JavaScript, including TypeScript.

OK, TypeScript is likely to have a primitive type representation of JS Tuples and Records, probably using the syntax #[number, string]. It would then easily be able to reject #[number, Date], for example.

While that is true, but it does not compose. If I want to implement SnapshottableStack in TypeScript, it will have to look like

type TupleOrRecordElement = number | string | symbol | boolean | ... | Tuple | Record | Box;

type StackState<E extends TupleOrRecordElement> = #[E, StackState[E]] | null;

class SnapshottableStack<E extends TupleOrRecordElement> {
  _stack: StackState<E>;
  ...
}

Every time we want to write structures or functions that are generic in their data, and want to store them in Tuples or Records, we will have to carry around those awkward bounds. They will have an impact on compile times, and on the quality of error messages.

With implicit boxing and unboxing, none of this would be necessary. We would use any generic, unconstrained type parameter in Tuple and Record types. In Scala.js, in TypeScript, or in any other statically typed language that interacts with JavaScript.

Conclusion about the above

The two above issues are taken from very concrete use cases. In general, they are symptoms of a larger problem about composability and modularity of Tuples and Records, as long as they require explicit Boxes. Using implicit boxing and unboxing would solve those issues at the core.

Additional nice consequences of implicit boxing

Once we have implicit boxing, notice that there are only two places where JavaScript developers can observe the concept of Boxes:

There would be no reason to do the former anymore, so that leaves only Record.containsBoxes. We can rename that one into Record.isDeeplyImmutable without changing its semantics.

That entirely gets rid of Box as a concept that JavaScript developers need to learn. In fact, the concept can then be eradicated from the spec, making the spec simpler.

There are also advantages in terms of performance. Once Boxes are not visible at the user-level anymore, there is no need for engines to actually allocate Boxes at all. That is irrespective of the spec changes (the spec could expose Boxes while the engines would not use them in their implementation). The engines can still implement Record.isDeeplyImmutable efficiently, by using a single bit in Tuples and Records. This is not affected by whether there is a real concept of Boxes or not.

All right, that's all I have for today. Thanks for reading.

ljharb commented 3 years ago

Isn’t another option, that the author of SnapshottableStack could wrap only objects in boxes explicitly, since that’s an inherent limitation of the data structure it’s using internally? Meaning, the user never has to care, and the author can change “boxing + tuples” to another approach all they like.

sjrd commented 3 years ago

Of course, but that means significantly complicating the code, potentially introducing more bugs, perhaps more performance overhead, and making it harder to type correctly (if you're using a statically typed language). At that point I would opt for the always-box approach.

dead-claudia commented 3 years ago

@sjrd Could the Scala.js compiler not simply special-case that internally (at least for a few of the common cases) to mitigate the perf cliff?

Edit: Just to be clear, I'm not exactly a fan of explicit boxing. I strongly prefer implicit. This is more just a little bit of devil's advocate here.

sjrd commented 3 years ago

I'm not sure what you're suggesting? Do you mean let me write SnapshottableStack[A] using the straightforward approach, and let the compiler insert box/unbox operations automatically when A is not a primitive? If yes, then only languages with reified generic types (GHCJS, perhaps?) can do that. Languages with erased generic types (like TypeScript, Scala.js, other JVM-style languages, PureScript, etc.) cannot do that. Indeed, we compile SnapshottableStack[A] once, using the same generated code for all possible As.

So the only way would be to generate code that would perform the same run-time tests than would a hand-written solution would. That means it does not solve the performance cliffs issue. It may superficially solve the ease-of-writing issue, although we would have to ensure that such magic does not cause issues with the rest of the semantics of the language (in general, the more magic you do, the more chances of semantic unsoundness you introduce; Scala.js is very low on magic for that reason).

rickbutton commented 3 years ago

@sjrd I appreciate this detailed response very much. Let me put a few of my thoughts into words:

Regarding the first example, I agree with @ljharb 's point here, in that the natural solution to me is to have the SnapshottableStack explicitly handle objects via boxing them. I recognize that this adds additional complexity to the implementation of said data structure, but I believe that this requirement makes the contract between the user and the DS more clear, not less. A core tenet of the design of Record and Tuple is to provide a truly immutable data structure; allowing implicit boxing betrays that tenet. For example:

// with implicit boxing
const stack = new SnapshottableStack();
const obj = { a: 1 };
stack.push(obj);
const snapshot = stack.snapshot();
obj.a++;

console.log(snapshot[0].a === obj.a); // true

In that example, is the value returned from snapshot() truly a snapshot of the stack if it reflects mutations of the stack made after the snapshot? An external reader might incorrectly come to the conclusion that the snapshot is immutable, when it is not. When boxing is explicit, however, both the implementer of the data structure and the user of it must both understand the contract of immutable data (the DS must wrap in Box, the user must unwrap the Box). This makes the relationship between the immutable and mutable structures explicit, the last line of code instead becomes:

console.log(snapshot[0].unbox().a === obj.a);

The call to unbox() makes explicit the fact that there is a reference to an object here that requires special consideration (when thinking about the lifecycle of a value).

On the second point about type systems, I believe TypeScript has done some work in the recent past in regard to making errors more human readable, and I don't see any reason why a type system cannot provide a builtin type that negates the need for duplication of that type. I can't speak to the performance implications of the additional typing overhead, though, although from my own experience with TypeScript a number | boolean | .... | Box type doesn't seem outrageously expensive.

sjrd commented 3 years ago

A core tenent of the design of Record and Tuple is to provide a truly immutable data structure; allowing implicit boxing betrays that tenent. For example: [...] In that example, is the value returned from snapshot() truly a snapshot of the stack if it reflects mutations of the stack made after the snapshot? An external reader might incorrectly come to the conclusion that the snapshot is immutable, when it is not.

The snapshot is still immutable. It is not deeply immutable. The values referenced by the snapshot are internally mutable, but the snapshot of the stack itself is immutable. When you restore a stack using that snapshot, the stack will be 100% equivalent to the one you had before. It will reference the exact same list of objects. Those objects may have internally changed in the meantime, but that is no different than if you had kept the stack as is since you created the snapshot. So yes, the snapshot is immutable.

I don't want to argue too much about this "core tenent", but since you mentioned it, I will say a few words about it. I understand that the proposal is presented as "fundamentally" about that, arguing that not-deeply immutable structure will lead to bugs in large software. I don't see any evidence of that, though (and I've read a lot about that in this repo before starting to engage here). To me it is a non-falsifiable hypothesis, so I cannot argue against it. The only thing I can do is show that there are real, concrete examples where the requirement for deep immutability leads to bugs or hard to write and read software.

Perhaps I should also mention that there is a lot of precedent in other languages to have non-deeply immutable data structures. This is true in statically typed languages like Scala (all default collections, in addition to tuples, are not-deeply immutable), but also in dynamically typed languages like Python (tuples) and Clojure(Script) (here as well, most default collections). The latter to me is the most compelling evidence that non-deeply immutable data structures are not a problem wrt. large codebases. Clojure is an excellent language for data manipulation, including in large projects; it is perhaps the positive comment I read most about Clojure, and about Clojure more than about about languages (I am not a Clojure developer). By contrast, I don't know any precedent in a major language for deeply immutable data structures.

rickbutton commented 3 years ago

The snapshot is still immutable. It is not deeply immutable. The values referenced by the snapshot are internally mutable, but the snapshot of the stack itself is immutable. When you restore a stack using that snapshot, the stack will be 100% equivalent to the one you had before. It will reference the exact same list of objects. Those objects may have internally changed in the meantime, but that is no different than if you had kept the stack as is since you created the snapshot. So yes, the snapshot is immutable.

IMO the benefit of boxes being explicit is that this specific characteristic is made explicit in the code.

The only thing I can do is show that there are real, concrete examples where the requirement for deep immutability leads to bugs or hard to write and read software.

If we add boxes to your previous example, it looks like this:

class SnapshottableStack {
  constructor() {
    this._stack = null;
  }

  push(x) {
    const boxed = typeof x === "object" && x !== null ?
        Box(x) : x;
    this._stack = #[boxed, this._stack];
  }

  pop() {
    if (this._stack === null)
      throw new Error("empty stack");
    const result = this._stack[0];
    this._stack = this._stack[1];
    return Box.isBox(result) ? result.unbox() : result;
  }

  snapshot() {
    return #{snapshot: this._stack};
  }

  restore(snapshot) {
    this._stack = snapshot.snapshot;
  }
}

Based on that example, IMO the addition of boxes here leads to it being more readable, not less. While it does have an extra statement and an extra expression over the first one, it's contract with regard to mutable data is now explicit.

By contrast, I don't know any precedent in a major language for deeply immutable data structures.

To be fair I don't use these daily, but to my knowledge at least Swift, Rust, and OCaml (off the top of my head) all have deeply immutable structs of some kind, where an escape to the mutable requires a specific keyword/etc.

sjrd commented 3 years ago

Based on that example, IMO the addition of boxes here leads to it being more readable, not less. While it does have an extra statement and an extra expression over the first one, it's contract with regard to mutable data is now explicit.

I guess we'll just have to disagree, here. To me this is less readable. There are code paths and conditions in there that serve only to work around the restrictions on tuples and records, and that are not related to the logic I am interested in.

Also your code is broken for the case where the user wants to put Boxes in SnapshottableStack, which kind of proves my point that it is easier to introduce bugs.

To be fair I don't use these daily, but to my knowledge at least Swift, Rust, and OCaml (off the top of my head) all have deeply immutable structs of some kind, where an escape to the mutable requires a specific keyword/etc.

OCaml definitely lets you put mutable objects (ref is basically the canonical mutable data type) in its immutable data types:

let print_first xs =
  match xs with
  | [] -> Printf.printf "none\n"
  | h :: t -> Printf.printf "%d\n" !h;;

let r = ref 0;;
Printf.printf "%d\n" !r;;
let xs = [r];;
print_first xs;;
r := 5;;
Printf.printf "%d\n" !r;;
print_first xs;;

prints

0
0
5
5

Same for Swift:

class Foo {
    var value = 0
}

let foo = Foo()
let xs = [foo] // immutable array containing the mutable Foo
print(foo.value)
print(xs[0].value)
foo.value = 5;
print(foo.value)
print(xs[0].value)
xs[0].value = 6
print(xs[0].value)
print(foo.value)

prints

0
0
5
5
6
6

Rust is a bit too alien for me to even try to produce an equivalent setup, but I'm pretty sure that the same thing will happen.

rickbutton commented 3 years ago

To me this is less readable. There are code paths and conditions in there that serve only to work around the restrictions on tuples and records, and that are not related to the logic I am interested in.

I guess we should try to clarify what is "readable". To me, a functionally readable bit of code expresses its contract and intent without being too verbose as to hide these details. The addition of a statement doesn't outright make something less "readable" simply because it is longer, IMO, because said statement may add additional information about the contract and intent that the developer wants to express. In this case, you trade an additional conditional for an explicit signal to the reader that the boundary between the mutable and immutable has been crossed, which I think is important.

Also your code is broken for the case where the user wants to put Boxes in SnapshottableStack, which kind of proves my point that it is easier to introduce bugs.

Boxes are not objects, typeof Box({}) === "box", so it will behave correctly.

OCaml definitely lets you put mutable objects (ref is basically the canonical mutable data type) in its immutable data types:

Is the ref keyword not effectively the same as Box? The developer is indicating mutability, and the reader will read ref and understand there is mutability in play.

Same for Swift:

I'm referring to structs, not classes. Classes certainly let you do this. Similar to the ref example above, class indicates a thing that will have mutable members.

sjrd commented 3 years ago

I guess we should try to clarify what is "readable". To me, a functionally readable bit of code expresses its contract and intent without being too verbose as to hide these details. The addition of a statement doesn't outright make something less "readable" simply because it is longer, IMO, because said statement may add additional information about the contract and intent that the developer wants to express. In this case, you trade an additional conditional for an explicit signal to the reader that the boundary between the mutable and immutable has been crossed, which I think is important.

Perhaps it's the influence of static typing, but in Scala we've being crossing the boundary between mutable and immutable stuff without any extra ceremony for years, and it doesn't make it any harder to reason about programs. On the contrary, generic code can be written without caring whether the values of generic types being manipulated are themselves mutable or not. Again, there is a lot of precedent in existing languages for putting mutable things in immutable containers.

Also your code is broken for the case where the user wants to put Boxes in SnapshottableStack, which kind of proves my point that it is easier to introduce bugs.

Boxes are not objects, typeof Box({}) === "box", so it will behave correctly.

It is precisely because typeof Box({}) === "box" that the implementation is broken:

const stack = new SnapshottableStack();
const obj = {};
const myBox = Box(obj);
stack.push(myBox); // typeof myBox === "box", so _stack = #[myBox, null] at this point
const elem = stack.pop(); // Box.isBox(result) is true because result === myBox, so returns result.unbox();
assert(elem === myBox); // blows up, because elem === obj, not myBox

OCaml definitely lets you put mutable objects (ref is basically the canonical mutable data type) in its immutable data types:

Is the ref keyword not effectively the same as Box? The developer is indicating mutability, and the reader will read ref and understand there is mutability in play.

No, the ref x keyword is effectively the same as [x] or {value: x}, i.e., a mutable collection of size 1. You can mutate the reference that's contained in a ref (with :=). You cannot mutate the reference contained in a Box (you can only mutate the thing it references).

Same for Swift:

I'm referring to structs, not classes. Classes certainly let you do this. Similar to the ref example above, class indicates a thing that will have mutable members.

The class is put inside the immutable array. This is exactly an immutable container (the array / a tuple) containing a mutable value (the instance of the class / an object). The class plays the role of the {}, and the immutable array plays the role of the tuple. I can write the same example with a struct containing a class, if that is more convincing for you:

class Foo {
    var value = 0
}

struct Tuple {
    let elem = Foo()
}

let tup = Tuple() // immutable Tuple containing a mutable Foo
let foo = tup.elem
print(foo.value)
print(tup.elem.value)
foo.value = 5;
print(foo.value)
print(tup.elem.value)
tup.elem.value = 6
print(tup.elem.value)
print(foo.value)

produces the same output.

rickbutton commented 3 years ago

Perhaps it's the influence of static typing, but in Scala we've being crossing the boundary between mutable and immutable stuff without any extra ceremony for years, and it doesn't make it any harder to reason about programs. On the contrary, generic code can be written without caring whether the values of generic types being manipulated are themselves mutable or not. Again, there is a lot of precedent in existing languages for putting mutable things in immutable containers.

This certainly does influence things. In a dynamically typed language you receive less assurances that you are manipulating the data correctly.

It is precisely because typeof Box({}) === "box" that the implementation is broken:

Point taken, you are 100% correct, it would require additional handling to also accept a box argument.

Something to consider that also informs the design here is that JavaScript is a -mostly- mutable language, in that almost everything in the entire language is mutable, except for primitives. In a 99% mutable world, it seems more important to signal the places where immutability "ends", because it can be so easy to accidentally reach back into mutable state, whereas with a language like clojure where the "status quo" is immutable data structures intermingled with mutable data, both the programs and programmers are built with this consideration already in mind. In our research of internal code bases using libraries like Immutable.js, the most frequent problem found was the embedding of mutable objects in immutable Immutable.js records, and incorrectly handling the immutable vs mutable parts of the data structure. Requiring explicit boxes eliminates this entire class of problem.

acutmore commented 3 years ago

Python Tuples allow the storing of mutable collections directly (no box concept). However Python tuples with mutable data can not be used as keys in dictionaries. So it shifts the burden of detecting incompatibility and adding code to work around it to a different category of generic functions.

Tuples in Swift can also not be used as the keys in dictionaries, regardless of if their content is deeply immutable.

dead-claudia commented 3 years ago

@sjrd Sorry, I meant like implementing your union type as a specially handled type in the compiler for the purpose of union and subtype checking.

dead-claudia commented 3 years ago

Edit: @rickbutton

Something to consider that also informs the design here is that JavaScript is a -mostly- mutable language, in that almost everything in the entire language is mutable, except for primitives. In a 99% mutable world, it seems more important to signal the places where immutability "ends", because it can be so easy to accidentally reach back into mutable state, whereas with a language like clojure where the "status quo" is immutable data structures intermingled with mutable data, both the programs and programmers are built with this consideration already in mind. In our research of internal code bases using libraries like Immutable.js, the most frequent problem found was the embedding of mutable objects in immutable Immutable.js records, and incorrectly handling the immutable vs mutable parts of the data structure. Requiring explicit boxes eliminates this entire class of problem.

Counterpoint: engines have more information than libraries, and can pass around a bit flag denoting whether something contains mutable data or not, as they see the entire world. This tag can basically act as said "box" implicitly, so they don't actually need an explicit box type.

sjrd commented 3 years ago

@acutmore It doesn't have to be like that, though. Scala tuples and records (called case classes) can be used as keys in maps (Scala's name for dictionaries), whether they contain mutable objects or only immutable elements. In all cases they use the elements' notion of equality, i.e., === if they're mutable or primitive, recursive equality if they're tuples or records. In other words, Scala does exactly what is being proposed currently for ECMAScript tuples and records, except without the Box indirection.

@rickbutton

Something to consider that also informs the design here is that JavaScript is a -mostly- mutable language, in that almost everything in the entire language is mutable, except for primitives. In a 99% mutable world, it seems more important to signal the places where immutability "ends", because it can be so easy to accidentally reach back into mutable state, whereas with a language like clojure where the "status quo" is immutable data structures intermingled with mutable data, both the programs and programmers are built with this consideration already in mind. In our research of internal code bases using libraries like Immutable.js, the most frequent problem found was the embedding of mutable objects in immutable Immutable.js records, and incorrectly handling the immutable vs mutable parts of the data structure. Requiring explicit boxes eliminates this entire class of problem.

Well, I don't have anything to answer to that, except perhaps noting the example of Python again, which is, like JavaScript, dynamically typed and mostly imperative. Which means that if I haven't convinced you yet, I probably won't. It remains to be seen whether the advertised benefits will outweigh the drawbacks that I see.

acusti commented 3 years ago

I understand that the proposal is presented as "fundamentally" about that, arguing that not-deeply immutable structure will lead to bugs in large software. I don't see any evidence of that, though (and I've read a lot about that in this repo before starting to engage here).

two concrete examples from my experience of this issue:

when i first came across Object.freeze, i used it to wrap a collection of config objects keyed by their id ({ [id: string]: Config }) that were used for defining different modals (with a title, message, dimensions, etc). some time after shipping it, i was very surprised by a bug in our app where in some cases, the message of a modal would be incorrect (it would render a dynamically generated message from a different modal). that was how i discovered that Object.freeze only freezes the outermost object; another developer had used the same modal config for two similar modals, using something like config.message = 'Are you sure you want to delete ' + count + ' items?'; openModal(config); for the one with a dynamic message. i have helped other coworkers deal with the same confusion during code review or pairing, so i also know that the behavior wasn’t only surprising to me.

my second example is very directly related. we use immutable.js in our app for our redux store. immutable.js lets you put anything you want inside it, including primitives, other immutable.js data structures, plain javascript objects, etc. this led to developers sometimes choosing to store data as a POJO for convenience within a immutable.js Map or List, which has in many cases led to confusion, either because a developer expected the data from the store to be deeply immutable or because values were mutated as a part of the logic in a selector without the developer realizing the implication of what they were doing. as a result, we have since made an explicit rule that all data in the store must be made up of immutable collections. it would’ve been great for us if immutable.js had simply forbidden storing mutable data structures inside their collections from the beginning.

my hypothesis based on my experience is that for many (most?) javascript developers, the different rules and exceptions around mutability and immutability and pass-by-copy and pass-by-reference are confusing and hard to reason about. for that reason, my instinct is that making tuples and records deeply immutable by default (i.e. requiring explicit boxing) will be the best choice for the largest percentage of javascript developers. most developers will run into errors when they try to put a POJO inside a record or tuple, but the fact that their code will immediately break should help them quickly learn the rules of the new data structures.

all of that said, i definitely see the benefits of implicit boxing as so clearly illustrated by @sjrd. it’s a question of trade-offs, and i am thankful to not be responsible for making that final decision.

robbiespeed commented 3 years ago

Seems like the main argument in favour of explicit boxing/unboxing, is to avoid mistakenly putting mutable data inside, immutable structures. That can be solved by linters, and/or typescript, rather than making it a language feature to force non deep immutability down a hard path.

acutmore commented 3 years ago

Seems like the main argument in favour of explicit boxing/unboxing, is to avoid mistakenly putting mutable data inside, immutable structures. That can be solved by linters, and/or typescript, rather than making it a language feature to force non deep immutability down a hard path.

The fact that linters and type checkers can help catch mixing mutable and immutable data is a good point. That said, I do think there are downsides to leaving the check to tooling.

Not every project uses linters and/or TypeScript, and projects that do may configure their rules differently. If R&T are deeply immutable and this is enforced by the language then that is a guarantee that can not be broken, reducing ambiguity. Whereas a tool can always be overridden. There are also cases where linters may not be able to catch this mixing.

Other recent additions to JS have also opted for runtime exceptions to help developers catch possible mistakes. Symbol, for example, can be converted into a string with Symbol.prototype.toString but any implicit conversion to a string throws. Symbol() + '' // throws.

littledan commented 3 years ago

We discussed this question in a recent SES call. One argument which seemed to resonate with @erights was the hazard that, if no explicit boxing is needed, then developers could accidentally write #{ a: { b: 1 } } when they mean #{ a: #{ b: 1 } }, and the bug could be hard to trace. It helps to get these reminders sooner.

acutmore commented 3 years ago

if no explicit boxing is needed, then developers could accidentally write #{ a: { b: 1 } } when they mean #{ a: #{ b: 1 } }, and the bug could be hard to trace

To further this example, it would likely get harder to spot with each layer of indirection.

function getA() {
  return #{ a: getB() } // does getB return an object?
}
mhofman commented 3 years ago

With automatic unboxing, what is the semantics difference between a record/tuple and plain frozen objects with a deep equality predicate?

Unless you test every nested value you're handling, you have no guarantee what you're handling is not just merely frozen on the surface instead of being deeply immutable.

Take the vdom example, how would you differentiate immutable structure from mutable integration points? Box have a semantic meaning, they identify the exit points in an immutable structure. The hope is that you can then ask questions such as: is my structure the same if you ignore the content of the exits.

mhofman commented 3 years ago

@isiahmeadows, so if I try to summarize your point, you're saying that it's impossible for an engine to implement an efficient deep equality predicate for existing plain frozen objects and arrays, and since records/tuples have a more constrained structure, the engine would be able to implement that predicate (in the form of ===) more efficiently (I purposely avoided O() notation as I hear implementers are saying for example that O(1) for equality is probably not going to happen). While I agree in general about the relative performance of records and tuples with the object counterparts, I'm not sure this addresses my point.

I keep hearing claims that putting primitives in a Box should be supported. If that's the case, how would automatic boxing even work? And then, you can't explicitly identify the separation between "system-data" and "user-data". Nor can you implement a predicate such as sameStructure / equalsIgnoreBoxes.

dead-claudia commented 3 years ago

@mhofman Oh, I see what you're talking about now. And yeah, I don't have an answer on that. (I'm not a fan of boxing in general, FWIW.)

Edit: Deleted my other comments that completely missed what you were talking about.

sjrd commented 3 years ago

I keep hearing claims that putting primitives in a Box should be supported. If that's the case, how would automatic boxing even work? And then, you can't explicitly identify the separation between "system-data" and "user-data". Nor can you implement a predicate such as sameStructure / equalsIgnoreBoxes.

If there's automatic boxing and unboxing (or, equivalently, if objects can be put in R/T without boxes at all), then we don't need to be able to put primitives in Boxes. All types of data will directly be put in R/T. Putting primitives in Boxes is required for generic code that manipulates arbitrary types of data; if arbitrary data can be put in R/T, generic code is already served well, and much better than with Boxes, whether or not they can contain primitives.

mhofman commented 3 years ago

Right so automatic unboxing only makes sense if there is also automatic boxing then?

Unless we take the pass-through approach proposed in https://github.com/tc39/proposal-record-tuple/issues/258#issuecomment-939344671. That you would keep the following true, without making Box even weirder to handle:

const r = #{foo: Box('foo')};
assert(r.foo === 'foo');
assert(typeof r.foo === 'string');

The problem is that such a dissymmetry between manual boxing and automatic unboxing is strange. It either should be transparent for both operation, or transparent for neither. Furthermore, automatic unboxing would prevent virtualization: to maintain realm-based membrane security guarantees, boxing needs to capture the local realm, but if unboxing is syntax based, there is no way to emulate/shim this realm-check. Similarly, automatic boxing would break existing SES like deployments if records/tuples are non-objects, as there would now be a non-deniable way to pass around objects with authority nested inside primitives which are assumed to be inert.

So I only see 2 ways:

sjrd commented 3 years ago

The problem is that such a dissymmetry between manual boxing and automatic unboxing is strange. It either should be transparent for both operation, or transparent for neither.

I agree.

Furthermore, automatic unboxing would prevent virtualization: to maintain realm-based membrane security guarantees, boxing needs to capture the local realm, but if unboxing is syntax based, there is no way to emulate/shim this realm-check. Similarly, automatic boxing would break existing SES like deployments if records/tuples are non-objects, as there would now be a non-deniable way to pass around objects with authority nested inside primitives which are assumed to be inert.

This comments makes a bit nervous. IIUC, there is no problem with automatic box/unbox if we can update SES to deal with that. It seems like it's only a problem for "existing SES like deployments". If I get that right, that means that old SES-style stuff, just by its existence, absolutely constrains how we can design the language features of the future!? That seems like a very, very strong constraint that SES is imposing on ECMAScript. SES was written at some point on the timeline of ECMAScript, and it worked around what was there at the time. But for the future it can't do that anymore? We have to stay forever compatible with the first version of SES?

I'm not a security specialist, but such constraints seem completely out of proportion to me.

nicolo-ribaudo commented 3 years ago

just by its existence, absolutely constrains how we can design the language features of the future!?

This happens often when designing ECMAScript proposals, even if usually because of conflicts with old unmaintained libraries.

sjrd commented 3 years ago

Sure, but usually it's only in the form of new built-in functions that have to not collide with existing polyfills of the same name, polyfilling different semantics. It usually does not apply to entire new datatypes. For example, there never was any such constraint to the introduction of bigints.

mhofman commented 3 years ago

New changes to the language have to respect both documented or assumed invariants that existing programs may rely upon. There is a separate effort going on to document some of these invariants that are not currently explicitly stated in the spec, so that new proposals can more easily navigate this minefield.

While not strictly an invariant, there is sensitive deployed code that does rely on the assumption that primitives are completely inert, and that it's impossible to reach an existing object solely from a primitive. Automatic unboxing of a primitive type would break that assumption.

In the case of existing a SES-like library, it'd have to remove all unknown globals to keep its guarantees, so we also realized that explicit boxing through a global constructor would be sufficient to protect these existing deployments (as would explicit non-prototype based unboxing).

As mentioned before, Bigint does not break any such invariants. The SES-shim and predicates based on it will definitely be updated to support boxes when introduced to the language, and if we can add at the same time an Object.hasObject() intrinsic predicate, then in time, further additions to the language may have more freedom.

It seems like an appropriate trade-off to require explicit access to box content as there are other strong arguments for them, including increased compatibility with the proposed ShadowRealm API which disallows objects through the callable boundary (passing records/tuples that contain objects would have to throw instead of being able to let them through and defer to an explicit unbox which can be virtualized)