tc39 / proposal-record-tuple

ECMAScript proposal for the Record and Tuple value types. | Stage 2: it will change!
https://tc39.es/proposal-record-tuple/

Equality semantics for `-0` and `NaN` #65

Closed: bakkot closed this issue 2 years ago

bakkot commented 5 years ago

What should each of the following evaluate to?

#[+0] == #[-0];

#[+0] === #[-0];

Object.is(#[+0], #[-0]);

#[NaN] == #[NaN];

#[NaN] === #[NaN];

Object.is(#[NaN], #[NaN]);

(For context, this is non-obvious because +0 === -0 is true, Object.is(+0, -0) is false, NaN === NaN is false, and Object.is(NaN, NaN) is true.)

Personally I lean towards the -0 cases all being false and the NaN cases all being true, so that the unusual equality semantics of -0 and NaN do not propagate to the new kinds of objects being introduced by this proposal.

bakkot commented 4 years ago

Does the decision on equality semantics for -0 and NaN impact this aspect of the proposal? If yes, how?

If you are doing x === x for a record x, it is nice to be able to return true immediately rather than recursing over all the fields. That doesn't work if contents are compared with === instead of Object.is, because of NaN.

bakkot commented 4 years ago

@littledan

I'm not yet sold on normalizing -0 to 0. I would prefer if we could let all primitives be represented in Records and Tuples unscathed. Otherwise I worry this becomes yet another case for people to think about and consider whether it affects them.

I would argue that not normalizing would lead to far more people having to think about -0.

papb commented 4 years ago

If you are doing x === x for a record x, it is nice to be able to return true immediately rather than recursing over all the fields. That doesn't work if contents are compared with === instead of Object.is, because of NaN.

This could easily be avoided by internally keeping a hasNaN flag that is set when the Record/Tuple is created. Then x === x can be optimized to !hasNaN, an O(1) check.
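
A minimal sketch of the idea, using a hypothetical userland wrapper (the class and field names are illustrative, not part of the proposal):

class TupleLike {
  constructor(values) {
    this.values = values;
    // One Number.isNaN check per inserted element at construction time.
    this.hasNaN = values.some(Number.isNaN);
  }
  // Under elementwise === semantics, x === x holds exactly when x contains
  // no NaN, so the reflexive comparison becomes an O(1) flag read.
  equalsSelf() {
    return !this.hasNaN;
  }
}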

papb commented 4 years ago

I would argue that not normalizing would lead to far more people having to think about -0.

@bakkot What about all the other arguments in favor of not normalizing?

bakkot commented 4 years ago

@papb None of those seem at all compelling to me. I expect at most a vanishingly small number of programs ever put -0 into an array or object and then retrieve it and then rely on it still being -0 instead of 0.

erights commented 4 years ago

@papb asks

What about all the other arguments in favor of not normalizing?

with links which I followed.

@bakkot responds

None of those seem at all compelling to me. I expect at most a vanishingly small number of programs ever put -0 into an array or object and then retrieve it and then rely on it still being -0 instead of 0.

From following the links to the arguments for not normalizing, I still find myself in agreement with @bakkot. Can anyone show a more compelling example illustrating a problem with normalizing? If not, can anyone explain to me why these examples or arguments are more significant than they appear?

kfdf commented 4 years ago

Objects and arrays?

They are not value types: they are compared by identity rather than by their contents, and they aren't terribly useful as keys in maps. How do you implement a memoizer for a function that takes several arguments? You'll quite likely JSON.stringify the arguments and use that as a key in the map that caches the results. And -0 flips to 0 when put in a string, and no one has ever had any problems with that.
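
(A quick illustration of the behavior described above; the memoizer is a hypothetical sketch:)

String(-0)          // "0"
JSON.stringify(-0)  // "0"

// A stringify-keyed memoizer of the kind described above:
const cache = new Map();
function memoized(fn) {
  return (...args) => {
    const key = JSON.stringify(args); // -0 and 0 collapse to the same key
    if (!cache.has(key)) cache.set(key, fn(...args));
    return cache.get(key);
  };
}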

Although this code not being buggy would be good on its own, that has to be weighed against many other issues I and others have raised

Nobody disputes that doubles work the way they do for very good reasons, but in JavaScript they really do double duty, as floating-point numbers and as integers. It seems this is how JavaScript got away with having a single numeric type: if you treat doubles as integers, they really start acting like integers, even allowing them to be safely compared for equality. And there is no such thing as a -0 integer. Object.is without normalizing violates this expectation. It effectively introduces the -0 integer in some pretty trivial scenarios, and if JSON.stringify is used for debugging, it may result in an unforgettable experience.

rickbutton commented 4 years ago

Updating this thread to comment on the addition of the stage 3 milestone. We believe that our initial solution (in the current explainer, === uses Object.is semantics recursively on Records and Tuples) is adequate to start targeting with spec text, but we are obviously open to alternative solutions based on the outcome of this discussion. I want to make sure that the people in the thread are aware that we want to continue iterating on this solution, but we don't think it's a blocker for progressing to Stage 2.

michaelficarra commented 4 years ago

@rickbutton Since the equality semantics may dictate which use cases this proposal can address, I wouldn't be so sure that this is something to be worked out after stage 2. Unless you're hoping to let the equality semantics be dictated by the use cases?

rickbutton commented 4 years ago

@michaelficarra I disagree that the chosen solution here will actually dictate use cases. The problem space is important because we obviously want to limit the number of sharp edges around -0/NaN, but I don't imagine that the decisions made in this thread will have a significant impact on the general usage or ergonomics of Record and Tuple. As @erights and @Zarel mentioned, it looks like the only place that -0 has any meaningful difference is wrt the infinities, and I doubt that NaN equality will significantly impact the way in which people use Record and Tuple (even though it is important that we get this right). The various choices in the thread don't limit the usefulness of Record and Tuple in general, they just change which direction the sharp edge is pointing.

erights commented 4 years ago

There is zero controversy regarding the semantics of applying Object.is to records and tuples. Therefore, to the degree that this semantics supports a variety of use cases --- which is our major premise anyway --- this proposal at least supports those use cases via Object.is, no matter how the remaining controversies work out.

erights commented 4 years ago

It is left as an exercise for the reader to determine whether there is also minus zero controversy about this same point.

papb commented 4 years ago

Can anyone show a more compelling example illustrating a problem with normalizing? If not, can anyone explain to me why these examples or arguments are more significant than they appear?

Thank you @erights for this request. I intend to make a 'write-up' later today on why I believe that:

papb commented 4 years ago

There is zero controversy regarding the semantics of applying Object.is to records and tuples.

This is correct.

However I got confused about this claim yesterday, so if there is anyone out there confused like I was, it should be noted that there is still controversy about Object.is(#[+0], #[-0]). Not due to Object.is semantics, but due to the -0 normalization controversy.

ljharb commented 4 years ago

@papb I'm not sure how; Object.is(#[+0], #[-0]) conceptually has to do what Object.is does, which is differentiate -0 from +0.

noppa commented 4 years ago

@ljharb It's been suggested here that -0 could be normalized to +0 like it's normalized in Sets. If that approach was selected, then

const negativeZero = -0
const negativeZeroInTuple = #[negativeZero]
const suddenlyPositiveZero = negativeZeroInTuple[0]
Object.is(+0, suddenlyPositiveZero) // true

// Similarly with Sets now
Object.is(+0, new Set([-0]).values().next().value) // true

Surely if the value is normalized to +0, like in the example above, then also

Object.is(#[+0], #[-0]) // true

But obviously, not everyone agrees with normalizing, so I believe that's the controversy @papb is referring to.

papb commented 4 years ago

Precisely, @noppa!

Zarel commented 4 years ago

Here's an overview of the controversy as I see it:

#[0] === #[-0]

It seems like ever since my post, everyone agrees that this should be true. Which is good! That was the single biggest footgun I saw in this proposal, so I'm glad to see it's being addressed (I notice that the spec text has not yet been updated to fix this, but I hope it will happen soon - I'd prefer for any of the proposed changes to make it into the spec immediately, before further discussion, to make sure this doesn't go unfixed because of controversy over implementation details).

Controversy, however, abounds with how this should be made true. Some prefer normalization, some prefer IEEE 754-compliant === operators:

Object.is(#[0], #[-0]);

I weakly prefer false, but normalization would make this true.

Normalization would in general make it impossible to directly store a distinct -0 in a record/tuple. This would make them work differently from regular variables, objects, and arrays, but would make them work similarly to Map and Set.

I think I lean towards liking how JavaScript has been moving away from having footguns in new APIs, even if it makes them inconsistent with old APIs. But I still see IEEE 754's features as useful, and not necessarily footguns in need of removal. I'd rather be consistent with IEEE 754. Consistency is also useful.

It seems to me that the only arguments in favor of normalization are "consistency with Map/Set" (but since record/tuple are syntactically more similar to objects/arrays, I think consistency with objects/arrays is more important) and "performance" (which, as others have mentioned, can be retained with a simple flag for "this record/tuple doesn't contain -0 or NaN").

#[NaN] === #[NaN]

I weakly prefer false for this one, as well. If you do two calculations and get NaN for both of them, they are not the "same number". I think treating them as equal and treating them as unequal could both cause footguns (perhaps we should have had checked exceptions rather than NaN, but it's too late now), so I'd rather default to being consistent with programmers' expectations, which is that === on records/tuples behaves as a pairwise === for their elements.

I would expect that if you polled developers with "Does NaN === NaN?" and then "Would you expect #[NaN] === #[NaN]?", the majority of developers who answered "no" for the first question would answer "no" for the second question, and so I would also choose "no" based on the Principle of Least Astonishment.

bakkot commented 4 years ago

Thanks for the writeup, @Zarel. Some comments:

everyone agrees that this should be true

While I am on board with this, at least @littledan and @michaelficarra have expressed disagreement. It is certainly not everyone.

I'd prefer for any of the proposed changes to make it into the spec immediately, before further discussion, to make sure this doesn't go unfixed because of controversy over implementation details

I wouldn't worry about this. The proposal will not advance to the point that implementations are shipping unflagged before this issue is resolved; that's a major goal of TC39's process.

It seems to me that the only arguments in favor of normalization are "consistency with Map/Set" and "performance".

My argument in favor of normalization is to avoid introducing new cases where === (equality) and Object.is (identity) disagree, while still allowing #[0] === #[-0] to hold. Currently there are exactly three values in the language for which equality and identity disagree, and I am fairly strongly opposed to introducing more. I care less about the two arguments you point out.
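
(Concretely, those three values are +0, -0, and NaN:)

+0 === -0            // true
Object.is(+0, -0)    // false

NaN === NaN          // false
Object.is(NaN, NaN)  // true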

I'd rather default to being consistent with programmers' expectations, which is that === on records/tuples behaves as a pairwise === for their elements.

This is in conflict with another expectation, which is that x === x for all values x (with the sole exception of NaN). We can't meet both expectations. I think preserving reflexivity is more important than being strict about pairwise ===.

Zarel commented 4 years ago

While I am on board with this, at least @littledan and @michaelficarra have expressed disagreement. It is certainly not everyone.

It's my impression that they disagreed over implementation details (normalization vs. equality strictness) rather than expecting #[0] !== #[-0], but maybe they expressed disagreement outside of this issue thread?

My argument in favor of normalization is to avoid introducing new cases where === (equality) and Object.is (identity) disagree, while still allowing #[0] === #[-0] to hold. Currently there are exactly three values in the language for which equality and identity disagree, and I am fairly strongly opposed to introducing more. I care less about the two arguments you point out.

This is in conflict with another expectation, which is that x === x for all values x (with the sole exception of NaN). We can't meet both expectations. I think preserving reflexivity is more important than being strict about pairwise ===.

This makes sense, thanks. I tried to make clear that I was only summarizing my own impression of the controversy, because I don't fully understand the other side.

I'm on @devsnek's side that "records/tuples containing -0 or NaN" should be considered exceptions for the same reasons that -0 and NaN themselves are, but I agree that it's complicated and both approaches have drawbacks.

erights commented 4 years ago

For completeness, I'm going to toss another possibility in the ring. It has, of course, distinct pros and cons compared to all the others.

@Zarel writes:

I'm on @devsnek's side that "records/tuples containing -0 or NaN" should be considered exceptions for the same reasons that -0 and NaN themselves are, but I agree that it's complicated and both approaches have drawbacks.

What if we make them actual exceptions, as in thrown errors?

We already accept that records and tuples cannot contain some things, so expressions like #[{}] do not evaluate to any value. Rather, they throw an error. We could decide that #[NaN] and/or #[-0] likewise do not evaluate to any value, but rather throw an error. With this rule, === and Object.is would always agree, and they would both also be consistent with their recursive application to the leaf values.

Zarel commented 4 years ago

@erights I would strongly oppose doing that for -0, because it's so common (as mentioned earlier, you can get it just by multiplying integers), and treating it as equal to 0 has basically no drawbacks in practice.
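
(For instance:)

Object.is(-1 * 0, -0)  // true: multiplying zero by a negative yields -0
Object.is(0 / -3, -0)  // true: so does dividing zero by a negative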

I would weakly oppose doing it for NaN, because there are legitimate reasons why someone would want to store it in a record/tuple. It would also not be checkable by TypeScript, whose type system isn't sophisticated enough to tell whether a number can or can't be NaN.

ljharb commented 4 years ago

The current story is "Records and Tuples hold primitives". I don't think it would be an improvement over any outcome to have the story be "Records and Tuples hold primitives except for NaN and -0".

waldemarhorwat commented 4 years ago

There are a number of scenarios where the differences between +0 and -0 are important, in addition to those already covered above. Taking the reciprocal is not the only way to distinguish them in practice. They come up at the branch cuts of mathematical functions. For example, take a look at what Math.atan2 does. Code that does that normally deals with lots of {x, y} ordered pairs, which is exactly what Tuples were designed for.
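
(For example, Math.atan2 distinguishes the sign of zero on its branch cut:)

Math.atan2(0, -1)   //  3.141592653589793, i.e.  π
Math.atan2(-0, -1)  // -3.141592653589793, i.e. -π
// Normalizing -0 inside #[x, y] pairs would silently flip the result
// of such angle computations from -π to π.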

hax commented 4 years ago

I feel we should keep consistency with normal ===.

The weirdness of +0 === -0 and NaN !== NaN comes from the IEEE float spec (with a reasonable rationale); developers inevitably have to learn it (once and only once, not only for JS, but for all programming languages that use IEEE 754 floats). Adding an exception for tuple/record just makes things worse (now we need to explain why +0 === -0 but #[+0] !== #[-0], or why NaN !== NaN but #[NaN] === #[NaN]) and never really solves anything. It may even cause bugs if someone tries to refactor deepEquals([...], [...]) to #[...] === #[...].

Huxpro commented 4 years ago

My two cents: Record/Tuple should NOT enforce any new coercions on existing equalities, i.e. all their equalities (==/===/Object.is) should be defined recursively as the corresponding equalities of their contents.

Doing so:

  1. preserves the orthogonality between language features
    • is the least surprising from a user perspective.
  2. allows an inductive semantics that is theoretically clean
    • seems to provide more room for optimization.
  3. allows a natural implementation that simply performs trivial lowering
    • codegen size (memory overhead), perf (NaN has multiple encodings).

rickbutton commented 4 years ago

An option that came up in the hallway track at this TC39 meeting was to use SameValueZero for equality of Record and Tuple, which splits the difference between strict equality and SameValue equality.

If this equality was chosen, then:

assert(#[0] === #[-0])
assert(#[NaN] === #[NaN])

assert(#[0] == #[-0])
assert(#[NaN] == #[NaN])

In summary, if we went with SameValueZero, records containing -0 and records containing +0 would be equal to each other, and records containing NaN would likewise be equal.

This seems to satisfy the concerns about breaking -0 equality in Record/Tuple, while also preventing NaN from "black-holing" any Record or Tuple with a NaN inside of it.
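
(For reference, a userland sketch of SameValueZero, matching how Map, Set, and Array.prototype.includes compare values:)

function sameValueZero(x, y) {
  return x === y || (Number.isNaN(x) && Number.isNaN(y));
}

sameValueZero(0, -0)    // true  (unlike Object.is)
sameValueZero(NaN, NaN) // true  (unlike ===)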

What are your thoughts?

devsnek commented 4 years ago

I absolutely hate that it would consider NaN equal to NaN, but I won't die on that hill.

bakkot commented 4 years ago

I would still be sad that we would be introducing new values for which === and Object.is differ, but at least we would not be introducing new values for which Object.is fails to imply ===.

jridgewell commented 4 years ago

One of the things I overheard during the discussion is code which checks for NaN:

function isNan(value: unknown) {
  return value !== value;
}

I know this exact check is used in a ton of places, and allowing #[NaN] !== #[NaN] would break it. Given that, I am completely convinced that a record containing NaN must be equal to itself. So SameValueZero works for me, and strict equality semantics don't.


We could also throw when trying to create a record that contains NaN. We already have runtime checks that ensure a value is a primitive, so this would just be one more runtime check. Could be explained as "records are special containers for equality, and NaN's equality is broken".

ljharb commented 4 years ago

@rickbutton confirming: === and == for Records and Tuples both will use "recursive SameValueZero", and Object.is for Records and Tuples will use "recursive SameValue"?

rickbutton commented 4 years ago

@ljharb yes exactly.

papb commented 4 years ago

The issue raised by @jridgewell, that x !== x would no longer be a valid check for x being NaN, is strong. Thanks for pointing that out.

However, I'd like to reiterate the argument by @icefoxen:

Just a random person here, but I would really, really prefer that IEEE 754 numbers act like IEEE 754 numbers, and not like what "makes sense". NaN != NaN is a pain in the ass for everyone, but it is there for good reasons. NaN is not a value, it's a catch-all for "can't do this". It's SUPPOSED to be a pain in the ass, because it's a signal that your code screwed up somewhere. NaN also is not a number; it's not really even intended to be a value, it's an error. If NaN == NaN, then you're saying 0/0 == inf/inf, which doesn't seem helpful at all. You might as well assert that two uninitialized values in C have to compare equally.

Second, your computer's hardware isn't going to like you trying to tell it that NaNs are equal, and there are different encodings of NaN, so it turns every floating-point comparison into multiple ones.

Please don't randomly second-guess a standard "because it seems to make sense to me", especially when it's trivial to find out the reasons these things are the way they are. I'm all for trying to find a better approximation of real numbers than IEEE 754, for interesting values of "better", but when every computer built in the last 40 years has worked a particular way, I'd like people to please think more than twice before saying "let's just randomly change the rules in this particular use case".

This argument by @icefoxen is, to me, even stronger.

Therefore, my opinion has now changed to match the latest suggestion by @jridgewell:

We could also throw when trying to create a record that contains NaN. We already have runtime checks that ensure a value is a primitive, so this would just be one more runtime check. Could be explained as "records are special containers for equality, and NaN's equality is broken".

This gracefully matches with what @icefoxen said, in particular this part:

[...] it's not really even intended to be a value, it's an error. [...]


In short: #[0] === #[-0] and #[NaN] throws.

devsnek commented 4 years ago

It doesn't really make sense to throw; the error was the NaN being produced by some math op, not the NaN being stored somewhere.

papb commented 4 years ago

@devsnek Ugh, you also have a point. But what to do? :sweat_smile: None of the options are good enough then :sweat_smile:

papb commented 4 years ago

On second thought, is the fact that x !== x will no longer be equivalent to detecting NaN really that bad?

littledan commented 4 years ago

@papb

On second thought, is the fact that x !== x will no longer be equivalent to detecting NaN really that bad?

The assertion of some commenters in this thread is, "yes, people depend on this". Do you have a reason to disagree?

Zarel commented 4 years ago

The assertion of some commenters in this thread is, "yes, people depend on this". Do you have a reason to disagree?

People who depend on that don't use records/tuples. And if a record/tuple with a NaN inside it gets into their code, it's arguable that returning true from myIsNaN is the correct behavior.

You could equally argue that typeof record === 'record' could break old code because 'record' is not a string that typeof is expected to return.

I do agree that the num !== num test for NaN could possibly cause bugs in certain unusual situations, but I'd be surprised if it broke any actual code out there, and I still think === not meaning pairwise === is much more likely to cause bugs in code that is actually in use, because that's how developers are going to expect it to behave.

littledan commented 4 years ago

The PR https://github.com/tc39/proposal-record-tuple/pull/130 adopts SameValueZero semantics, as proposed by @rickbutton at https://github.com/tc39/proposal-record-tuple/issues/65#issuecomment-638377863 . What do you all think?

rricard commented 4 years ago

@Zarel

People who depend on that don't use records/tuples. And if a record/tuple with a NaN inside it gets into their code, it's arguable that returning true from myIsNaN is the correct behavior.

Agreed, it is not strictly a web compatibility issue, existing production code will run fine. That being said, it is a composability issue with existing libraries where that equality check will break when passing records and tuples. Those NaN checks are prevalent enough to be a concern.

You could equally argue that typeof record === 'record' could break old code because 'record' is not a string that typeof is expected to return.

I am not sure such checks exist at the moment but you do raise a good point with typeof: some code that might want to exhaustively check all results of what a typeof returns might break as well after introducing Record & Tuple in that codebase. For instance:

switch(typeof x) {
  case "number": // ...
  // .... other cases for each primitive type ... but no default ...
}

We have prior art with this when Symbols were introduced; although they contain this compatibility risk, it turned out to be OK. Since then, we've also added BigInt, with a new typeof. This makes me optimistic that we can add new typeof values for Record and Tuple.

rricard commented 4 years ago

Additionally, I'm going to reclassify this issue to be solved before Stage 2 as the committee made it clear on Monday that they are interested in seeing a solution that they could agree on prior to Stage 2.

Zarel commented 4 years ago

I am not sure such checks exist at the moment but you do raise a good point with typeof: some code that might want to exhaustively check all results of what a typeof returns might break as well after introducing Record & Tuple in that codebase. [...]

We have prior art with this when Symbols were introduced; although they contain this compatibility risk, it turned out to be OK. Since then, we've also added BigInt, with a new typeof. This makes me optimistic that we can add new typeof values for Record and Tuple.

That's my point: the num !== num check for NaN is similar to the exhaustive typeof check, in that in theory you could imagine situations in which the behavior could cause problems with old code interfacing with new code, but in practice I don't think it would ever be a problem.

I think === not being defined as elementwise === on records/tuples would be far more likely to cause real problems.

bakkot commented 4 years ago

I think === not being defined as elementwise === on records/tuples would be far more likely to cause real problems.

I disagree. Making it such that any record or tuple containing a NaN is never === to anything, including itself, is significantly worse.

To be broken by SameValueZero semantics, you need to have done two computations, both of which resulted in a NaN, and then put the results into tuples whose other values are all the same, and then relied upon the resulting tuples not being === despite holding the same values. This will not be common. On the other hand, to be broken by === semantics, you just need to have a NaN somewhere and rely on a value being === to itself. This is a very frequent assumption, and hence this will happen frequently.
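
(A sketch of the second failure mode, under each candidate semantics:)

const t = #[NaN];
// Under elementwise-=== semantics:
t === t  // false: any tuple containing NaN stops being equal to itself
// Under SameValueZero semantics:
t === t  // true: reflexivity is preserved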

We can't introduce new values which are not === themselves. There's just no way that's a good idea.

(I want to raise again that tuples in Python which contain a NaN are reflexive, such that x == x even if x contains a NaN, and this doesn't seem to have been a major issue. And Java's Record classes and value types proposals are both intending to do the same thing.)

hax commented 4 years ago

tuples in Python which contain a NaN are reflexive

To be clear, it's reflexive only when they are the same NaN; if there is a different NaN, they don't compare equal. To be honest I would support such behavior, but it also introduces another bit of magic and needs a lot of spec work.

Huxpro commented 4 years ago

I see many have raised concerns about "reflexive equality", but under what definition?

We could say the definition of equality below is reflexive over the set of textual representations of the program (i.e. the source code):

#[NaN] === #[NaN] // suppose this evaluates to true

But if we look at the set of machine representations, you are not just demanding reflexivity. In IEEE 754 you get 2^24 - 2 NaNs in a 32-bit float and 2^53 - 2 NaNs in a 64-bit double, and you are essentially demanding that all of them be equivalent, e.g.

#[0b1_11111111_10000000000000000000000] === 
#[0b1_11111111_00000000000000000000001] // should this be true?

This will make JS engines' lives harder, and your performance will suffer with them.
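
(A rough illustration of the payload point; note that, as @littledan mentions later in this thread, engines may not actually preserve NaN payloads:)

const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const u8 = new Uint8Array(buf);

f64[0] = NaN;  // write a canonical NaN
u8[0] = 0x01;  // perturb a low mantissa bit: a different NaN bit pattern

Number.isNaN(f64[0])    // true: still a NaN
Object.is(NaN, f64[0])  // true: JS equality cannot see the payload difference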

IEEE float equality is defined the way it is to avoid exactly this, and it's a well-known partial equivalence relation. "Refining" away from such a consensus can only introduce endless misalignment with the formal definitions and human understandings that we've learned and built upon.

And really, breaking recursive structural equality would be a much, much bigger problem if we are going to talk about theoretical purity.

rricard commented 4 years ago

@hax That behavior already exists for Map keys and Set values and is defined as SameValueZero in the spec. This does not introduce more complexity in the spec text that we drafted.

@Huxpro This is a good point, and it is mainly why this decision will stay open until Stage 3 (the way the spec text is written lets us switch the internal equality operation easily) as we attempt to implement it. If we see that this is an unavoidable performance cliff, then it may be worth reconsidering this decision; as of now I'm unsure we can make that call, so we are focusing on what would be the most ergonomic thing to do. I still think this is the least confusing behavior.

hax commented 4 years ago

@rricard If you are talking about my latest comment, no, Python's behavior is not SameValueZero. I guess I should give a clearer example:

nan1 = float('nan')
nan2 = float('nan')

nan1 == nan1  # False
nan1 == nan2  # False

[nan1] == [nan1]  # True  <-- reflexive!
[nan1] == [nan2]  # False <-- but a different NaN is not equal!

So in Python, different NaNs are never equal to each other, but each NaN is the same as itself inside a container. This can be checked with is.

nan1 is nan1  # True   <-- reflexive!
nan1 is nan2  # False  <-- but a different NaN is not the same!

rricard commented 4 years ago

I think I misread your comment. Thanks for the clarification. This is helpful, but indeed that would put a few more subtleties into the spec text.

hax commented 4 years ago

And Ruby has similar behavior to Python!

I suggest we should check more programming languages.

littledan commented 4 years ago

@hax In JS, we in general don't have the semantics of having multiple different NaN values. Equality operations either make all NaNs equal or all unequal. You can see different NaN payloads when writing them to a Float64Array, but this doesn't necessarily reflect what they are in memory (and, in fact, engines do not maintain stability in practice with these values, despite spec text to the contrary). TC39 delegates have repeatedly emphasized the importance of maintaining these semantic patterns, and this proposal is consistent with them. If you believe we should revisit this, I'd suggest you make a presentation to the committee explaining why. But I think this change would be orthogonal to Records and Tuples.