tc39 / proposal-record-tuple

ECMAScript proposal for the Record and Tuple value types. | Stage 2: it will change!
https://tc39.es/proposal-record-tuple/

Equality semantics for `-0` and `NaN` #65

Closed bakkot closed 2 years ago

bakkot commented 5 years ago

What should each of the following evaluate to?

#[+0] == #[-0];

#[+0] === #[-0];

Object.is(#[+0], #[-0]);

#[NaN] == #[NaN];

#[NaN] === #[NaN];

Object.is(#[NaN], #[NaN]);

(For context, this is non-obvious because +0 === -0 is true, Object.is(+0, -0) is false, NaN === NaN is false, and Object.is(NaN, NaN) is true.)
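
For reference, all four of these existing facts are checkable in any JavaScript engine today:

```javascript
// The four equality facts that make the question above non-obvious:
console.log(+0 === -0);           // true
console.log(Object.is(+0, -0));   // false
console.log(NaN === NaN);         // false
console.log(Object.is(NaN, NaN)); // true
```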

Personally I lean towards the -0 cases all being false and the NaN cases all being true, so that the unusual equality semantics of -0 and NaN do not propagate to the new kinds of objects being introduced by this proposal.

Jack-Works commented 4 years ago

To be clear, it's reflexive only when they are the same NaN; if there is another NaN, they are not equal.

Oh, they have even weirder semantics.

hax commented 4 years ago

Oh, they have even weirder semantics.

NaN is always weird ^_^ It's hard to say which behavior is weirder.

we in general don't have the semantics of having multiple different NaN values

@littledan Yes, I know. I am just trying to explain the differing behavior across programming languages, especially since there are misunderstandings about them that get used as proof of which semantics we should adopt.

But I think this change would be orthogonal to Records and Tuples.

There are several contradictory requirements on the behavior of NaN in tuples/records; the interesting thing is that the Python/Ruby behavior seems to satisfy many of them (though it relies on multiple NaN values). So at least it would be a motivation for refining NaN semantics. Another motivation may be Math.signbit(NaN).

hax commented 4 years ago

I tested C#.

For the C# value tuple, (NaN, 1) == (NaN, 1) always returns false, and (NaN, 1).Equals((NaN, 1)) always returns true. For C# System.Tuple (a reference type), the results are the same, except that var x = Tuple.Create(NaN, 1); x == x returns true (because it's an object, so the same reference is always equal).

I will test other languages if I have time.

hax commented 4 years ago

Swift tested.

The results are very much the same as C#: tuples behave like C# value tuples, and Array/Set behave like C# System.Tuple. (But Swift only has ==, with no Equals method.)

rickbutton commented 4 years ago

We just had a meeting with some V8 folks about performance, and I wanted to document some thoughts re: SameValueZero comparison on this issue thread.

Specifically, the thinking is that using "recursive SameValueZero" comparison for === for Record/Tuple might make equality checking complex. If === produced a reliable comparison for "observably equal" Records/Tuples (i.e. what Object.is does), then interning becomes more straightforward (since you would be able to swap one of the equality operands out for the other without managing a "set" of things that are equal, some with -0 and some with +0). This complexity doesn't strictly prevent us from choosing SameValueZero, but it certainly does increase engine complexity for engines that wish to do interning or other equality optimizations.

devsnek commented 4 years ago

I'm comfortable saying that this justifies increased complexity.

rickbutton commented 3 years ago

I mentioned this briefly in my previous comment, but I want to elaborate on this after some more discussion with implementers:

If "recursive SameValueZero" is used when comparing Records/Tuples, interning becomes significantly more complicated, to the point of not being worth the effort. Even though #[0] === #[-0], the engine must obviously keep track of both tuples, which means that even if they are interned, the engine must maintain a separate pointer to the data (or some other method of keeping track of the actual value), which diminishes the benefits of interning.

There are a few possible choices here, to recap:

  1. #[0] === #[-0] and === is generally expected to be a linear time operation for records/tuples. In this case, engines could intern values via Object.is since it uses SameValue and Object.is(#[0], #[-0]) is false.
  2. #[0] === #[-0] and === is generally expected to be a linear time operation for records/tuples. Additionally Object.is compares Record/Tuple identity (as in, they are instead just objects) which allows interning to be implemented in userland.
  3. Records/Tuples canonicalize -0 to 0, in which case this isn't a problem.
  4. #[0] !== #[-0] and engines are free to implement interning records/tuples more easily. In this case, === and Object.is do the same thing.

I think that we should consider option 4; encouraging Object.is usage seems like the wrong approach, and silently canonicalizing -0 also seems like a massive foot gun. I'd love to hear thoughts from others.

devsnek commented 3 years ago

I would consider 0 and -0 not being === to also be a massive footgun.

rickbutton commented 3 years ago

@devsnek I agree that it is weird, although this doesn't seem to be a choice of "which solution doesn't have any foot-guns" but rather "which foot-gun do we think is the smallest and is the best/least-worst option".

Zarel commented 3 years ago

I was hoping that after this much discussion, it was clear that #[0] !== #[-0] was the biggest footgun and the worst option, and I think many people here feel the same.

I think we also did establish that no other language considers 0 and -0 unequal, in or out of tuples.

My preferred option is still "=== all the way down" but I'd be reasonably happy with any option that enforces #[0] === #[-0].

iwikal commented 3 years ago
1. `#[0] === #[-0]` and `===` is generally expected to be a linear time operation for records/tuples. In this case, engines could intern values via `Object.is` since it uses `SameValue` and `Object.is(#[0], #[-0])` is `false`.

I don't think we need to go as far as to say linear time in this case. An engine could intern the tuples for storage à la Object.is, and then have a separate data structure that maps these identities to sets of identities that === one another. This would be the fallback if the identity comparison fails, and makes the worst case take n log n, where n is the number of currently living unique tuples. Of course, a linear comparison will be faster for small tuples, especially if there's a lot of tuples around.

Also, one could potentially mark the values (or the identities) with a flag that means "contains -0", and if both don't, there is no need to check this map, we immediately know they must be unequal. This makes the best case time much more likely in applications where -0 is unlikely or not used at all.

Edit: I wrote this before reading rickbutton's original comment very carefully. The suggestion to simply have these values carry two pointers, one for identity and one for equality, did not occur to me and actually seems easier. But I disagree that it would cancel out the benefits of interning. That comment seems to suggest that we do the interning just for the sake of equality, and keep a pointer to every constructed tuple around, but that seems unnecessary. We can do it the other way around: intern for the sake of memory usage, and have some additional equality marker.

littledan commented 3 years ago

The issue here is not just "complexity" or whether something is "worth the effort", but a permanent performance cost for all Record and Tuple usage, where implementations have to incur one of the costs discussed above.

To avoid these real costs that affect R&T users (not just implementers), we could adopt @rickbutton 's option 3 or 4. Both "work", but there's been strong opposition to each voiced above. I'm sympathetic to the goals that led to the desire for the semantics of the current specification, but we have to balance this against the costs to find the best tradeoff for JS developers.

I initially thought that alternatives 1 and 2 could be promising (possibly with a built-in intern operation for 2), but @rricard convinced me that these would both come at too high of a cost: they would mean that we are telling developers, again, "you have to think about which equality operation you want to use; consider using Object.is or Object.is(intern(x), intern(y)) for better semantics/performance".

But the idea of this proposal is that deep structural equality is an everyday feature integrated with the standard library and ecosystem, not an advanced expert feature. Requiring special incantations or libraries to check for structural equality without suddenly creating a slowdown jeopardizes the core value proposition of this proposal.

bakkot commented 3 years ago

Given those tradeoffs, canonicalization seems like the best option to me.

rickbutton commented 3 years ago

not being worth the effort

Realized that I should clarify this: I don't mean that it isn't "worth the effort of implementation" from a staffing/engine-complexity perspective, but more specifically that the complexity would add a performance cost that could offset any performance gains. The latter could imply the former, of course.

waldemarhorwat commented 3 years ago

User-visible behaviors of options 3 and 4 are non-starters to a variety of folks, for diverse reasons that we've rehashed many times already. See past discussions. I don't see going around in circles reopening those as being the most productive.

There are good suggestions above for making options 1 and 2 more efficient.

Zarel commented 3 years ago

I agree that option 4 (#[-0] !== #[0]) has been rehashed over and over again and am confused why it's being suggested again when the majority of comments in this issue are about why it's unacceptable.

What's wrong with option 3 (converting -0 to 0 when putting values in tuples), though?

devsnek commented 3 years ago

read above but tl;dr it breaks when very small negative numbers round to -0

bakkot commented 3 years ago

@devsnek I've read above and I don't know what you're referring to. The only comment about that is this one, which is by @Zarel, the person you're responding to.

devsnek commented 3 years ago

hmm I thought I posted about this too. well basically when you get large/small enough numbers you start getting zeros and infinities, and they retain the sign of the value they represent. it's definitely an edge case but if you happen to use one of these datastructures while doing math you could end up with a result with the wrong sign.

bakkot commented 3 years ago

If you underflow to negative zero you've ended up with a result with the wrong value, not just the wrong sign, so it's not clear to me why that's a vital use case.

There is some high-performance numerics code which is written carefully to be correct-enough in the presence of underflow and which depends on these IEEE quirks, but it seems weird to say that we should sacrifice performance for everyone for the benefit of high-performance numerics code, especially given that such code, definitionally, also cares a lot about performance.

Zarel commented 3 years ago

Oh, yeah, I do realize it has effects such as the example I gave (which was in response to a question about if it could cause problems at all).

I'm just less opinionated on that than on #[-0] !== #[0], which would be disastrous and frequently cause bugs in real-world code. Canonicalization, in comparison, only affects some obscure use cases. I'd think of it like #[NaN] === #[NaN]: unpleasant additional complexity, but at least it only affects use cases that already have to deal with lots of this kind of complexity.

Maxdamantus commented 2 years ago

I'm not necessarily advocating for this, but I think there's a 5th option that no one's brought up yet, which is a hybrid of options 3 and 4: canonicalizing -0 to 0 by default, but allowing people to construct R/T values that do not canonicalize:

Object.is(#[-0], #[0]); // true
Object.is(#![-0], #![0]); // false
Object.is(#![-0], #[0]); // false
Object.is(#![0], #[0]); // true
Object.is(#[#![-0]], #[#![0]]); // probably false?

The value of the last example depends on whether canonicalization is recursive. I suspect it shouldn't be, just for simplicity of semantics. If this opt-out canonicalization exists, it's basically just there so that people who actually do depend on 1/-0 < 0 are not excluded from R/T, and presumably those people are aware enough of IEEE-754 semantics to be careful when handling the #![-0] !== #![0] footgun.

bakkot commented 2 years ago

I definitely do not think it is worth having special syntax just for -0.

rricard commented 2 years ago

The current spec defines the following:

Object.is(#[-0], #[0]); // false
#[-0] === #[0]; // true
// similar to:
Object.is(-0, 0); // false
-0 === 0; // true

Object.is(#[NaN], #[NaN]); // true
#[NaN] === #[NaN]; // true
// unlike:
Object.is(NaN, NaN); // true
NaN === NaN; // false

Record & Tuple being primitives, they should be able to contain any other primitive value, including -0 and NaN.

We are going to go ahead with this decision for Stage 3.

hax commented 2 years ago

I strongly dislike #[NaN] === #[NaN]; this invents another new special rule for NaN. It seems that no other mainstream programming language's value types have such a thing.

Maxdamantus commented 2 years ago

I strongly dislike #[NaN] === #[NaN]; this invents another new special rule for NaN. It seems that no other mainstream programming language's value types have such a thing.

Java does in its record classes:

jshell> record Foo(double a) {}
|  created record Foo

jshell> Objects.equals(new Foo(Double.NaN), new Foo(Double.NaN))
$2 ==> true

Though it also returns false for Objects.equals(new Foo(0.0), new Foo(-0.0)), which I don't like. It is probably worth noting, however, that this last point is less impactful than in JavaScript, since -0 (the integer) in Java is simply 0.

acutmore commented 2 years ago

From a different perspective this avoids introducing more special cases for NaN.

const isNaN = (x) => x !== x;

NaN being the only value to not equal itself is preserved by #[NaN] being equal to itself.

Using SameValueZero is a common operation across other APIs:

[NaN].includes(NaN) // true
new Set([NaN]).has(NaN) // true
[#[NaN]].includes(#[NaN]) // true
new Set([#[NaN]]).has(#[NaN]) // true

hax commented 2 years ago

@Maxdamantus Java's Objects.equals(Double.NaN, Double.NaN) is like JS Object.is, so it should give true (and false for ±0).

hax commented 2 years ago

@acutmore I don't think the isNaN polyfill issue has high priority (old code never has tuples/records), and new code should always use Number.isNaN (it's also easy to fix the polyfill by adding a typeof check). And consider that if we have Decimal.NaN in the future, you will meet a similar issue.

Existence checks are another issue, and I'm OK with modifying SameValueZero for that, but we shouldn't modify === and break the recursive property.

nicolo-ribaudo commented 2 years ago

How does using SameValueZero for R&T === break the transitive property?

hax commented 2 years ago

@nicolo-ribaudo Sorry, I mean recursive property, not transitive.

Maxdamantus commented 2 years ago

Objects.equals just calls the .equals method on the object, so it's a custom equality operation. Java doesn't really have something corresponding to Object.is in JS. == in Java is used to compare values as === is in JS (with the special cases for IEEE-754 -0.0 and NaN values).

As @acutmore alluded to, arguably the important thing is the notion of equality used in other operations such as map indexing, which in Java is Object.equals:

Set.of(Double.NaN).contains(Double.NaN) // true
Objects.equals(-0.0, 0.0) // false
Set.of(-0.0).contains(0.0) // false

and in JavaScript is SameValueZero:

new Set([NaN]).has(NaN) // true
-0.0 === 0.0 // true
new Set([-0.0]).has(0.0) // true

This is as opposed to eg, Haskell, which actually does maintain IEEE-754 semantics in collections:

let nan = 0.0/0.0
Set.member nan (Set.fromList [nan]) -- False
Set.member (-0.0) (Set.fromList [0.0]) -- True
Set.fromList [nan, nan] -- fromList [NaN,NaN]
Set.fromList [0.0, -0.0] -- fromList [-0.0]
Set.fromList [-0.0, 0.0] -- fromList [0.0]

I actually prefer the Haskell semantics for consistency, even though it does have some potentially surprising cases above, though Map/Set don't work this way in JS or Java.

hax commented 2 years ago

@Maxdamantus Java and JS use different names, but JS === is like Java == and JS Object.is is like Java Objects.equals. Of course they have differences, because JS does not have class-defined equals behavior.

And JS does not use === for existence checks; it uses SameValueZero. If it followed Java, it would use Object.is. If it followed Haskell, it would use === (if I understand correctly).

Programming languages differ on existence checks, which may be OK, because they can have different definitions of "key" and "existence", but I think programmers would like to have a consistent definition of == (in JS, ===) across languages.

Maxdamantus commented 2 years ago

@Maxdamantus Java and JS use different names, but JS === is like Java == and JS Object.is is like Java Objects.equals. Of course they have differences, because JS does not have class-defined equals behavior.

JavaScript's Object.is and Java's Objects.equals are only alike in that they compare doubles (actually, Double objects) by content. In other cases they are usually different, since for collections they also tend to compare content:

var a = new ArrayList<Integer>();
var b = new ArrayList<Integer>();
Objects.equals(a, b); // true
a.add(42);
Objects.equals(a, b); // false

In JavaScript, Object.is says whether its operands are the same value, which is not what's happening above (a and b are different reference values, because operating on one is not the same as operating on the other).

And JS does not use === for existence checks; it uses SameValueZero. If it followed Java, it would use Object.is. (Note, you have a typo: new Set([-0.0]).has(0.0) gives you true.) If it followed Haskell, it would use === (if I understand correctly).

Sorry, I messed that example up. I guess the point might be that there's already an inconsistency in JS, so it's not clear that one way is better than the other, which is probably why I don't feel strongly on one side or the other. As I was saying, personally I think it would have been cleaner if they had made Set and Map use === for comparison, like Haskell does (though I don't know if this would have caused other issues in the language).

hax commented 2 years ago

@Maxdamantus Java's ArrayList.equals works only because ArrayList overrides equals; without the override, it would give false. This is what I meant by "JS does not have class-defined equals behavior."

I guess the point might be that there's already an inconsistency in JS

The "inconsistency" we are talking about is that JS actually has some basic "consistency": it uses SameValueZero for existence checks. So I would like us to keep that "consistency" of "inconsistency", not introduce another level of inconsistency into ===.

hax commented 2 years ago

Another question: [...new Set([-0])][0] gives you +0 (-0 is converted to +0). What is the result of [...new Set([#[-0]])][0][0]? +0 or -0? I suppose it's -0?

rickbutton commented 2 years ago

@hax https://tc39.es/ecma262/#sec-set.prototype.add

~~Set.prototype.add always converts -0 to +0, and the Set constructor uses Set.prototype.add. Both cases will be +0.~~

nicolo-ribaudo commented 2 years ago

@rickbutton I think you missed some parentheses: the second case is -0, because it's calling .add(#[-0]) and thus it's not doing any conversion.

rickbutton commented 2 years ago

Good point, ignore me.

papb commented 2 years ago

I was wondering: was the option of raising an error upon an attempt of putting NaN into a record/tuple considered?

I see that @rricard's comment says:

Record & Tuple being primitives, they should be able to contain any other primitive value including -0 and NaN

...but I was hoping for more details. Why does "being a primitive" imply "being able to contain any other primitive"? As far as I can see, Record & Tuple are the first kind of primitive that contains multiple values, so no precedent seems to exist for this...

ljharb commented 2 years ago

I'd put it another way: NaN and -0 are primitives, and shouldn't be "special" by being specifically excluded. Doing so would be very confusing and weird.

papb commented 2 years ago

I was talking specifically about NaN, in the sense that it basically represents an error state (as said by @icefoxen), so this might be an opportunity to turn it into a real error.

By specifying that Records and Tuples with NaN can't even exist, we no longer have to deal with the paradoxical behavior it brings when treated as a value (namely, that it makes equality non-reflexive).

Whichever decision is made for NaN, one of the camps above will not be pleased. So what about the option of making these situations impossible?

acutmore commented 2 years ago

Hi @papb, thanks for the idea. I think you're right that rejecting NaN may not have been considered before.

I do think that @rricard's comment you quoted is a strong design goal for the proposal: "[R&T] should be able to contain any other primitive value". While they do reject object values, that is to help catch the accidental introduction of mutability or identity (or both). Rejecting NaN because it might "represent an error" sounds outside the responsibility of the container.

acutmore commented 2 years ago

As I see, Record & Tuple are the first kind of primitives that contain multiple values, so no precedent seems to exist for this...

While that may be true, there is already precedent in the language for how equality of NaN behaves within a container: [NaN].includes(NaN) === true.

erights commented 2 years ago

Normally, the objection "but that would be weird" (however phrased) is properly taken to be a strong objection, because it indicates the feature would violate the principle of least surprise. For -0 and NaN within Records and Tuples, all of the choices anyone has yet invented are weird in this sense. The objection is not strong if it also applies to all alternatives.

ljharb commented 2 years ago

@erights while i agree with that, rejecting NaN from a container is much, much weirder than giving a surprising equality answer.

erights commented 2 years ago

Hi @ljharb I do not disagree. Nevertheless, I felt it worth making the meta point. Thanks.

pygy commented 2 years ago

Both are weird, but rejecting them at construction time has the advantage of making the surprising behavior obvious, while having unusual equality semantics can lead to subtle bugs.

erights commented 2 years ago

@pygy that's a good point. The principle of least surprise must be understood in terms of how dangerous the consequences of a surprise are.

On these grounds, a dynamic early rejection of placing problematic values into R&T is clearly less dangerous than having a weird equality semantics, where the surprise case still returns a boolean rather than throwing.

Despite this, both @ljharb and I agree above to take the hit on the weird equality semantics (silent divergence) rather than the dynamic early weird value rejection (reliable throw). It is still an overall tradeoff, but I appreciate the logic of this counter-argument. We should still take it seriously, even if we feel that other factors overrule it in this case.

papb commented 2 years ago

@erights That's a superb way of putting it! Thank you very much!

Despite this, both @ljharb and I agree above to take the hit on the weird equality semantics (silent divergence) rather than the dynamic early weird value rejection (reliable throw).

Can you please clarify why?

even if we feel that other factors overrule it in this case

What factors?

Note: I know that there are already multiple arguments within this long issue, but... To me, everything you said in the last comment is a very strong argument in favor of opting for the early throw... I would love to see how exactly you're still capable of disagreeing with such a strong argument (written by yourself 😅)