To be clear, it's reflexive, but only when they are the same NaN; if there is a different NaN, they aren't equal.
Oh, they have even weirder semantics.
NaN is always weird ^_^ It's hard to say which behavior is weirder.
we in general don't have the semantics of having multiple different NaN values
@littledan Yes, I know. I'm just trying to explain the different behaviors across programming languages, especially since there are misunderstandings about them that get used as proof of which semantics we should adopt.
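(For what it's worth, JS itself can observe multiple NaN bit patterns through typed arrays, even though all of its equality operations collapse them into a single NaN value. A small illustration, unrelated to any proposal semantics:)

```js
const buf = new ArrayBuffer(8);
const f64 = new Float64Array(buf);
const bytes = new Uint8Array(buf);
f64[0] = NaN;           // the canonical NaN bit pattern
bytes[0] ^= 1;          // flip one payload bit: still a NaN, different bits
Number.isNaN(f64[0]);   // true
Object.is(f64[0], NaN); // true: ===, Object.is, and SameValueZero see one NaN
```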
But I think this change would be orthogonal to Records and Tuples.
There are several contradictory requirements on the behavior of NaN in tuples/records; the interesting thing is that the Python/Ruby behavior seems to satisfy many of them (though it relies on multiple NaN values). So at least it would be a motivation for refining NaN semantics. Another motivation may be `Math.signbit(NaN)`.
I tested C#.

For C# value tuples, `(NaN, 1) == (NaN, 1)` always returns `false`, and `(NaN, 1).Equals((NaN, 1))` always returns `true`.

For C# `System.Tuple` (an object), same results, except that `var x = Tuple.Create(NaN, 1); x == x` returns `true` (because it's an object, so the same reference always compares true).
I will test other languages if I have time.
Swift tested. Very much the same results as C#: tuples behave like C# value tuples, and Array/Set behave like C# `System.Tuple`. (But Swift only has `==`, no `Equals` method.)
We just had a meeting with some V8 folks about performance, and I wanted to document some thoughts re: SameValueZero comparison on this issue thread.
Specifically, the thinking is that using "recursive SameValueZero" comparison for `===` on Record/Tuple might make equality checking complex. If `===` produced a reliable comparison for "observably equal" Records/Tuples (i.e. what `Object.is` does), then interning becomes more straightforward, since you would be able to swap one of the equality operands out for the other without managing a "set" of things that are equal, some with -0 and some with +0. This complexity doesn't strictly prevent us from choosing SameValueZero, but it does certainly increase engine complexity for engines that wish to do interning or other equality optimizations.
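For concreteness, here is a userland sketch of what "recursive SameValueZero" means, modeling tuples as arrays (the helper names are made up for illustration):

```js
// SameValueZero: like ===, except that NaN equals NaN.
const sameValueZero = (a, b) => a === b || (Number.isNaN(a) && Number.isNaN(b));

// Recursive variant: compare element-wise on array-like containers.
function recursiveSameValueZero(a, b) {
  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length &&
      a.every((v, i) => recursiveSameValueZero(v, b[i]));
  }
  return sameValueZero(a, b);
}

recursiveSameValueZero([0], [-0]);    // true, like #[0] === #[-0]
recursiveSameValueZero([NaN], [NaN]); // true, like #[NaN] === #[NaN]
```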
I'm comfortable saying that this justifies increased complexity.
I mentioned this briefly in my previous comment, but I want to elaborate on this after some more discussion with implementers:
If "recursive SameValueZero" is used when comparing Records/Tuples, interning becomes significantly more complicated, to the point of not being worth the effort. Even though #[0] === #[-0]
, the engine must obviously keep track of both tuples, which means that even if they are interned, the engine must maintain a separate pointer to the data (or some other method of keeping track of the actual value), which diminishes the benefits of interning.
There are a few possible choices here, to recap:
1. `#[0] === #[-0]` and `===` is generally expected to be a linear time operation for records/tuples. In this case, engines could intern values via `Object.is` since it uses `SameValue` and `Object.is(#[0], #[-0])` is `false`.
2. `#[0] === #[-0]` and `===` is generally expected to be a linear time operation for records/tuples. Additionally, `Object.is` compares Record/Tuple identity (as in, they are instead just objects), which allows interning to be implemented in userland.
3. Records/Tuples canonicalize `-0` to `0`, in which case this isn't a problem.
4. `#[0] !== #[-0]`, and engines are free to implement interning of records/tuples more easily. In this case, `===` and `Object.is` do the same thing.

I think that we should consider option 4; encouraging `Object.is` usage seems like the wrong approach, and silently canonicalizing `-0` also seems like a massive foot gun. I'd love to hear thoughts from others.
I would consider `0` and `-0` not being `===` to also be a massive footgun.
@devsnek I agree that it is weird, although this doesn't seem to be a choice between "which solution doesn't have any foot-guns" but "which foot-gun do we think is the smallest and is the best/least worst option".
I was hoping that after this much discussion, it was clear that `#[0] !== #[-0]` was the biggest footgun and the worst option, and I think many people here feel the same.

I think we also did establish that no other language considers `0` and `-0` unequal, in or out of tuples.

My preferred option is still "`===` all the way down", but I'd be reasonably happy with any option that enforces `#[0] === #[-0]`.
> 1. `#[0] === #[-0]` and `===` is generally expected to be a linear time operation for records/tuples. In this case, engines could intern values via `Object.is` since it uses `SameValue` and `Object.is(#[0], #[-0])` is `false`.
I don't think we need to go as far as to say linear time in this case. An engine could intern the tuples for storage à la `Object.is`, and then have a separate data structure that maps these identities to sets of identities that `===` one another. This would be the fallback if the identity comparison fails, and makes the worst case take `n log n`, where `n` is the number of currently living unique tuples. Of course, a linear comparison will be faster for small tuples, especially if there are a lot of tuples around.
Also, one could potentially mark the values (or the identities) with a flag that means "contains -0", and if both don't, there is no need to check this map, we immediately know they must be unequal. This makes the best case time much more likely in applications where -0 is unlikely or not used at all.
Edit: I wrote this before reading rickbutton's original comment very carefully. The suggestion to simply have these values carry two pointers, one for identity and one for equality, did not occur to me and actually seems easier. But I disagree that it would cancel out the benefits of interning. That comment seems to suggest that we do the interning just for the sake of equality, and keep a pointer to every constructed tuple around, but that seems unnecessary. We can do it the other way around: intern for the sake of memory usage, and have some additional equality marker.
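A toy model of that arrangement, assuming flat tuples of numbers (the key scheme and all names are invented for illustration, not how an engine would actually do it):

```js
const byKey = new Map(); // SameValue-based key -> canonical representative

// Key that distinguishes -0 from 0 and unifies NaN with itself,
// i.e. SameValue semantics at the leaves (numbers only, for brevity).
const keyOf = (values) =>
  values.map((v) => (Object.is(v, -0) ? "-0" : String(v))).join(",");

function makeTuple(...values) {
  const key = keyOf(values);
  let rep = byKey.get(key);
  if (!rep) {
    rep = Object.freeze({
      values,
      // the "contains -0" flag described above
      hasNegZero: values.some((v) => Object.is(v, -0)),
    });
    byKey.set(key, rep);
  }
  return rep; // SameValue-identical contents share one interned object
}

// Models === with recursive SameValueZero on top of interned storage.
function tupleEquals(a, b) {
  if (a === b) return true; // fast path: interned identity
  if (!a.hasNegZero && !b.hasNegZero) return false; // flag rules out equality
  if (a.values.length !== b.values.length) return false;
  // SameValueZero per element: === covers 0 vs -0, Object.is covers NaN.
  return a.values.every((v, i) => v === b.values[i] || Object.is(v, b.values[i]));
}

tupleEquals(makeTuple(0), makeTuple(-0)); // true, without giving up interning
```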
The issue here is not just "complexity" or whether something is "worth the effort", but a permanent performance cost for all Record and Tuple usage, where implementations have to incur one of the costs sketched above: either linear-time comparison, or the extra bookkeeping needed to make interning answer `===` correctly.
To avoid these real costs that affect R&T users (not just implementers), we could adopt @rickbutton 's option 3 or 4. Both "work", but there's been strong opposition to each voiced above. I'm sympathetic to the goals that led to the desire for the semantics of the current specification, but we have to balance this against the costs to find the best tradeoff for JS developers.
I initially thought that alternatives 1 and 2 could be promising (possibly with a built-in intern operation for 2), but @rricard convinced me that these would both come at too high a cost: they would mean that we are telling developers, again, "you have to think about which equality operation you want to use; consider using `Object.is` or `Object.is(intern(x), intern(y))` for better semantics/performance".
But the idea of this proposal is that deep structural equality is an everyday feature integrated with the standard library and ecosystem, not an advanced expert feature. Requiring special incantations or libraries to check for structural equality without suddenly creating a slowdown jeopardizes the core value proposition of this proposal.
Given those tradeoffs, canonicalization seems like the best option to me.
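A minimal sketch of what canonicalization (option 3) means, with made-up helper names:

```js
// Normalize -0 to +0 at construction time, so stored tuples never contain -0.
const canonicalize = (v) => (Object.is(v, -0) ? 0 : v);
const makeTuple = (...values) => Object.freeze(values.map(canonicalize));

const t = makeTuple(-0, 1);
Object.is(t[0], 0);  // true: the -0 was silently normalized away
Object.is(t[0], -0); // false
```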
> not being worth the effort
Realized that I should clarify this, in that I don't mean that it isn't "worth the effort of implementation" from a staffing the work / engine complexity perspective, but more specifically that the complexity would add a performance cost that could impact any performance gains. The latter could imply the former of course.
User-visible behaviors of options 3 and 4 are non-starters to a variety of folks, for diverse reasons that we've rehashed many times already. See past discussions. I don't see going around in circles reopening those as being the most productive.
There are good suggestions above for making options 1 and 2 more efficient.
I agree that option 4 (`#[-0] !== #[0]`) has been rehashed over and over again and am confused why it's being suggested again when the majority of comments in this issue are about why it's unacceptable.
What's wrong with option 3 (converting `-0` to `0` when putting values in tuples), though?
read above but tl;dr it breaks when very small negative numbers round to -0
@devsnek I've read above and I don't know what you're referring to. The only comment about that is this one, which is by @Zarel, the person you're responding to.
hmm I thought I posted about this too. well basically when you get large/small enough numbers you start getting zeros and infinities, and they retain the sign of the value they represent. it's definitely an edge case but if you happen to use one of these datastructures while doing math you could end up with a result with the wrong sign.
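A concrete instance of that edge case, with plain doubles and no R&T involved:

```js
// The true value of -1e-200 * 1e-200 is -1e-400, far below the smallest
// representable double, so it underflows to a zero that keeps the sign:
const tiny = -1e-200 * 1e-200;
Object.is(tiny, -0); // true
1 / tiny;            // -Infinity: the sign is still recoverable downstream
```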
If you underflow to negative zero you've ended up with a result with the wrong value, not just the wrong sign, so it's not clear to me why that's a vital use case.
There is some high-performance numerics code which is written carefully to be correct-enough in the presence of underflow and which depends on these IEEE quirks, but it seems weird to say that we should sacrifice performance for everyone for the benefit of high-performance numerics code, especially given that such code, definitionally, also cares a lot about performance.
Oh, yeah, I do realize it has effects such as the example I gave (which was in response to a question about if it could cause problems at all).
I'm just less opinionated on that than on `#[-0] !== #[0]`, which would be disastrous and frequently cause bugs in real-world code. Canonicalization, in comparison, only affects some obscure use cases. I'd think of it like `#[NaN] === #[NaN]`: unpleasant additional complexity, but at least it only affects use cases that already have to deal with lots of this kind of complexity.
I'm not necessarily advocating for this, but I think there's a 5th option that no one's brought up yet, which is a hybrid of options 3 and 4: canonicalizing `-0` to `0` by default, but allowing people to construct R/T values that do not canonicalize:
```js
Object.is(#[-0], #[0]);    // true
Object.is(#![-0], #![0]);  // false
Object.is(#![-0], #[0]);   // false
Object.is(#![0], #[0]);    // true
Object.is(#[#![-0]], #[#![0]]); // probably false?
```
The value of the last example depends on whether canonicalization is recursive. I suspect it shouldn't be, just for simplicity of semantics. If this opt-out canonicalization exists, it's basically just there so that people who actually do depend on `1/-0 < 0` are not excluded from R/T, and presumably those people are aware enough of IEEE-754 semantics to be careful when handling the `#![-0] !== #![0]` footgun.
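For reference, the `1/-0 < 0` probe mentioned above is the standard way to observe the sign of a zero:

```js
1 / -0 < 0; // true:  1 / -0 evaluates to -Infinity
1 / 0 < 0;  // false: 1 / 0 evaluates to Infinity
```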
I definitely do not think it is worth having special syntax just for `-0`.
The current spec defines the following:
```js
Object.is(#[-0], #[0]); // false
#[-0] === #[0];         // true
// similar to:
Object.is(-0, 0);       // false
-0 === 0;               // true

Object.is(#[NaN], #[NaN]); // true
#[NaN] === #[NaN];         // true
// unlike:
Object.is(NaN, NaN);       // true
NaN === NaN;               // false
```
Record & Tuple being primitives, they should be able to contain any other primitive value, including `-0` and `NaN`:

- `-0`: we want to replicate the way the value is currently handled outside of such structures
- `NaN`: we want to avoid NaN preventing a Record & Tuple structure from ever being equal to another one, or even to itself

We are going to go ahead with this decision for Stage 3.
I strongly dislike `#[NaN] === #[NaN]`; this invents yet another special rule for `NaN`. It seems no other mainstream programming language's value types have such a thing.
> I strongly dislike `#[NaN] === #[NaN]`; this invents yet another special rule for `NaN`. It seems no other mainstream programming language's value types have such a thing.
Java does in its record classes:
```java
jshell> record Foo(double a) {}
|  created record Foo

jshell> Objects.equals(new Foo(Double.NaN), new Foo(Double.NaN))
$2 ==> true
```
Though it also returns `false` for `Objects.equals(new Foo(0.0), new Foo(-0.0))`, which I don't like. Probably worth noting however that this last point is less impactful than in JavaScript, since `-0` (the integer) in Java is the same as `0`.
From a different perspective, this avoids introducing more special cases for `NaN`.
```js
const isNaN = (x) => x !== x;
```
`NaN` being the only value to not equal itself is preserved by `#[NaN]` being equal to itself.
Using SameValueZero is a common operation across other APIs:
```js
[NaN].includes(NaN);           // true
new Set([NaN]).has(NaN);       // true

[#[NaN]].includes(#[NaN]);     // true
new Set([#[NaN]]).has(#[NaN]); // true
```
@Maxdamantus Java `Objects.equals(Double.NaN, Double.NaN)` is like JS `Object.is`, so it should give `true` (and `false` for ±0).
@acutmore I don't think the `isNaN` polyfill issue has high priority (old code never has tuples/records), and new code should always use `Number.isNaN` (and it's easy to fix by adding a `typeof` check). And consider that if we have `Decimal.NaN` in the future, you will also meet a similar issue.
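A sketch of the `typeof` fix just mentioned, assuming (per the proposal) that `typeof` of a record or tuple is not `"number"`:

```js
// Guarding on typeof restricts the self-inequality test to actual numbers,
// so a container that is !== itself could never be misreported as NaN.
const isNumberNaN = (x) => typeof x === "number" && x !== x;
```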
Existence checks are another issue, and I'm OK with them, and we should modify SameValueZero for that, but we shouldn't modify `===` and break the recursive property.
How does using `SameValueZero` for R&T `===` break the transitive property?
@nicolo-ribaudo Sorry, I meant the recursive property, not the transitive one.
`Objects.equals` just calls the `.equals` method on the object, so it's a custom equality operation. Java doesn't really have something corresponding to `Object.is` in JS. `==` in Java is used to compare values as `===` is in JS (with the special cases for IEEE-754 `-0.0` and `NaN` values).
As @acutmore alluded to, arguably the important thing is the notion of equality used in other operations such as map indexing, which in Java is `Object.equals`:
```java
Set.of(Double.NaN).contains(Double.NaN) // true
Objects.equals(-0.0, 0.0)               // false
Set.of(-0.0).contains(0.0)              // false
```
and in JavaScript is `SameValueZero`:
```js
new Set([NaN]).has(NaN);  // true
-0.0 === 0.0;             // true
new Set([-0.0]).has(0.0); // true
```
This is as opposed to, e.g., Haskell, which actually does maintain IEEE-754 semantics in collections:
```haskell
let nan = 0.0/0.0
Set.member nan (Set.fromList [nan])    -- False
Set.member (-0.0) (Set.fromList [0.0]) -- True
Set.fromList [nan, nan]                -- fromList [NaN,NaN]
Set.fromList [0.0, -0.0]               -- fromList [-0.0]
Set.fromList [-0.0, 0.0]               -- fromList [0.0]
```
I actually prefer the Haskell semantics for consistency, even though it does have some potentially surprising cases above, though `Map`/`Set` don't work this way in JS or Java.
@Maxdamantus Java and JS use different names, but JS `===` is like Java `==` and JS `Object.is` is like Java `Objects.equals`. Of course they have differences, because JS does not have class-defined `equals` behavior.
And JS does not use `===` for existence checks; it uses SameValueZero. If we followed Java, it would use `Object.is`. If we followed Haskell, it would use `===` (if I understand correctly).
Programming languages differ on existence checks, and that may be OK, because they could have different definitions of "key" and "existence", but I think programmers would like to have a consistent `==` (in JS, `===`) definition across languages.
> @Maxdamantus Java and JS use different names, but JS `===` is like Java `==` and JS `Object.is` is like Java `Objects.equals`. Of course they have differences, because JS does not have class-defined `equals` behavior.
JavaScript's `Object.is` and Java's `Objects.equals` are only alike in that they compare doubles (actually, `Double` objects) by content. In other cases they are usually different, since for collections they also tend to compare content:
```java
var a = new ArrayList<Integer>();
var b = new ArrayList<Integer>();
Objects.equals(a, b); // true
a.add(42);
Objects.equals(a, b); // false
```
In JavaScript, `Object.is` says whether its operands are the same value, which is not what's happening above (`a` and `b` are different reference values, because operating on one is not the same as operating on the other).
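To put the contrast in JS terms:

```js
const a = [];
const b = [];
Object.is(a, b); // false: distinct identities, despite equal contents
Object.is(a, a); // true:  the same value (same reference)
// whereas Java's Objects.equals(new ArrayList<>(), new ArrayList<>()) is true
```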
> And JS does not use `===` for existence checks; it uses SameValueZero. If we followed Java, it would use `Object.is`. (Note, you have a typo: `new Set([-0.0]).has(0.0)` gives you `true`.) If we followed Haskell, it would use `===` (if I understand correctly).
Sorry, I messed that example up. I guess the point might be that there's already an inconsistency in JS, so it's not clear that one way is better than the other, which is probably why I don't feel strongly on one side or the other. As I was saying, personally I think it would have been cleaner if they had made `Set` and `Map` use `===` for comparison, like Haskell does (though I don't know if this would have caused other issues in the language).
@Maxdamantus Java `ArrayList.equals` works just because `ArrayList` overrides `equals`; if it didn't override it, it would give `false`. This is what I meant by "because JS does not have class-defined `equals` behavior".
> I guess the point might be that there's already an inconsistency in JS
The "inconsistency" we are talking is JS actually have some basic "consistency", that is using SameValueZero for existence check, so I would like we can keep that "consistency" of "inconsistency", not introduce another level inconsistency to ===
.
Another question: `[...new Set([-0])][0]` gives you `+0` (the -0 is converted to +0); what's the result of `[...new Set([#[-0]])][0][0]`? +0 or -0? I suppose it's -0?
@hax https://tc39.es/ecma262/#sec-set.prototype.add

~~`Set.prototype.add` always converts `-0` to `+0`, and the Set constructor uses `Set.prototype.add`. Both cases will be `+0`.~~
@rickbutton I think you missed some parentheses: the second case is `-0`, because it's calling `.add(#[-0])` and thus it's not doing any conversion.
Good point, ignore me.
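To summarize the exchange (R&T syntax per the proposal):

```js
[...new Set([-0])][0];       // +0: Set.prototype.add normalizes a -0 element
[...new Set([#[-0]])][0][0]; // -0: the element added is the tuple itself,
                             //     so the -0 stored inside it is untouched
```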
I was wondering: was the option of raising an error upon an attempt to put `NaN` into a record/tuple considered?
I see that @rricard's comment says:
> Record & Tuple being primitives, they should be able to contain any other primitive value including `-0` and `NaN`

...but I was hoping for more details. Why does "being a primitive" imply "being able to contain any other primitive"? As I see it, Record & Tuple are the first kind of primitive that contains multiple values, so no precedent seems to exist for this...
I'd put it another way: `NaN` and `-0` are primitives, and shouldn't be "special" by being specifically excluded. Doing so would be very confusing and weird.
I was talking specifically about `NaN`, in the sense that it basically represents an error state (as said by @icefoxen), so this might be an opportunity to turn it into a real error.
By specifying that Records and Tuples with `NaN` can't even exist, we no longer have to deal with the paradoxical behavior it brings when treated as a value (namely, that it makes equality non-reflexive).
Case 1: tuple being used as a point
```js
const point1 = #[0, Math.sqrt(-3)];
const point2 = #[0, Math.sqrt(-2)];
point1 === point2; // should be false, for the same reason that Math.sqrt(-3) !== Math.sqrt(-2)
```
Case 2: large record with lots of nested records, which could never compare equal to another (or even to itself) if any deeply nested field held a `NaN` value (sketched below).

Whichever decision is made for `NaN`, one of the cases above would not be pleased. So what about the option of making these situations impossible?
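A sketch of case 2 (record syntax per the proposal; the shape is invented):

```js
const config = #{ ui: #{ scale: 1 }, stats: #{ ratio: 0 / 0 } }; // ratio is NaN
// Under strict IEEE semantics, the nested NaN would make config !== config.
// Under the spec's SameValueZero semantics, config === config remains true.
```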
Hi @papb, thanks for the idea; I think you're right that rejecting NaN may not have been considered before.

I do think that @rricard's comment you quoted is a strong design goal for the proposal: "[R&T] should be able to contain any other primitive value". While they do reject object values, that is to help catch accidental introduction of mutability or identity or both. Rejecting `NaN` because it might 'represent an error' sounds outside the responsibility of the container.
> As I see it, Record & Tuple are the first kind of primitive that contains multiple values, so no precedent seems to exist for this...
While that may be true, there is already precedent in the language for how equality of `NaN` behaves within a container: `[NaN].includes(NaN) === true`.
Normally, the objection "but that would be weird" (however phrased) is properly taken to be a strong objection, because it indicates the feature would violate the principle of least surprise. For -0 and NaN within Records and Tuples, all of the choices anyone has yet invented are weird in this sense. The objection is not strong if it also applies to all alternatives.
@erights while i agree with that, rejecting NaN from a container is much, much weirder than giving a surprising equality answer.
Hi @ljharb I do not disagree. Nevertheless, I felt it worth making the meta point. Thanks.
Both are weird, but rejecting them at construction time has the advantage of making the surprising behavior obvious, while having unusual equality semantics can lead to subtle bugs.
@pygy that's a good point. The principle of least surprise must be understood in terms of the dangerous consequences of a surprise: a surprise that throws fails reliably and early, while a surprise that silently returns a plausible-looking result can propagate into subtle bugs.

On these grounds, a dynamic early rejection of placing problematic values into R&T is clearly less dangerous than having weird equality semantics, where the surprise case still returns a boolean rather than throwing.
Despite this, both @ljharb and I agree above to take the hit on the weird equality semantics (silent divergence) rather than the dynamic early weird value rejection (reliable throw). It is still an overall tradeoff, but I appreciate the logic of this counter-argument. We should still take it seriously, even if we feel that other factors overrule it in this case.
@erights That's a superb way of putting it! Thank you very much!
> Despite this, both @ljharb and I agree above to take the hit on the weird equality semantics (silent divergence) rather than the dynamic early weird value rejection (reliable throw).
Can you please clarify why?
> even if we feel that other factors overrule it in this case
What factors?
Note: I know that there are already multiple arguments within this long issue, but... To me, everything you said in the last comment is a very strong argument in favor of opting for the early throw... I would love to see how exactly you're still capable of disagreeing with such a strong argument (written by yourself 😅)
What should each of the following evaluate to?

(For context, this is non-obvious because `+0 === -0` is `true`, `Object.is(+0, -0)` is `false`, `NaN === NaN` is `false`, and `Object.is(NaN, NaN)` is `true`.)

Personally I lean towards the `-0` cases all being `false` and the `NaN` cases all being `true`, so that the unusual equality semantics of `-0` and `NaN` do not propagate to the new kinds of objects being introduced by this proposal.