tc39 / proposal-Math.signbit

Math.signbit

How to better handle negative NaNs? #1

Open MaxGraey opened 5 years ago

MaxGraey commented 5 years ago

I think we need to clarify how negative NaNs are handled. Most built-in implementations in LLVM, GCC, Go and Rust are not sign-agnostic for NaNs:

signbit(+NaN) == false // +NaN => 0x7ff80000_00000000
signbit(-NaN) == true  // -NaN => 0xfff80000_00000000

But the spec does not state this strictly, and it seems we always need to treat both signed and unsigned NaNs as false?

Related to this discussion
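
To make the two patterns above concrete, here is a minimal sketch that reads the sign directly from a raw 64-bit pattern, without ever creating a JS number, so no engine gets a chance to canonicalize it. The helper name `signBitOfPattern` and the two constants are made up for illustration.

```javascript
// Hypothetical helper: returns the IEEE 754 sign bit (bit 63) of a raw
// 64-bit pattern given as a BigInt. No floating-point value is created.
function signBitOfPattern(bits) {
  return ((BigInt.asUintN(64, bits) >> 63n) & 1n) === 1n;
}

const POS_NAN_BITS = 0x7ff8000000000000n; // the "+NaN" pattern quoted above
const NEG_NAN_BITS = 0xfff8000000000000n; // the "-NaN" pattern quoted above

console.log(signBitOfPattern(POS_NAN_BITS)); // false
console.log(signBitOfPattern(NEG_NAN_BITS)); // true
```

This is what a C-style signbit effectively does; whether a JS value ever carries the second pattern is the open question of this issue.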

chicoxyzzy commented 5 years ago

-NaN actually evaluates to NaN in JS

> function ident(n) {return n}
< undefined
> ident(-0)
< -0
> ident(-NaN)
< NaN

so I suppose it should be handled as NaN
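
A short sketch of the same point using only plain ECMAScript operations (no typed arrays): none of them can tell NaN and -NaN apart, while the sign of zero is observable. The variable names are illustrative only.

```javascript
// Comparisons available to plain ECMAScript code treat NaN and -NaN alike.
const a = NaN;
const b = -NaN;

console.log(Number.isNaN(a), Number.isNaN(b)); // true true
console.log(Object.is(a, b));                  // true (SameValue treats all NaNs as equal)
console.log(Object.is(0, -0));                 // false (the sign of zero IS observable)
console.log(1 / a, 1 / b);                     // NaN NaN (unlike 1 / -0, no sign leaks)
console.log(String(b));                        // NaN
```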

MaxGraey commented 5 years ago

const F64 = new Float64Array(1);
// Two 32-bit views of the same 8 bytes; index 1 is the high word on little-endian hosts.
const U64 = new Uint32Array(F64.buffer);

F64[0] = NaN;
console.log('0x' + U64[1].toString(16));

F64[0] = -NaN;
console.log('0x' + U64[1].toString(16));

> 0x7ff80000
> 0xfff80000

chicoxyzzy commented 5 years ago

There is no negative NaN in spec though

distinct “Not-a-Number” values of the IEEE Standard are represented in ECMAScript as a single special NaN value

https://tc39.github.io/ecma262/#sec-ecmascript-language-types-number-type

hax commented 4 years ago

I found that all engines actually have different raw bits for NaN and -NaN.

For example, Chakra implemented it in https://github.com/microsoft/ChakraCore/pull/5905 .

And it seems Chrome recently implemented it as well (version 79+), though I have not had time to find the original PR.

hax commented 4 years ago

// use a TypedArray to expose the sign bit
// note: this also applies `ToNumber` coercion semantics
Math.signbit = (() => {
    // 1 on little-endian hosts, 0 on big-endian: the index of the high 32-bit word
    const LE = new Uint8Array(new Uint16Array([1]).buffer)[0]
    return function signbit(n) {
        const f64 = new Float64Array([n])
        const i32 = new Uint32Array(f64.buffer)
        return (i32[LE] >>> 31) === 1
    }
})()

console.log(Math.signbit(0))
console.log(Math.signbit(-0))
console.log(Math.signbit(Infinity))
console.log(Math.signbit(-Infinity))
console.log(Math.signbit(NaN))
console.log(Math.signbit(-NaN))
console.log(Math.signbit(-(-NaN)))
const negNaN = Number.POSITIVE_INFINITY / Number.NEGATIVE_INFINITY
console.log(Math.signbit(negNaN))

Note: all tests were run on my MacBook Air (macOS High Sierra 10.13.6, Intel Core i5)
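
The polyfill above detects the host byte order by hand. As a hedged alternative sketch, a DataView sidesteps that step, since its accessors default to big-endian and byte 0 therefore always holds the sign bit. The name `signbitViaDataView` is hypothetical, and the results for NaN inputs remain implementation-defined, so no expected output is given for them.

```javascript
// DataView reads and writes big-endian by default, so byte 0 always holds
// the sign bit regardless of the host's byte order.
const view = new DataView(new ArrayBuffer(8));

function signbitViaDataView(n) {
  view.setFloat64(0, n); // setFloat64 applies ToNumber to its argument
  return (view.getUint8(0) & 0x80) !== 0;
}

console.log(signbitViaDataView(0));         // false
console.log(signbitViaDataView(-0));        // true
console.log(signbitViaDataView(-Infinity)); // true
console.log(signbitViaDataView(NaN));       // implementation-defined
console.log(signbitViaDataView(-NaN));      // implementation-defined
```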

MaxGraey commented 4 years ago

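Interesting. By the way, you could use a simpler approach, because JS typed arrays use the host byte order, which is little-endian on x86:

const F64 = new Float64Array(1);
const U64 = new Uint32Array(F64.buffer);

// U64[1] is the high 32-bit word on little-endian hosts; its top bit is the sign.
const signbit = x => (F64[0] = x, Boolean(U64[1] >>> 31));

ghost commented 3 years ago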

I came along and was wondering why special casing was made for NaNs too.

It wouldn't act like C's signbit at all then, but according to @chicoxyzzy, JS doesn't have a negative NaN.

If it isn't possible to create/use a NaN with an arbitrary bitset, then wouldn't one be able to use the bit manipulation implementations that most other languages use for signbit, without special casing NaNs, relying on the JS VM to canonicalize the NaN upon writing/reading/serializing it?

hax commented 3 years ago

JS doesn't have a negative NaN.

As my previous tests show, I think engines actually do have negative NaNs. Currently this could be treated, to some degree, as an abstraction leak of implementation details, but if we introduce signbit, I suppose it should reflect them as is.

ljharb commented 3 years ago

Exposing the bit patterns of NaN is a massive mistake in Typed Arrays, and one we should not extend anywhere else. Math.signbit should, like every non-Typed-Array part of the language, canonicalize NaNs and not distinguish between any bit patterns of any implementation's NaN values.
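
For comparison, a minimal sketch of what a NaN-agnostic signbit could look like. It never inspects raw bits, so every NaN gives the same answer. The name `signbitCanonical` is hypothetical, and returning false for NaN is an assumption taken from the reading of the proposal at the top of this thread.

```javascript
// Hypothetical NaN-agnostic signbit: no typed arrays, no bit inspection.
function signbitCanonical(value) {
  const n = +value;                  // unary plus performs ToNumber
  if (Number.isNaN(n)) return false; // ASSUMPTION: all NaNs report false
  return n < 0 || Object.is(n, -0);  // negative numbers and negative zero
}

console.log(signbitCanonical(-0));   // true
console.log(signbitCanonical(0));    // false
console.log(signbitCanonical(NaN));  // false
console.log(signbitCanonical(-NaN)); // false
```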

ghost commented 3 years ago

Exposing the bit patterns of NaN is a massive mistake in Typed Arrays

If I may ask, why? NaN is just as much of a number as 53.5 is, as 8 is, as 0 is, as -0 is, as infinity is, etc., at least according to IEEE 754 semantics and rules. All of them have a hard bit-pattern, and because TypedArrays expose any of them, I'd argue that they should all be exposed.

Maybe... just maybe, the language spec should be changed to reflect modern implementations, and have different NaNs?

ljharb commented 3 years ago

@CrimsonCodes0 because in JS, explicitly and intentionally, there is supposed to only be one observable NaN value.

Typed Arrays expose them because the implementations that led to them didn't canonicalize. That doesn't mean it's a good decision.

Nothing should ever be added to the language that widens this unfortunate exposure.

MaxGraey commented 3 years ago

As my previous tests show, I think engines actually do have negative NaNs. Currently this could be treated, to some degree, as an abstraction leak of implementation details, but if we introduce signbit, I suppose it should reflect them as is.

Yes, according to IEEE 754 a negative NaN is canonical and fully valid (its sign should be preserved and propagated).

MaxGraey commented 3 years ago

Exposing the bit patterns of NaN is a massive mistake in Typed Arrays, and one we should not extend anywhere else.

@ljharb In my opinion, the big mistake is trying to fix IEEE 754 at the software (language or VM) level. Even WebAssembly, which tries to be the most deterministic ISA/VM, doesn't try to do this.

ljharb commented 3 years ago

All of JavaScript does this already, outside of typed arrays. It’s part of the language design.

ghost commented 3 years ago

Would it be a web compatibility-breaking change to add to the TypedArray's spec that implementations must canonicalize NaN values from the Float{32,64}Array numerical accessors and DataView.getFloat{32,64}?

Presently, it sounds like the language is quite frankly... broken. Yes, it's a small thing, but it still breaks a fundamental part of the ES language spec, and explicitly putting a step into the algorithms for reading memory into JS floats would fill this hole and clear up this issue, as JS implementations would no longer expose NaN bit patterns.
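
A user-space sketch of what such a canonicalizing read step might look like. It works on raw bits through a DataView; the helper names are hypothetical, and the choice of 0x7ff8000000000000 as "the" canonical pattern is an assumption, since the spec does not define one.

```javascript
// ASSUMPTION: this pattern stands in for a canonical NaN. The spec does not
// define one; it is only a common quiet-NaN encoding.
const CANONICAL_NAN_BITS = 0x7ff8000000000000n;
const EXPONENT_MASK = 0x7ff0000000000000n;
const MANTISSA_MASK = 0x000fffffffffffffn;

// A pattern encodes a NaN when the exponent is all ones and the mantissa is non-zero.
function isNaNPattern(bits) {
  return (bits & EXPONENT_MASK) === EXPONENT_MASK && (bits & MANTISSA_MASK) !== 0n;
}

// Hypothetical read helper: returns the 64 bits at byteOffset, with any NaN
// encoding replaced by the single assumed canonical one.
function readCanonicalBits(dataView, byteOffset) {
  const bits = dataView.getBigUint64(byteOffset);
  return isNaNPattern(bits) ? CANONICAL_NAN_BITS : bits;
}

const dv = new DataView(new ArrayBuffer(16));
dv.setBigUint64(0, 0xfff8000000000000n); // a "negative NaN" pattern
dv.setBigUint64(8, 0xfff0000000000000n); // -Infinity, not a NaN
console.log(readCanonicalBits(dv, 0).toString(16)); // 7ff8000000000000
console.log(readCanonicalBits(dv, 8).toString(16)); // fff0000000000000
```

The extra mask-and-compare on every read is the kind of per-access cost the performance argument in this thread is about.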

ljharb commented 3 years ago

It wouldn't likely break the web, but the committee explicitly decided in 2015 to not mandate NaN canonicalization in Typed Arrays, for performance reasons, and I'm quite confident there's no appetite to revisit that decision.

MaxGraey commented 3 years ago

It wouldn't likely break the web, but the committee explicitly decided in 2015 to not mandate NaN canonicalization in Typed Arrays, for performance reasons

And this totally makes sense. How about relaxing NaN canonicalization in other parts of the language as well? I don't think it would break the web.

ljharb commented 3 years ago

@MaxGraey other language parts aren't used in hot paths or perf-sensitive code like Typed Arrays are (that's their reason for existing). I would be strongly opposed to any attempt to further worsen the situation around NaN canonicalization in the language.

MaxGraey commented 3 years ago

attempt to further worsen the situation around NaN canonicalization in the language.

Why? In user space, the bit signature of a NaN doesn't matter at all. Engines could still canonicalize for FFI or something similar if necessary. Relaxing this requirement would simplify and speed up JS engines.

ghost commented 3 years ago

Off-topic, but does ECMAScript's canonical NaN value have a canonical bitset?

in JS, explicitly and intentionally, there is supposed to only be one observable NaN value.

And is there any documented reasoning behind that decision? If so, could it be linked, so that we may at least understand this situation (a bit) better?

ljharb commented 3 years ago

@CrimsonCodes0 no, since its bits are only exposed via Typed Arrays.

The spec itself: https://tc39.es/ecma262/#sec-ecmascript-language-types-number-type.

In some implementations, external code might be able to detect a difference between various Not-a-Number values, but such behaviour is implementation-defined; to ECMAScript code, all NaN values are indistinguishable from each other.

ghost commented 3 years ago

I can't open the spec's multi-megabyte webpage without causing my entire device to lag, or crashing my (mobile) browser, is there a way to open only a small section of the spec?


Besides that, I have one last question to help me assess this problem: does the ES spec say that the floating point number (5.0) has a bitset? Does it acknowledge that it has one or otherwise say that it does?

If it acknowledges that any numbers have bit-patterns, it should acknowledge that all numbers do, including not-a-number, otherwise the specification makes no sense whatsoever, and ought to be changed.

If it does not acknowledge that any numbers have bit-patterns, then TypedArrays and DataViews are just plain broken features in JavaScript, since they clearly expose these "non-existent" bit-patterns to user scripts.

ljharb commented 3 years ago

Here's the same section on the multipage build: https://tc39.es/ecma262/multipage/ecmascript-data-types-and-values.html#sec-ecmascript-language-types-number-type

There's a note in there about the bit pattern; not sure if that answers your question.

That the language here is incongruous between "typed arrays" and "everything else" is true, but doesn't mean anything can change it. It also doesn't mean the incongruity should be worsened.

hax commented 3 years ago

I would be strongly opposed to any attempt to further worsen the situation around NaN canonicalization in the language.

But I think the semantics of signbit() should expose the sign bit as is. This is what signbit does in every other language.

It also keeps the simple invariant signbit(x) === !signbit(-x).

ljharb commented 3 years ago

I don't think that invariant is possible; -(-NaN) is not guaranteed to have the same bit pattern as the original NaN. Engines are already allowed to canonicalize NaN in Typed Arrays - many just don't choose to.

There are no guarantees once you have a NaN. Even storing it in a variable can change the bit pattern.
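
A small sketch of how the invariant behaves under a NaN-agnostic signbit (a hypothetical definition in which every NaN reports false): it holds for every non-NaN input and necessarily fails for NaN. With a bit-exposing signbit, the NaN case would instead depend on what the engine does to the bits.

```javascript
// Hypothetical NaN-agnostic signbit (ASSUMPTION: all NaNs report false).
const signbit = (x) => !Number.isNaN(x) && (x < 0 || Object.is(x, -0));

// The invariant discussed above: signbit(x) === !signbit(-x).
const holdsFor = (x) => signbit(x) === !signbit(-x);

console.log([0, -0, 1.5, -1.5, Infinity, -Infinity].every(holdsFor)); // true
console.log(holdsFor(NaN)); // false: NaN and -NaN both report false
```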

ghost commented 3 years ago

The first step of the unary negation algorithm canonicalizes the NaN, therefore this is merely a double canonicalization, thus the NaN should be the exact same NaN and consequently have the same bit-pattern, so I don't follow?

If the above is correct, then current engines aren't implementing it at all.

ljharb commented 3 years ago

Feel free to experiment with it in various engines. When writing https://npmjs.com/get-nans, I found a lot of unpredictable and unintuitive behavior.
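
For anyone who wants to run such an experiment, here is a minimal sketch of a bit-dumping helper. The name `bitsOf` is made up for illustration and is not part of any package mentioned here. The NaN lines print whatever the engine happens to store, so no expected output is given for them.

```javascript
// Dumps the 64-bit pattern stored for a Number as 16 hex digits (big-endian).
const scratch = new DataView(new ArrayBuffer(8));

function bitsOf(x) {
  scratch.setFloat64(0, x);
  return scratch.getBigUint64(0).toString(16).padStart(16, '0');
}

console.log(bitsOf(-0)); // 8000000000000000
console.log(bitsOf(1));  // 3ff0000000000000

// NaN results are implementation-defined and may differ between engines.
const nan = NaN;
console.log(bitsOf(nan), bitsOf(-nan), bitsOf(-(-nan)));
console.log(bitsOf(0 / 0), bitsOf(Infinity - Infinity));
```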

hax commented 3 years ago

-(-NaN) is not guaranteed to have the same bit pattern

As my previous test (https://github.com/tc39/proposal-Math.signbit/issues/1#issuecomment-549890886) shows, most engines keep the bit pattern.