Closed dherman closed 6 years ago
Is it safe to provide access to f64
/f32
? You can modify underling buffer via u64
such that it contains a signaling NaN bit pattern, whose safety properties are murky.
See how from_bits
method deals the issue: https://doc.rust-lang.org/std/primitive.f64.html#method.from_bits (this workaround is not applicable here).
See https://github.com/rust-lang/rust/pull/39271#issuecomment-286419431 for discussion.
I wonder how JavaScript itself deals with sNaNs here?
@matklad Right you are, and this is exactly the kind of worries I had about aliasing -- thank you so much for helping to zero in on the issues!
So one thing I think we could do here would be to create a custom Rust type that overloads Index
and IndexMut
so you can treat it like a slice, but with a well-specified and safe conversion semantics. The semantics of from_bits
and to_bits
seems underspecified, so maybe we couldn't even use that and would have to basically implement the serialization and deserialization ourselves?
It's not immediately obvious to be why we can't do the same kind of trick Rust's from_bits
does, and demote signaling NaNs to non-signaling NaNs. Adding a new trait for indexing an array with defined semantics seems like it'll need to do something very much like from_bits
is doing.
Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.
why we can't do the same kind of trick Rust's
from_bits
does
Right, we can--that's what I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64]
type), so that we can define indexing to do NaN canonicalization with from_bits
.
Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.
We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.
Ha, and of course we already aren’t using native slice types, we’re using the CSlice
and CMutSlice
types! So I think what we can do is have them take an optional type parameter that defines read/write coercions, and have it default to a noop; then the exposed type for a float slice would be CMutSlice<f64, CanonicalizeNaNs>
.
(I’ll revise the RFC with these ideas and see what people think.)
We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.
My understanding is that the problem with sNaN is a bit more nuanced. On x86/ARM in practice sNaNs don't cause SIGFPE (the relevant bit in the control register is off), so CPU is not the problem. But in LLVM sNaNs are not produced by any instruction except for transmute, so LLVM assumes that having an f64 value with sNaN bit pattern is UB. When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.
I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64] type), so that we can define indexing to do NaN canonicalization with from_bits.
It would be great to see a prototype here: I am afraid that Rust Index/IndexMut are not flexible enough for this, because you have to return &'j T
, you can't do MyRef<'j, T>
.
CPU is not the problem. ... When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.
Ah, I see. Thank you for the explanation! So from_bits
allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?
I am afraid that Rust Index/IndexMut are not flexible enough for this
Yep, I just realized this. Argh! Back to brainstorming...
So from_bits allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?
Hm, I now think that I am confusing this with another IEEE754 issue: https://github.com/rust-lang/rust/issues/10184. :)Better to read the thread on https://github.com/rust-lang/rust/pull/39271 I guess :) The consensus seems that LLVM and not the CPU is the problem, and that we actually just don't know what LLVM guarantees are.
Yep, I just realized this. Argh! Back to brainstorming...
Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?
Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?
Since typed arrays give you the ability to write any bit patterns you like, you can generate the bit pattern of an sNaN in the buffer. But the only way for JavaScript to interpret those bits as a floating-point number is to use a Float32Array
or Float64Array
view over the data and read from that address in the buffer. And the JS semantics of using a floating-point view is that implementations canonicalize NaN values on both reading from and writing to a binary buffer.
But the spec says nothing specifically about sNaNs; it only says the implementation can choose any NaN bit pattern it wants. The spec says that a JS engine has to always produce the exact same NaN bit pattern whenever writing a NaN to a typed array. I'm not sure if JS engines actually obey that in the real world--there's been a lot of debate about what the spec should claim here, and the spec may not reflect reality (in which case the spec is probably wrong, not reality--JS engine vendors kind of get to call the shots on this issue). But if that rule is in fact legitimate, then it demonstrates that there's no such thing as an sNaN value in JS, since it basically means you can never observe differences between NaN values.
That's at the level of semantics, though. At the level of implementation, I don't know if any JS engines would allow an sNaN bit pattern to get passed around as a JS value. I would guess not, but I don't know for sure.
So maybe the best we can do here for floating point types is just to use an API that requires explicit methods for indexing into the slice, instead of being able to use []
syntax:
let f = slice.get(i);
slice.set(j, f);
I'm realizing there's a similar potential issue with integers, too, since it's at least theoretically possible that at some point we'll need to always canonicalize them to be little-endian, even on big-endian targets.
Years ago, I advocated for the spec mandating this but didn't succeed. Still, if the web actually comes to rely on little-endian, which I feel pretty confident it will, any big-endian architectures that ever come along and want to implement JavaScript engines are going to have to normalize integers in typed arrays to little-endian in order to be compatible with the ecosystem of JS content in the world.
It's also possible that big-endian is just dead and we don't have to worry about it.
The ergonomic impact of .get()
/.set()
is probably awful isn't it. :(
The ergonomic impact of .get()/.set() is probably awful isn't it.
I think the biggest problem is not the ergonomics, but the fact that we must expose &[f64]
representation for it to be useful, because pretty much any nifty super-fast crate would need a slice to work.
@BurntSushi, did you have a chance to dig into the question of LLVM handling of signaling NaNs? I think you were going to: https://github.com/rust-lang/rust/pull/39271#issuecomment-294577947 :)
Good point. There’s no way to prevent a typed array from having sNaN bit patterns in it, so for this to have fully defined behavior either Rust has to be able to support sNaN or we can’t expose native slices. Definitely interested in the results of @burntsushi’s research! :)
either Rust has to be able to support sNaN or we can’t expose native slices.
We can, before giving access to [f64]
, change all sNaNs into qNaNs (sort of applying from_bits
trick for the whole array), but it changes the cost from O(1)
to O(N)
.
Hm, this comment hints that maybe there are no problems after all here? https://github.com/servo/webrender/issues/2027#issuecomment-343640747
change all sNaNs into qNaNs
Even if that weren’t expensive it wouldn’t be an option, unfortunately — typed arrays are used to hold heterogeneous information, so even if you’re looking at a section of it as float data other parts might be integers. So changing their bits would break programs.
But I think I’m starting to convince myself that we may be able to use actual slices for all types after all, with no canonicalization. I’m basing this on a few arguments:
If we can get away with this, the benefit is getting to interop with all the Rust ecosystem that uses slices, as you say, and maybe also compiler optimizations around slices (I’m not sure).
OK, yeah, there's no reason why we needed cslice in the first place. We use its ABI in two (morally identical) functions, Neon_Buffer_Data
and Neon_ArrayBuffer_Data
:
https://github.com/neon-bindings/neon/blob/master/crates/neon-runtime/src/neon.cc#L202-L206
and instead of a single out-pointer for a pair, we can simply pass two out-pointers for the two fields, as stack-local variables:
https://github.com/neon-bindings/neon/blob/master/src/js/binary.rs#L42-L47
And then we can call std::slice::from_raw_parts
to construct the Rust slice.
@matklad I've updated this RFC according to our discussion. I feel good about the state of it, so I'll probably tag it with "final comment period" soon.
OK, one more question for anyone who may have an opinion: ArrayBufferData
is not the best name for the data structure that represents the backing data, because we should use the same type for the backing data of both JsArrayBuffer
and JsBuffer
. Names I can think of:
BinaryData
BufferData
BackingData
-- too genericBackingBuffer
-- confusing because we already have two types that "are buffers"BinaryBuffer
-- dittoByteBuffer
-- dittoByteStore
-- confusing with "app store"BinaryStore
-- dittoI think the first two are my favorites. The latter is nice but it could confuse people into thinking it's only for JsBuffer
. BinaryData
has slightly less of a hint that it's related to buffers, but being the associated type for borrowing the contents of JsArrayBuffer
and JsBuffer
provides the association.
So I'm leaning towards BinaryData
. Anyone want to agree or disagree?
Pushed the name change.
This RFC proposes modifying the
JsArrayBuffer
API to allow viewing the underlying buffer data with different binary formats, similar to how typed arrays work in JS. This gives safe Rust more expressive control to manipulate the data.Rendered