RFC: ArrayBuffer Views - Githubissues

dherman commented 7 years ago

This RFC proposes modifying the JsArrayBuffer API to allow viewing the underlying buffer data with different binary formats, similar to how typed arrays work in JS. This gives safe Rust more expressive control to manipulate the data.

Rendered

matklad commented 7 years ago

Is it safe to provide access to f64/f32? You can modify underling buffer via u64 such that it contains a signaling NaN bit pattern, whose safety properties are murky.

See how from_bits method deals the issue: https://doc.rust-lang.org/std/primitive.f64.html#method.from_bits (this workaround is not applicable here).

See https://github.com/rust-lang/rust/pull/39271#issuecomment-286419431 for discussion.

I wonder how JavaScript itself deals with sNaNs here?

dherman commented 6 years ago

@matklad Right you are, and this is exactly the kind of worries I had about aliasing -- thank you so much for helping to zero in on the issues!

So one thing I think we could do here would be to create a custom Rust type that overloads Index and IndexMut so you can treat it like a slice, but with a well-specified and safe conversion semantics. The semantics of from_bits and to_bits seems underspecified, so maybe we couldn't even use that and would have to basically implement the serialization and deserialization ourselves?

nixpulvis commented 6 years ago

It's not immediately obvious to be why we can't do the same kind of trick Rust's from_bits does, and demote signaling NaNs to non-signaling NaNs. Adding a new trait for indexing an array with defined semantics seems like it'll need to do something very much like from_bits is doing.

Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.

dherman commented 6 years ago

why we can't do the same kind of trick Rust's from_bits does

Right, we can--that's what I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64] type), so that we can define indexing to do NaN canonicalization with from_bits.

Unless there's something about signaling NaNs we like and want to keep for some reason, though I've never dealt with that before.

We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.

dherman commented 6 years ago

Ha, and of course we already aren’t using native slice types, we’re using the CSlice and CMutSlice types! So I think what we can do is have them take an optional type parameter that defines read/write coercions, and have it default to a noop; then the exposed type for a float slice would be CMutSlice<f64, CanonicalizeNaNs>.

dherman commented 6 years ago

(I’ll revise the RFC with these ideas and see what people think.)

matklad commented 6 years ago

We definitely don't want them--signalling NaNs can raise a SIGFPE signal and abort the process.

My understanding is that the problem with sNaN is a bit more nuanced. On x86/ARM in practice sNaNs don't cause SIGFPE (the relevant bit in the control register is off), so CPU is not the problem. But in LLVM sNaNs are not produced by any instruction except for transmute, so LLVM assumes that having an f64 value with sNaN bit pattern is UB. When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.

I'm suggesting about using a custom type that overloads indexing (instead of the native &mut [f64] type), so that we can define indexing to do NaN canonicalization with from_bits.

It would be great to see a prototype here: I am afraid that Rust Index/IndexMut are not flexible enough for this, because you have to return &'j T, you can't do MyRef<'j, T>.

dherman commented 6 years ago

CPU is not the problem. ... When LLVM does optimizations, it assumes that any code that observes sNaN is dead, etc, etc.

Ah, I see. Thank you for the explanation! So from_bits allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?

I am afraid that Rust Index/IndexMut are not flexible enough for this

Yep, I just realized this. Argh! Back to brainstorming...

matklad commented 6 years ago

So from_bits allows us to ensure we don't get an sNaN, which means we never need to write code to deal with one, which avoids LLVM doing unsafe optimizations based on UB assumptions. Did I understand that right?

Hm, I now think that I am confusing this with another IEEE754 issue: https://github.com/rust-lang/rust/issues/10184. :)Better to read the thread on https://github.com/rust-lang/rust/pull/39271 I guess :) The consensus seems that LLVM and not the CPU is the problem, and that we actually just don't know what LLVM guarantees are.

Yep, I just realized this. Argh! Back to brainstorming...

Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?

dherman commented 6 years ago

Just what JavaScript itself does with this problem? Is it possible to get an sNaN into JavaScripted variable because of this?

Since typed arrays give you the ability to write any bit patterns you like, you can generate the bit pattern of an sNaN in the buffer. But the only way for JavaScript to interpret those bits as a floating-point number is to use a Float32Array or Float64Array view over the data and read from that address in the buffer. And the JS semantics of using a floating-point view is that implementations canonicalize NaN values on both reading from and writing to a binary buffer.

But the spec says nothing specifically about sNaNs; it only says the implementation can choose any NaN bit pattern it wants. The spec says that a JS engine has to always produce the exact same NaN bit pattern whenever writing a NaN to a typed array. I'm not sure if JS engines actually obey that in the real world--there's been a lot of debate about what the spec should claim here, and the spec may not reflect reality (in which case the spec is probably wrong, not reality--JS engine vendors kind of get to call the shots on this issue). But if that rule is in fact legitimate, then it demonstrates that there's no such thing as an sNaN value in JS, since it basically means you can never observe differences between NaN values.

That's at the level of semantics, though. At the level of implementation, I don't know if any JS engines would allow an sNaN bit pattern to get passed around as a JS value. I would guess not, but I don't know for sure.

dherman commented 6 years ago

So maybe the best we can do here for floating point types is just to use an API that requires explicit methods for indexing into the slice, instead of being able to use [] syntax:

let f = slice.get(i);
slice.set(j, f);

dherman commented 6 years ago

I'm realizing there's a similar potential issue with integers, too, since it's at least theoretically possible that at some point we'll need to always canonicalize them to be little-endian, even on big-endian targets.

Years ago, I advocated for the spec mandating this but didn't succeed. Still, if the web actually comes to rely on little-endian, which I feel pretty confident it will, any big-endian architectures that ever come along and want to implement JavaScript engines are going to have to normalize integers in typed arrays to little-endian in order to be compatible with the ecosystem of JS content in the world.

It's also possible that big-endian is just dead and we don't have to worry about it.

The ergonomic impact of .get()/.set() is probably awful isn't it. :(

matklad commented 6 years ago

The ergonomic impact of .get()/.set() is probably awful isn't it.

I think the biggest problem is not the ergonomics, but the fact that we must expose &[f64] representation for it to be useful, because pretty much any nifty super-fast crate would need a slice to work.

@BurntSushi, did you have a chance to dig into the question of LLVM handling of signaling NaNs? I think you were going to: https://github.com/rust-lang/rust/pull/39271#issuecomment-294577947 :)

dherman commented 6 years ago

Good point. There’s no way to prevent a typed array from having sNaN bit patterns in it, so for this to have fully defined behavior either Rust has to be able to support sNaN or we can’t expose native slices. Definitely interested in the results of @burntsushi’s research! :)

matklad commented 6 years ago

either Rust has to be able to support sNaN or we can’t expose native slices.

We can, before giving access to [f64], change all sNaNs into qNaNs (sort of applying from_bits trick for the whole array), but it changes the cost from O(1) to O(N).

matklad commented 6 years ago

Hm, this comment hints that maybe there are no problems after all here? https://github.com/servo/webrender/issues/2027#issuecomment-343640747

dherman commented 6 years ago

https://twitter.com/gankro/status/931535748628729856

dherman commented 6 years ago

change all sNaNs into qNaNs

Even if that weren’t expensive it wouldn’t be an option, unfortunately — typed arrays are used to hold heterogeneous information, so even if you’re looking at a section of it as float data other parts might be integers. So changing their bits would break programs.

But I think I’m starting to convince myself that we may be able to use actual slices for all types after all, with no canonicalization. I’m basing this on a few arguments:

Signalling NaN is safe so floats don’t need canonicalization to be safe in Rust.
There’s no reason Rust should have to be more portable than JS for endianness, and anyway in practice probably Neon will only be used on little-endian systems; anyone shipping JS typed array code on a big-endian system basically can never use code that relies on byte order because the ecosystem of JS code won’t be written correctly portably anyway. (For years I’ve asked around if people know of any modern JS engines with typed array support that ship on big-endian architectures and haven’t found any so far.)
The remaining piece is whether we can get away from the cslice library here, and I bet we can by passing the individual fields into C instead of the ABI-stable struct, but I need to study the code again. Alternatively maybe the ABI of Rust slices has stabilized since last I checked? If not maybe we can push for that to stabilize.

If we can get away with this, the benefit is getting to interop with all the Rust ecosystem that uses slices, as you say, and maybe also compiler optimizations around slices (I’m not sure).

dherman commented 6 years ago

OK, yeah, there's no reason why we needed cslice in the first place. We use its ABI in two (morally identical) functions, Neon_Buffer_Data and Neon_ArrayBuffer_Data:

https://github.com/neon-bindings/neon/blob/master/crates/neon-runtime/src/neon.cc#L202-L206

and instead of a single out-pointer for a pair, we can simply pass two out-pointers for the two fields, as stack-local variables:

https://github.com/neon-bindings/neon/blob/master/src/js/binary.rs#L42-L47

And then we can call std::slice::from_raw_parts to construct the Rust slice.

dherman commented 6 years ago

@matklad I've updated this RFC according to our discussion. I feel good about the state of it, so I'll probably tag it with "final comment period" soon.

dherman commented 6 years ago

OK, one more question for anyone who may have an opinion: ArrayBufferData is not the best name for the data structure that represents the backing data, because we should use the same type for the backing data of both JsArrayBuffer and JsBuffer. Names I can think of:

BinaryData
BufferData
BackingData -- too generic
BackingBuffer -- confusing because we already have two types that "are buffers"
BinaryBuffer -- ditto
ByteBuffer -- ditto
ByteStore -- confusing with "app store"
BinaryStore -- ditto

I think the first two are my favorites. The latter is nice but it could confuse people into thinking it's only for JsBuffer. BinaryData has slightly less of a hint that it's related to buffers, but being the associated type for borrowing the contents of JsArrayBuffer and JsBuffer provides the association.

So I'm leaning towards BinaryData. Anyone want to agree or disagree?

dherman commented 6 years ago

Pushed the name change.

neon-bindings / rfcs

RFC: ArrayBuffer Views #5