rust-lang / portable-simd

The testing ground for the future of portable SIMD in Rust
Apache License 2.0
891 stars 80 forks source link

Implement type casts (non-transmute & transmute) #116

Open programmerjake opened 3 years ago

programmerjake commented 3 years ago

I wasn't able to find type casts, we need them:

fn f(v: f32x16) -> i8x16 {
    v.cast() // doesn't currently work
}

This is different than transmuting (which is something we also need; we're planning on mostly relying on safe-transmute for this).

Lokathor commented 3 years ago

We currently have to and from arrays, and the rest you can do from there in a pinch.

It didn't seem that there was much hardware support for casts within Intel and ARM so we pushed it to the back of the schedule.

calebzulawski commented 3 years ago

The only casts that seem broadly supported is between floats and integers of the same width, which we do have implemented

workingjubilee commented 3 years ago

Arrays are more ergonomic if used with the array map method.

gilescope commented 3 years ago

My case is going between i8x16 <--> i16x8 and the like. I have an impressive amount of transmutes... there's probably a nicer way?

Lokathor commented 3 years ago

So you want two i8 at a time to combine into i16?

or do you want to take in one integer register and produce two registers of data by sign extending each individual i8?

gilescope commented 3 years ago

The former. I'm parsing ints (quickly) and trying to see if I can use core_simd for the most part to do this:

https://github.com/pickfire/parseint/blob/b75bf5e47a3b2fdc1934bab046e677e033173e18/src/lib.rs#L151

The other thing I might be missing is a nice way to switch to and from __m128i for unsupported intrinsics.

Lokathor commented 3 years ago

I can't look too closely right now, but that transmute_copy is suspicious.

but yeah, you'll probably have some transmutes in there for a while until we fill in the edges of the API.

For a fix that works immediately you could convert this to the safe_arch crate, which has bytemuck support, then you're at least doing pretty much all safe code.

gilescope commented 3 years ago

Ah I see I can do i8x16::from(__m128i). - Excellent that removes a fair few transmutes.

gilescope commented 3 years ago

Thanks. Suspicious yes, but I have ensured that we never look at memory that isn't ours. I.e. the slice is always at least long enough when transmuted. I had a look at bytemuck but it seemed I would need to turn it into a fixed size array before using it as a Pod.

On Sat, May 8, 2021 at 7:57 PM Lokathor @.***> wrote:

I can't look too closely right now, but that transmute_copy is suspicious.

but yeah, you'll probably have some transmutes in there for a while until we fill in the edges of the API.

For a fix that works immediately you could convert this to the safe_arch crate, which has bytemuck support, then you're at least doing pretty much all safe code.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang/stdsimd/issues/116#issuecomment-835472192, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGEJCA45FDGM7UEXW6GHH3TMWCTHANCNFSM44EHGCYQ .

hsivonen commented 3 years ago
the trait `From<SimdU8<16_usize>>` is not implemented for `SimdU16<8_usize>`

Is there a principled reason for this considering that both u8x16 and u16x8 convert to/from __m128i using from()/into()? Or just not done yet? (I want to do a zero-cost endianness-dependent re-interpretation of the register.)

calebzulawski commented 3 years ago

We haven't really prioritized conversions yet, but I think our general stance has been "this is a job for transmute, and safe transmute will make that easier". Converting between the std::arch types is a little different since __m128i doesn't encode integer type information, but maybe that should just be a transmute as well?

Lokathor commented 3 years ago

Yeah if you deliberately want to change lane count like this while preserving register byte size, just go to the arch type and then from the arch type in your new lane count type using stdsimd. It's a hair verbose, but works totally fine.

hsivonen commented 3 years ago

We haven't really prioritized conversions yet, but I think our general stance has been "this is a job for transmute, and safe transmute will make that easier".

I hope the timeline for that feature works out well for SIMD purposes.

Yeah if you deliberately want to change lane count like this while preserving register byte size, just go to the arch type and then from the arch type in your new lane count type using stdsimd. It's a hair verbose, but works totally fine.

Going via the arch type isn't portable and adds an unnecessary intermediate step.

thomcc commented 3 years ago

I hope the timeline for that feature works out well for SIMD purposes.

Yeah, it doesn't look like it, honestly. That project seems like it has all but stalled.

Even if not, I also think it might be worth including explicit ways to perform conversions, given how common they are in SIMD code.

thomcc commented 3 years ago

There's some prior art here in fN::{from,to}_bits, uN::{to/from}_ne_bytes, etc.

Lokathor commented 3 years ago

I think that those are just "transmute that we're willing to bless because it's so common". So, safe transmute makes them in some sense pointless. However, I do sympathize that safe transmute will take a while to come around.

I think "just use transmute for now" kinda has to be the answer here though, unless we want to implement our own mini-safe-transmute API within the portable simd API. We could maybe do that, but I don't think we should stabilize that mini api in the long run, and I'm hesitant to suggest thay anyone write down a bunch of code (probably in the form of weird macro stuff) that we know we'll never keep long term.

BurntSushi commented 3 years ago

unless we want to implement our own mini-safe-transmute API

Is there something more to this than defining the obvious From impls and implementing them with transmute?

I think for me, the thing that would push something like this over the edge is if this were a common source of unsafe in otherwise safe portable SIMD code. That would be pretty unfortunate and a compelling reason IMO for exposing safe bitcast APIs (just like std does for integers<->bytes). If you are otherwise writing unsafe everywhere, then a transmute or a pointer cast seems like less of a big deal.

I suspect the case is the former, although I'm not certain. If so, then yes, I think these APIs should absolutely be added unless there's some complexity here that I don't understand.

Lokathor commented 3 years ago

Well, the approximate breakdown is:

So if the request, as given in the example code in the first post, is a general "cast" operation that "just works" then it feels like something that is unfortunately just slightly awkwardly out of scope. I admit that it's useful, but I don't think it best fits in this sub-project of rust in the long term.

calebzulawski commented 3 years ago

Is there something more to this than defining the obvious From impls and implementing them with transmute?

The implementation is trivial, the reason we've been avoiding it is because it's not clear what semantics From should carry. Should i16 implement From<[u8; 2]>? Probably not. I think transmute carries the meaning we want, the downside is the unsafe.

I do think we can implement {to,from}_bits for all vectors, which would allow a safe "transmute" with the slight inconvenience of two function calls instead of one.

BurntSushi commented 3 years ago

I do think we can implement {to,from}_bits for all vectors, which would allow a safe "transmute" with the slight inconvenience of two function calls instead of one.

That might be a nice compromise assuming the codegen is the same. (And I assume it would be.) @hsivonen What do you think?

hsivonen commented 3 years ago

Indeed, lane reinterpretation is currently a source of unsafe in otherwise-safe code. to_bits/from_bits would work for me.

(Masks should zero-cost convert to integer lanes but not vice versa.)

workingjubilee commented 3 years ago

I took a quick crack at this for five minutes and am just commenting here so that it's staring me down when I return to it: in order to make this sound, this requires either reworking the way we currently apply our "can actually be implemented" lane limit, or else accepting the extremely awkward return signature of

pub fn to_ne_bytes(self) -> [[u8; N]; LANES];
pub fn from_ne_bytes([[u8; N]; LANES]) -> Self;

Frankly, I don't think that signature is acceptable for all the uses everyone has been wanting this for, but I do have ideas for how to grind the lane limit under my heel and make this right.

programmerjake commented 3 years ago

(Masks should zero-cost convert to integer lanes but not vice versa.)

That only applies to full-width masks, bitmasks will zero-cost convert to a different type (u64 or similar?).

hsivonen commented 3 years ago

(Masks should zero-cost convert to integer lanes but not vice versa.)

That only applies to full-width masks, bitmasks will zero-cost convert to a different type (u64 or similar?).

I agree. (I meant SIMD register mask types in my previous remark.)

calebzulawski commented 3 years ago

@workingjubilee I tried my hand at this and I think the way to go is:

trait ToBytes {
    type Bytes;
    #[doc(hidden)]
    fn to_bytes_impl(self) -> Self::Bytes { 
        unsafe { core::mem::transmute(self) }
    }
    #[doc(hidden)]
    fn from_bytes_impl(bytes: Self::Bytes) -> Self {
        unsafe { core::mem::transmute(bytes) }
    }
}

...
    pub fn to_ne_bytes(self) -> Self::Bytes { self.to_bytes_impl() }
    pub fn from_ne_bytes(bytes: Self::Bytes) -> Self { Self::from_bytes_impl(bytes) }
...

If we didn't have LanesAtMost32 and used const_evaluatable_checked this would work fine, but due to the lane limits, implementing this on SimdI64<8> isn't possible (since it needs SimdU8<64> bytes).

calebzulawski commented 3 years ago

Based on the limitations of the implementation in the linked PR, using transmute directly is usually the best solution for now, but if you absolutely must have no unsafe, you can use byte conversions as long as the type is not more than 32 bytes long. Hopefully in the future that limitation can be lifted, but right now that's blocked on #90.

programmerjake commented 3 years ago

changed the issue to also include transmutes (which weren't originally intended to be covered by this issue, but whatever...)

programmerjake commented 3 years ago

rustc's simd_cast intrinsic needs to gain support for usize/isize: https://github.com/rust-lang/rust/blob/17e13b549f5f83cd9ffca9a540090754eb95115c/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1675-L1676

came up here: https://rust-lang.zulipchat.com/#narrow/stream/257879-project-portable-simd/topic/Coersing.20LLVM.20to.20emit.20mulhi/near/259060275

aldanor commented 2 years ago

By the way - wonder what's the suggested way to convert between lane types of different size but with the same number of lanes? (e.g. u8x16 to u16x16)