Open programmerjake opened 3 years ago
We currently have to and from arrays, and the rest you can do from there in a pinch.
It didn't seem that there was much hardware support for casts within Intel and ARM so we pushed it to the back of the schedule.
The only casts that seem broadly supported is between floats and integers of the same width, which we do have implemented
Arrays are more ergonomic if used with the array map method.
My case is going between i8x16
<--> i16x8
and the like. I have an impressive amount of transmutes... there's probably a nicer way?
So you want two i8 at a time to combine into i16?
or do you want to take in one integer register and produce two registers of data by sign extending each individual i8?
The former. I'm parsing ints (quickly) and trying to see if I can use core_simd for the most part to do this:
https://github.com/pickfire/parseint/blob/b75bf5e47a3b2fdc1934bab046e677e033173e18/src/lib.rs#L151
The other thing I might be missing is a nice way to switch to and from __m128i
for unsupported intrinsics.
I can't look too closely right now, but that transmute_copy is suspicious.
but yeah, you'll probably have some transmutes in there for a while until we fill in the edges of the API.
For a fix that works immediately you could convert this to the safe_arch crate, which has bytemuck support, then you're at least doing pretty much all safe code.
Ah I see I can do i8x16::from(__m128i)
. - Excellent that removes a fair few transmutes.
Thanks. Suspicious yes, but I have ensured that we never look at memory that isn't ours. I.e. the slice is always at least long enough when transmuted. I had a look at bytemuck but it seemed I would need to turn it into a fixed size array before using it as a Pod.
On Sat, May 8, 2021 at 7:57 PM Lokathor @.***> wrote:
I can't look too closely right now, but that transmute_copy is suspicious.
but yeah, you'll probably have some transmutes in there for a while until we fill in the edges of the API.
For a fix that works immediately you could convert this to the safe_arch crate, which has bytemuck support, then you're at least doing pretty much all safe code.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rust-lang/stdsimd/issues/116#issuecomment-835472192, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAGEJCA45FDGM7UEXW6GHH3TMWCTHANCNFSM44EHGCYQ .
the trait `From<SimdU8<16_usize>>` is not implemented for `SimdU16<8_usize>`
Is there a principled reason for this considering that both u8x16
and u16x8
convert to/from __m128i
using from()
/into()
? Or just not done yet? (I want to do a zero-cost endianness-dependent re-interpretation of the register.)
We haven't really prioritized conversions yet, but I think our general stance has been "this is a job for transmute, and safe transmute will make that easier". Converting between the std::arch types is a little different since __m128i
doesn't encode integer type information, but maybe that should just be a transmute as well?
Yeah if you deliberately want to change lane count like this while preserving register byte size, just go to the arch type and then from the arch type in your new lane count type using stdsimd. It's a hair verbose, but works totally fine.
We haven't really prioritized conversions yet, but I think our general stance has been "this is a job for transmute, and safe transmute will make that easier".
I hope the timeline for that feature works out well for SIMD purposes.
Yeah if you deliberately want to change lane count like this while preserving register byte size, just go to the arch type and then from the arch type in your new lane count type using stdsimd. It's a hair verbose, but works totally fine.
Going via the arch type isn't portable and adds an unnecessary intermediate step.
I hope the timeline for that feature works out well for SIMD purposes.
Yeah, it doesn't look like it, honestly. That project seems like it has all but stalled.
Even if not, I also think it might be worth including explicit ways to perform conversions, given how common they are in SIMD code.
There's some prior art here in fN::{from,to}_bits
, uN::{to/from}_ne_bytes
, etc.
I think that those are just "transmute that we're willing to bless because it's so common". So, safe transmute makes them in some sense pointless. However, I do sympathize that safe transmute will take a while to come around.
I think "just use transmute for now" kinda has to be the answer here though, unless we want to implement our own mini-safe-transmute API within the portable simd API. We could maybe do that, but I don't think we should stabilize that mini api in the long run, and I'm hesitant to suggest thay anyone write down a bunch of code (probably in the form of weird macro stuff) that we know we'll never keep long term.
unless we want to implement our own mini-safe-transmute API
Is there something more to this than defining the obvious From
impls and implementing them with transmute?
I think for me, the thing that would push something like this over the edge is if this were a common source of unsafe
in otherwise safe portable SIMD code. That would be pretty unfortunate and a compelling reason IMO for exposing safe bitcast APIs (just like std
does for integers<->bytes). If you are otherwise writing unsafe
everywhere, then a transmute or a pointer cast seems like less of a big deal.
I suspect the case is the former, although I'm not certain. If so, then yes, I think these APIs should absolutely be added unless there's some complexity here that I don't understand.
Well, the approximate breakdown is:
So if the request, as given in the example code in the first post, is a general "cast" operation that "just works" then it feels like something that is unfortunately just slightly awkwardly out of scope. I admit that it's useful, but I don't think it best fits in this sub-project of rust in the long term.
Is there something more to this than defining the obvious
From
impls and implementing them with transmute?
The implementation is trivial, the reason we've been avoiding it is because it's not clear what semantics From
should carry. Should i16
implement From<[u8; 2]>
? Probably not. I think transmute carries the meaning we want, the downside is the unsafe
.
I do think we can implement {to,from}_bits
for all vectors, which would allow a safe "transmute" with the slight inconvenience of two function calls instead of one.
I do think we can implement
{to,from}_bits
for all vectors, which would allow a safe "transmute" with the slight inconvenience of two function calls instead of one.
That might be a nice compromise assuming the codegen is the same. (And I assume it would be.) @hsivonen What do you think?
Indeed, lane reinterpretation is currently a source of unsafe
in otherwise-safe code. to_bits
/from_bits
would work for me.
(Masks should zero-cost convert to integer lanes but not vice versa.)
I took a quick crack at this for five minutes and am just commenting here so that it's staring me down when I return to it: in order to make this sound, this requires either reworking the way we currently apply our "can actually be implemented" lane limit, or else accepting the extremely awkward return signature of
pub fn to_ne_bytes(self) -> [[u8; N]; LANES];
pub fn from_ne_bytes([[u8; N]; LANES]) -> Self;
Frankly, I don't think that signature is acceptable for all the uses everyone has been wanting this for, but I do have ideas for how to grind the lane limit under my heel and make this right.
(Masks should zero-cost convert to integer lanes but not vice versa.)
That only applies to full-width masks, bitmasks will zero-cost convert to a different type (u64
or similar?).
(Masks should zero-cost convert to integer lanes but not vice versa.)
That only applies to full-width masks, bitmasks will zero-cost convert to a different type (
u64
or similar?).
I agree. (I meant SIMD register mask types in my previous remark.)
@workingjubilee I tried my hand at this and I think the way to go is:
trait ToBytes {
type Bytes;
#[doc(hidden)]
fn to_bytes_impl(self) -> Self::Bytes {
unsafe { core::mem::transmute(self) }
}
#[doc(hidden)]
fn from_bytes_impl(bytes: Self::Bytes) -> Self {
unsafe { core::mem::transmute(bytes) }
}
}
...
pub fn to_ne_bytes(self) -> Self::Bytes { self.to_bytes_impl() }
pub fn from_ne_bytes(bytes: Self::Bytes) -> Self { Self::from_bytes_impl(bytes) }
...
If we didn't have LanesAtMost32
and used const_evaluatable_checked
this would work fine, but due to the lane limits, implementing this on SimdI64<8>
isn't possible (since it needs SimdU8<64>
bytes).
Based on the limitations of the implementation in the linked PR, using transmute
directly is usually the best solution for now, but if you absolutely must have no unsafe
, you can use byte conversions as long as the type is not more than 32 bytes long. Hopefully in the future that limitation can be lifted, but right now that's blocked on #90.
changed the issue to also include transmutes (which weren't originally intended to be covered by this issue, but whatever...)
rustc's simd_cast
intrinsic needs to gain support for usize
/isize
:
https://github.com/rust-lang/rust/blob/17e13b549f5f83cd9ffca9a540090754eb95115c/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1675-L1676
By the way - wonder what's the suggested way to convert between lane types of different size but with the same number of lanes? (e.g. u8x16
to u16x16
)
I wasn't able to find type casts, we need them:
This is different than transmuting (which is something we also need; we're planning on mostly relying on safe-transmute for this).