rust-lang / portable-simd

The testing ground for the future of portable SIMD in Rust
Apache License 2.0
904 stars 82 forks source link

Consider enabling other indices than usize #166

Open workingjubilee opened 3 years ago

workingjubilee commented 3 years ago

This would minimize contortions for casting between various index sizes.

See: https://github.com/rust-lang/rust/issues/89193#issuecomment-92627025

Since there is a bit of chatter here on the use cases for performing a scatter or gather for indexes, I can give a little bit of context, since I do think it is somewhat informative.

In this case I am working on a columnar store. You can kind of think of the use case as a secondary index over columnar data. For just a single primary index, you have a list of row_ids Simd<usize, LANES> and you can perform a simd gather instruction to get all of the values at those indexes. However, if you have a secondary index the simd gather from that secondary index needs to produce a Simd<usize, LANES>, so that can be passed directly into the second of the gather functions over the primary keys. In this way you go from needed 2*LANES loads to perform a secondary index lookup over data to only needing 2 simd operations.

Alternatively it would be nice if the Simd libraries were updated to perform gathers using indexes as Simd<u32, LANES> or Simd<u64, LANES>. That would dodge this issue altogether and it would mean less copies in our columnar store. Many times columnar stores you keep offsets as u32 (or even smaller) to keep things compressed. And it is nice to be able to act directly on those indices rather than performing intermediate conversions to usize.

calebzulawski commented 3 years ago

Since in LLVM it's just using pointers, anyway, we can probably allow it to take any unsigned integer and the optimizer will handle it from there. I don't believe this needs any rustc support.

dyv commented 3 years ago

This would be awesome. I just ran into another case where I was operating on dictionary encoded values (flexibly encoded as u8 or u16), and then needed to convert into usize to perform lookups into the dictionary.

workingjubilee commented 3 years ago

Doing it for u8 and u16 seems trivial since those both have Into<usize>. I think my only realistic concern is a 32-bit arch that supports 128-bit vectors and people using indices of u64.

calebzulawski commented 3 years ago

It would probably be reasonable to provide a wrapping int conversion function

workingjubilee commented 3 years ago

In practice, the underlying API uses u32. Hrm...

calebzulawski commented 3 years ago

I think it's best to hide that it's u32, since that's pretty arbitrary IMO. usize makes the most sense, I think

workingjubilee commented 3 years ago

Wait, I am getting swizzle and gather confused here. llvm's shufflevector uses u32, and llvm.masked.gather uses pointers. Gah!

pro465 commented 3 years ago

Doing it for u8 and u16 seems trivial since those both have Into<usize>. I think my only realistic concern is a 32-bit arch that supports 128-bit vectors and people using indices of u64.

i think maybe we should not allow those and only allow types which do implement Into<usize> 🤔

bjorn3 commented 3 years ago

Only u8 and u16 implement Into<usize>. u32 and u64 don't as they would need to truncate on platforms with a small pointer size. Rust supports 16bit platforms, but not 8bit platforms so the minimal usize size is 16bit.

pro465 commented 3 years ago

@bjorn3 i dont think we should care about u32 and u64. after all, if std goes as far as implementing indexing only for usize for slices, we are already too much "forgiving" in implementing indexing for T: Into<usize>, why should we be more "forgiving"?

workingjubilee commented 3 years ago

The choice of usize is actually arbitrary here. There is no immediately obvious reason to not use u32 as the upper bound and accept Into<u32>. The design is intended to allow us to make maximal usage of the actual machines, so that is the reason why: we get to choose between things like this in our design.

calebzulawski commented 3 years ago

I disagree that it's arbitrary--usize is the correct type for indexing arrays, which is effectively all gather is. I personally don't think we should overcomplicate the interface, and shouldn't use any tricks with Into. Once we have casts analogous to as (#116) it should be pretty painless to use any integer type.

scottmcm commented 1 year ago

usize is the type for indexing arrays (unfortunately; I wish we had a smaller index type), but u32 is the type for "indexing" bits. I think I could justifiably argue for either in SIMD -- thinking of lanes in simd ops like bits in bitwise ops also seems reasonable to me.

(Though really we'll probably keep things usize as the primary option because otherwise as_array and friends look like they'd be a huge mess.)