Unboxed is not a good default unboxed vector

Lysxia commented 5 years ago

Currently the documentation (or lack of it) suggests to use Unboxed if you want unboxed, low-overhead vectors. In fact the wiki linked in the description says:

End users should use Data.Vector.Unboxed for most cases

However, in my experience, it's really not the better default, as opposed to Storable (or perhaps Primitive). Here I propose to update the documentation to make Unboxed less prominent, and even explicitly discourage its use (to counter the natural appeal of its name).

The `Unbox` class is hard to understand

It involves

Multi-param type classes, and the class synonym trick (Unbox)
Data type families (which I think are even more niche than plain type families)
The names MVector and Vector are used both for classes and data type families, adding to the confusion
PrimMonad (should monads even be a prerequisite to use efficient vectors for user-defined types?)

Hence, most new users would probably be unable to complete the sketched implementation of Unbox for Complex in the documentation of Data.Vector.Unboxed.

Furthermore, because of the methods parameterized by PrimMonad instances, it's not possible to use DerivingVia to abstract any of those details. So you either pay 30 lines of boilerplate, or use CPP/Template Haskell. Even assuming users are capable of adapting that boilerplate to their own types, that doesn't look "very easy" (quoted from the documentation).

Unboxed is not the most efficient

In addition, although Unboxed is touted as efficient because vectors of tuples are merely tuples of vectors, a flat Storable vector is actually more efficient in almost all cases: it has much fewer indirections and having every entry in a contiguous piece of memory makes it more cache-friendly. The Storable class is also so much easier to understand and use than Unbox. Access patterns taking advantage of Unboxed's layout are arguably rare.

That means this sentence at the top of Data.Vector.Primitive (which is similar to Storable in those respects) is incorrect:

Adaptive unboxed vectors defined in Data.Vector.Unboxed are significantly more flexible at no performance cost.

`Storable` vs `Primitive`

Instead of Unboxed, either Storable or Primitive should be promoted instead.

I'm still uncertain about the trade-offs of who should manage a vector's memory.

Although Storable is originally an FFI thing, it still seems to do a fine job for general-purpose programming, and it doesn't have the 2x factor of GC-controlled memory, as opposed to Primitive.
MagicHash is a recurrent point of confusion, and that also plays against Prim from a documentation/usability point of view.

I'm sure there are advantages to Primitive, but I'm not sure they are sufficiently strong to make it a better default than Storable.

Shimuuar commented 5 years ago

My 2c

Working with unboxed data in haskell is somewhat awkward and require some sort of boilerplate no matter what. With Storable one has write boilerplate instance as well. And boilerplate is necessary with any approach. One has to describe how data is laid out in memory.

Also Unboxed comes with different tradeoffs compared to storable. Former implements structure of arrays (by convention), latter array of structures (necessary). Having side by side comparison of them would be nice

P.S. Storable does not have instances for tuples which caused some problems to me.

Lysxia commented 5 years ago

I disagree with the title change.

I don't think the cost-to-benefit ratio of reworking Unbox is as good as simply pointing people towards Storable more. My mind could be changed if someone proposed a concrete and compelling way to improve Unbox that would make it on par or easier to work with than Storable.

Storable boilerplate is significantly easier to manage than Unbox. It can be derived generically, for example using derive-storable. Although I don't really agree with this particular implementation, it is a concrete example of what's possible without extra-linguistic features such as CPP or Template Haskell.

cartazio commented 5 years ago

theres definitely room for improvements. And I do agree that the SOA vs AOS (struct of array vs array of structs) is conflated with unpinned vs pinned bytearrays.

currently we have SOA unpinned == unboxed and AOS C Style struct pinned arrays == storable.

it would be totally valid to have some flavors of AOS unpinned and SOA pinned.

Primitive is just a helper for boot strapping.

theres definitely

deriving via doesn't work for the same reason that Generalized New Type Deriving. The Pure and Mutable vector apis need the monad (resp primmonad) constraints on the generic versions for evaluation ordering (resp effects), and Role annotations currently. This was a HUGE pain point when Roles were added to Monad and friends..

as for Tuple storable instances, the theres been a number of discussions around this, and problem with having a Storable a, Storable b => Storable (a,b) instance is that theres no portable/always correct way to do the placement / alignment etc correctly that maps to the associated C Struct layout! so such an instance is fundamentally not going to wokr

@Lysxia tough, :) , i'm turning this into an action item to improve stuff. :)

cartazio commented 5 years ago

1) improving the UX for all flavors of vectors is valuable

2) benchmarks for performance characteristics across various datatypes across applicable representations are valued and please share them!

Lysxia commented 5 years ago

as for Tuple storable instances, the theres been a number of discussions around this, and problem with having a Storable a, Storable b => Storable (a,b) instance is that theres no portable/always correct way to do the placement / alignment etc correctly that maps to the associated C Struct layout!

Fair point. I wonder whether it would be worth having a clone of Storable explicitly not meant to interface with C.

cartazio commented 5 years ago

as an example of how to write your own, heres one I wrote a while ago but not suitable for inclusion in vector i think https://github.com/wellposed/numerical/blob/master/src/Numerical/Data/Vector/HPair.hs

cartazio commented 5 years ago

@Lysxia absolutely! theres some design choices to be made there, and thats gonna take some time and experimentation

cartazio commented 5 years ago

At one extreme there’s even research projects around fancy layouts in dbs like https://db.cs.cmu.edu/papers/2016/arulraj-sigmod2016.pdf (plus my own experiments in my numerical Haskell code base that predates the cmu paper ).

A big challenge is of course engineering complexity vs performance vs user facing api ux. And in some respects it’s still in general a career sized research problem to build tools that a jointly best in all three ways

choener commented 5 years ago

One performance-related comment: I use the vector library in my PrimitiveArray and ADPfusion libraries for dynamic programming. When switching between Unboxed vs Storable vectors, I have noticeable better performance with Unboxed.

My NeedlemanWunsch implementation clocks at 0.4 seconds for Unboxed, and 0.6 seconds for Storable with 'Int' as the underlying type for the elements of the vector.

I have not yet investigated why Storable is slower, and if it is my fault, or a problem of vector or Storable.

Shimuuar commented 4 years ago

I've reverted to original title. This issue mostly about comparing advantages and drawbacks of different vector types. Good deriving of Unbox instances is important and reserve its own issues (it's big topic after all)

Bodigrim commented 4 years ago

My 2p :)

Are Unboxed vectors more flexible than Primitive / Storable? IMHO they are, because there are more Unboxed instances available out of the box than Primitive / Storable instances.

Are Unboxed vectors slower than Primitive / Storable, when both instances are available from vector? No, I believe they are not, by virtue of being backed by the same structures.

So at the moment I do not see a reason to change vector documentation. Obviously, we may review it, once more efficient Storable / Primitive instances for tuples will be available.

Shimuuar commented 4 years ago

More complicated question is when we start considering records. Unboxed vectors allows for O(1) zip/unzip which is very useful when you want to work on columns of table. Also is GHC able to avoid useless reads when only part of structure is demanded: map (\x -> fieldFoo x ^ 2)?

I also think that it's worthy to investigate performance of different vector implementations. We could find quite a few surprises

cartazio commented 4 years ago

To elaborate on @bodigrims point , unboxed vectors are exactly the set of Haskell products made out of the primitive supported types of the rts. Which pritmive happens to have helpers for!

The law hiding in Primitive the package for what types should have a Prim instance boils down, spiritually, can you directly write an efficient flat memmove/memcopy / memset analogue without committing to a structure of array SOA or array of struct AOS memory layout. (Hence why for primitive I’ve had the stance that adding a “complex a” instance to Prim class

Storable vectors can’t have storable a , storable b => storable (a,b) instance because you can’t really statically decide why that would map to for a given c struct definition.

Unboxed vectors correspond with struct of array format over arbitrary Haskell prim types of instances therof.

For Lehins, and certainly in general, it would absolutely make sense to add an ARRAY OF struct memory layout that doesn’t require having a legal corresponding c struct definition. And perhaps that could live in vector or primitive, and probably would address the build time for code that he dislikes about unboxed vectors.

Conceptually, the way I’ve thought about it is that currently we have SOA unpinned arrays of arbitrary products of Haskell types ... we call this Unbox.

We also have AOS of pinned arrays that we require only have instances that at type checking time have a statically computable alignment etc, aka storable.

We could totally also have AOS of arbitrary products of Haskell types in unpinned arrays, which would be a useful new thing that would likely have some nice tradeoffs. And unlike storable allow AOS a, AOS b => AOS (a,b) Instances ! Which I think would address the build time and other concerns folks mention and just be a useful different thing.

On Thu, Jun 11, 2020 at 3:04 PM Bodigrim notifications@github.com wrote:

My 2p :)

Are Unboxed vectors more flexible than Primitive / Storable? IMHO they are, because there are more Unboxed instances available out of the box than Primitive / Storable instances.

Are Unboxed vectors slower than Primitive / Storable, when both instances are available from vector? No, I believe they are not, by virtue of being backed by the same structures.

So at the moment I do not see a reason to change vector documentation. Obviously, we may review it, once more efficient Storable / Primitive instances for tuples will be available.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/250#issuecomment-642873369, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQXZXBXZO4J3J3DCJV3RWETFPANCNFSM4H4F364A .

cartazio commented 4 years ago

To further clarify: Prim instances should be those types that have zero stride in their setting of sub fields in the respective mem* analogue operations.

On Thu, Jun 11, 2020 at 6:32 PM Carter Schonwald carter.schonwald@gmail.com wrote:

To elaborate on @bodigrims point , unboxed vectors are exactly the set of Haskell products made out of the primitive supported types of the rts. Which pritmive happens to have helpers for!

The law hiding in Primitive the package for what types should have a Prim instance boils down, spiritually, can you directly write an efficient flat memmove/memcopy / memset analogue without committing to a structure of array SOA or array of struct AOS memory layout. (Hence why for primitive I’ve had the stance that adding a “complex a” instance to Prim class

Storable vectors can’t have storable a , storable b => storable (a,b) instance because you can’t really statically decide why that would map to for a given c struct definition.

Unboxed vectors correspond with struct of array format over arbitrary Haskell prim types of instances therof.

For Lehins, and certainly in general, it would absolutely make sense to add an ARRAY OF struct memory layout that doesn’t require having a legal corresponding c struct definition. And perhaps that could live in vector or primitive, and probably would address the build time for code that he dislikes about unboxed vectors.

Conceptually, the way I’ve thought about it is that currently we have SOA unpinned arrays of arbitrary products of Haskell types ... we call this Unbox.

We also have AOS of pinned arrays that we require only have instances that at type checking time have a statically computable alignment etc, aka storable.

We could totally also have AOS of arbitrary products of Haskell types in unpinned arrays, which would be a useful new thing that would likely have some nice tradeoffs. And unlike storable allow AOS a, AOS b => AOS (a,b) Instances ! Which I think would address the build time and other concerns folks mention and just be a useful different thing.

On Thu, Jun 11, 2020 at 3:04 PM Bodigrim notifications@github.com wrote:

My 2p :)

Are Unboxed vectors more flexible than Primitive / Storable? IMHO they are, because there are more Unboxed instances available out of the box than Primitive / Storable instances.

Are Unboxed vectors slower than Primitive / Storable, when both instances are available from vector? No, I believe they are not, by virtue of being backed by the same structures.

So at the moment I do not see a reason to change vector documentation. Obviously, we may review it, once more efficient Storable / Primitive instances for tuples will be available.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/250#issuecomment-642873369, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQXZXBXZO4J3J3DCJV3RWETFPANCNFSM4H4F364A .

lehins commented 4 years ago

Unbox has benefits and drawbacks.

On the downside:

Compile time and memory usage are bad #274, but I hope one day it will get fixed
Hard to write custom instances. I have a solution that worked for me quite well: https://github.com/haskell/vector/issues/315#issuecomment-643148132
Sometimes it is not good for cache locality, so for some data types it can have lower performance then let's say for storable instance which will have all of components of individual element next to each other in memory. This one I haven't really seen to give me much trouble, so I don't particularly consider it a problem, but had to mention it anyways.
Hopefully one day we will get to adding SIMD instructions to vector, when we do, unboxed vectors will not be that useful because for efficient usage of data parallelization actual data has to be next to each other.
Storable was designed for FFI and Primitive vector, contrary to the common belief, can still be safely used with FFI in many situations, while Unboxed vectors are simply unsuitable for FFI.
I've seen primitive array perform better on rare occasions when compared to unboxed with elements of the same type in some complicated setups. This is likely due to the fact that unboxing machinery makes it a bit harder for ghc to perform the same optimizations.

On the upside:

We can have instances for product types, which Prim cannot allow by itself and Storable does not provide by default. (I don't understand why base does not provide such tuple instances, to me it seems like an oversight. What Carter mentioned about alignment and associated C struct is something I don't agree with, but that is not the point of this issue so let's not worry about it)
Unlike Storable, both Unboxed and Primitive use unpinned memory for small allocations under ~3.3K which is great for avoiding memory fragmentation. But when large chunks of memory are allocated we can avoiding expensive copies during GC, because then they are allocated as pinned. In other words we get the best of both worlds.
Theoretically you will index only one array behind the Vector (a, b) when you just need a or b, but not both, although I am not sure it actually works that way in practice.
Has slightly lower memory overhead than Storable, because ForeignPtr besides the pointer actual Addr# also carries around the MutableByteArray# in a sum type. This can only be noticeable for very small vectors, which Storable shouldn't be used for anyways.

Above points are things that I learned about differences about vector just of top of my head. The conclusion I can say about all this is that there is not one best default array type, so suggesting "use unboxed" whenever you can is not the right one, they all have their uses. Which means we just need to document it better. What my suggestion would be:

Use primitive vector for data types that have Prim instance.
Use storable if you need FFI
Use unboxed or storable if you have more complicated types that don't have Prim instance, but you have instances available of either Unbox or Storable, with preference that depends on the use cases
Use boxed when none of above are applicable.

cartazio commented 4 years ago

I can do a starter demonstrator PR for the AOS unpinned vector IF other folks would be interested in helping get it to a mature point? I think that would likely be a useful thing. Idk.

On Fri, Jun 12, 2020 at 4:36 AM Alexey Kuleshevich notifications@github.com wrote:

Unbox has benefits and drawbacks.

On the downside:

Compile time and memory usage are bad #274 https://github.com/haskell/vector/issues/274, but I hope one day it will get fixed

Hard to write custom instances. I have a solution that worked for me quite well: #315 (comment) https://github.com/haskell/vector/issues/315#issuecomment-643148132

Sometimes it is not good for cache locality, so for some data types it can have lower performance then let's say for storable instance which will have all of components of individual element next to each other in memory. This one I haven't really seen to give me much trouble, so I don't particularly consider it a problem, but had to mention it anyways.

Hopefully one day we will get to adding SIMD instructions to vector, when we do, unboxed vectors will not be that useful because for efficient usage of data parallelization actual data has to be next to each other.

Storable was designed for FFI and Primitive vector, contrary to the common belief, can still be safely used with FFI in many situations, while Unboxed vectors are simply unsuitable for FFI.

I've seen primitive array perform better on rare occasions when compared to unboxed with elements of the same type in some complicated setups. This is likely due to the fact that unboxing machinery makes it a bit harder for ghc to perform the same optimizations.

On the upside:

We can have instances for product types, which Prim cannot allow by itself and Storable does not provide by default. (I don't understand why base does not provide such tuple instances, to me it seems like an oversight. What Carter mentioned about alignment and associated C struct is something I don't agree with, but that is not the point of this issue so let's not worry about it)

Unlike Storable, both Unboxed and Primitive use unpinned memory for small allocations under ~3.3K which is great for avoiding memory fragmentation. But when large chunks of memory are allocated we can avoiding expensive copies during GC, because then they are allocated as pinned. In other words we get the best of both worlds.

Theoretically you will index only one array behind the Vector (a, b) when you just need a or b, but not both, although I am not sure it actually works that way in practice.

Has slightly lower memory overhead than Storable, because ForeignPtr besides the pointer actual Addr# also carries around the MutableByteArray# in a sum type. This can only be noticeable for very small vectors, which Storable shouldn't be used for anyways.

Above points are things that I learned about differences about vector just of top of my head. The conclusion I can say about all this is that there is not one best default array type, so suggesting "use unboxed" whenever you can is not the right one, they all have their uses. Which means we just need to document it better. What my suggestion would be:

Use primitive vector for data types that have Prim instance.

Use storable if you need FFI

Use unboxed or storable if you have more complicated types that don't have Prim instance, but you have instances available of either Unbox or Storable, with preference that depends on the use cases

Use boxed when none of above are applicable.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/250#issuecomment-643149635, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQRX6RPMYB2C4MUMKP3RWHSKVANCNFSM4H4F364A .

lehins commented 4 years ago

@cartazio Wouldn't such a PR would have to go into primitive?

cartazio commented 4 years ago

Perhaps! It might still be good to workshop it within vector first to make sure it’s good for vector, then split it out into prs into both once we have a good interface then everyone can help flesh it out once there’s a nice demonstrator.

One possible refinement that maybe could be done later or perhaps best at the same time would be that it would be possible to define safe instances for sums as tagged unions! Aka

AOS a , AOS b => AOS Either a b.

Unlike in UNBOX, this wouldn’t have the undefinedness issue of one of the vectors holding an ambiguous value unrelated to the current tag, just a trailing suffix we could define as being zeroed

One issue of course would be that the sum eats up space equal to the max of size of rep of a , size of rep of b. Plus the tag bits! So in some extreme cases it would use way more space than vanilla boxed vectors. Which is fine! Another angle is that nested sums won’t use tagged bits efficiently, in general that would require instances all be sum of product representations. So maybe the proof of concept would have a generic deriving / default rep that requires types have a kosher sum of products rep. Which seems like a reasonable requirement.

On Fri, Jun 12, 2020 at 10:08 AM Alexey Kuleshevich < notifications@github.com> wrote:

@cartazio https://github.com/cartazio Wouldn't such a PR would have to go into primitive?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/250#issuecomment-643289376, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQTIPWAXPMX4PXFVQG3RWIZETANCNFSM4H4F364A .

lehins commented 4 years ago

@cartazio I see what you mean and I am definitely sure that vector is not the place for this. What I would do is adjust Prim class to support this sort of unboxing. In the matter of fact I already have a a perfectly working and tested implementation for this stuff in a closed source repository, which I plan to open up and release on Hackage hopefully later on this summer. Spoiler alert: unboxing Maybe and Either in a way you described gives tremendous improvements in speed, which is definitely worth wasting this extra space.

The problem with this sort of unboxing though is that the primops that are needed for achieving this are only available starting with GHC 8.6 (readWord8ArrayAsDouble# ...), which means if we want it for unpinned array we have to loose support for all those older ghc that people love so much.

The reason why I am saying that vector is not the right place for this functionality is because:

there are already too many Vector types in vector, adding a new one AOS is no go.
We need to support old GHCs which will not allow us this sort of unboxing

What I can recommend though, if you are keen on working on this creating a separate package that creates a new vector type in the same way that we already Boxed, Unboxed, Primitive, and Storable

cartazio commented 4 years ago

I’m pretty sure we don’t need that primop you refer to ... what makes you think we do?

Yeah, the sop approach doesn’t make sense in terms of additional maintainer complexity for vector. Just likely a valuable ecosystem bit.

On Fri, Jun 12, 2020 at 10:43 AM Alexey Kuleshevich < notifications@github.com> wrote:

@cartazio https://github.com/cartazio I see what you mean and I am definitely sure that vector is not the place for this. What I would do is adjust Prim class to support this sort of unboxing. In the matter of fact I already have a a perfectly working and tested implementation for this stuff in a closed source repository, which I plan to open up and release on Hackage hopefully later on this summer. Spoiler alert: unboxing Maybe and Either in a way you described gives tremendous improvements in speed, which is definitely worth wasting this extra space.

The problem with this sort of unboxing though is that the primops that are needed for achieving this are only available starting with GHC 8.6 ( readWord8ArrayAsDouble# ...), which means if we want it for unpinned array we have to loose support for all those older ghc that people love so much.

The reason why I am saying that vector is not the right place for this functionality is because:

there are already too many Vector types in vector, adding a new one AOS is no go.

We need to support old GHCs which will not allow us this sort of unboxing

What I can recommend though, if you are keen on working on this creating a separate package that creates a new vector type in the same way that we already Boxed, Unboxed, Primitive, and Storable

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/250#issuecomment-643308745, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQTHVBUYNPW2UZKRBV3RWI5HRANCNFSM4H4F364A .

lehins commented 4 years ago

I’m pretty sure we don’t need that primop you refer to ... what makes you think we do?

If want to read/write from/into a ByteArray anything that is larger than one byte, eg. Word64 at any location, which is what you would need if you wanted to unbox Either Word64 Word32, you would have to reserve a byte for a flag (ie. Left/Right) and 8 bytes for content, then your second element of the array will start at 9th byte and the actual Word64 or Word32 would be written at 10th byte. Now tell me, which primop would you use to write into an unpinned ByteArray# a Word64 value at the offset of 10 bytes?

cartazio commented 4 years ago

From an optimality of rep point of view you are absolutely correct. One option I may choose is to just have the encoding be different / sloppier on older ghcs. This is a great point and I’ll think about it as I’m travelling with family this afternoon.

On Fri, Jun 12, 2020 at 11:29 AM Alexey Kuleshevich < notifications@github.com> wrote:

I’m pretty sure we don’t need that primop you refer to ... what makes you think we do?

If want to read/write from/into a ByteArray anything that is larger than one byte, eg. Word64 at any location, which is what you would need if you wanted to unbox Either Word64 Word32, you would have to reserve a byte for a flag (ie. Left/Right) and 8 bytes for content, then your second element of the array will start at 9th byte and the actual Word64 or Word32 would be written at 10th byte. Now tell me, which primop would you use to write into an unpinned ByteArray# a Word64 value at the offset of 10 bytes?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/250#issuecomment-643335785, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQXV2UFDUUHB5JWSJG3RWJCWHANCNFSM4H4F364A .

Bodigrim commented 4 years ago

For an MVP one can possibly spend a full word on a tag.

cartazio commented 4 years ago

Absolutely :)

On Sun, Jun 14, 2020 at 12:35 PM Bodigrim notifications@github.com wrote:

For an MVP one can possibly spend a full word on a tag.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/haskell/vector/issues/250#issuecomment-643790625, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAABBQSY73TVF62PZKSG5RLRWT34NANCNFSM4H4F364A .

nh2 commented 3 years ago

adding SIMD instructions to vector, when we do, unboxed vectors will not be that useful because for efficient usage of data parallelization actual data has to be next to each other.

@lehins This one surprises me a bit. Isn't it that for most SIMD, SoA is preferrable?

For example:

https://en.wikipedia.org/wiki/AoS_and_SoA

The motivation is easier manipulation with packed SIMD instructions in most instruction set architectures, since a single SIMD register can load homogeneous data, possibly transferred by a wide internal datapath
https://stackoverflow.com/questions/61310124/simd-instructions-on-custom-data-types
https://software.intel.com/sites/default/files/article/392271/aos-to-soa-optimizations-using-iterative-closest-point-mini-app.pdf

it is almost always better (faster) to vectorize with SoA “stride-1” data layout than the AoS data layout

cartazio commented 3 years ago

@nh2 is correct

lehins commented 3 years ago

@nh2 is always correct :smile: but not this time :wink:

Unboxed vectors in Haskell are useless with SIMD precisely because data is not next to each other. Let me explain.

Lets say we have a data type

data Point = Point { x :: Double, y :: Double}

A vector of points is unboxed as two vectors of doubles:

Vector Point => (Vector Double, Vector Double)

All is great because normally that is exactlywhat we want with SIMD. Let's say we want to add two vectors of points:

(a  :: Vector Point) + (b :: Vector Point) = (c :: Vector Point)

what we need is x component of a added to x component of b and stored in x of c and the same for y component. And AoS allows to treat x and y separate and thus makes parallelization possible. However in Haskell in order to perform an operation on a Point we have to construct a Point type first, which completely destroys this ability to parallelize because we can no longer treat x and y independently. In other words if you want to use SIMD and SoA you'll have to use separate vectors of double for x and y components yourself, because then you can treat them separately and you don't have to construct a Point when indexing a vector.

In other words what we want is:

a[0].x + b[0].x
a[1].x + b[1].x
...
a[n].x + b[n].x
---
a[0].y + b[0].y
a[1].y + b[1].y
...
a[n].y + b[n].y

where + is parallelized with SIMD, but unboxing technique in vector forces us to do this:

Point (a[0].x) (a[0].y) + Point (b[0].x) (b[0].y)
Point (a[1].x) (a[1].y) + Point (b[1].x) (b[1].y)
...
Point (a[n].x) (a[n].y) + Point (b[n].x) (b[n].y)

simply because we can not operate on a Point without constructing it first.

Now, if x and y components were stored one after another in memory then we would be able to use SIMD efficiently in the above scenario.

@nh2 is always correct :smile:

So, joking aside you are totally right that SoA is much better for parallelization and that is exactly the approach I would recommend too. But unfortunately the safety of Haskell type system makes this storage layout useless for Unboxed vectors in vector package.

cartazio commented 3 years ago

i think the intended meaning is "the layout lets you hand write the correct instances" rather than "automatic simd", which we are very far from ever having

cartazio commented 3 years ago

the point of vector, or perhaps (for reasons of stability whatever comes next thats not vector), is to "drive" providing better fusion/optimization machinery in ghc. if thats no longer vector, thats fine too. That we cant do the right "through the constructor" homomorphism MEANS its an opportunity to make ghc better for these sorts of work, not to say its not practical / impossible

cartazio commented 3 years ago

I’ll try to share a grounded technical walk through of what I mean and how it would work sometime on the next few days. I’m a tad fried from a really busy work week and if that came across coarsely it’s just me feeling, in my own head, that my reasoning is obvious .

I do hope folks can engage in poking at ambiguous parts of said exposition when I share it. Again because What’s obvious is different to different folks

cartazio commented 3 years ago

so, the short version is: whats missing to do the right "flat" Struct of Array SIMD in user space with vector is the following representation mapping being exposed (perhaps in a .unsafe/.internal interface):

using the types in (see link for the associated vector instances )https://github.com/wellposed/numerical/blob/6b458232760b20674487bd9f8442b0991ce59423/src/Numerical/Data/Vector/HPair.hs#L61-L72

--- core defs from that file 
--- we  use Hprod a , where a :: *->* 
--- because in the context i wrote it, i was mixing having leaves that were various different base vectors, and theres another -- example in the same directory that users data families rather than gadts, they have surprising trade off
data HProd a  where
    HPair :: HProd a-> HProd a  -> HProd a
    HUnit :: a -> HProd a

data  VHProd  (prd:: HProd ( * -> * )) val where
    VHLeaf ::  !(v a) -> VHProd   ('HUnit v) a
    VHNode  :: !(VHProd  pra a) -> !(VHProd  prb b ) ->VHProd  ('HPair  pra prb) (a,b)

data  MVHProd   (prd:: HProd (* -> * -> *) ) (st :: * ) val where
    MVHLeaf :: !(mv  st a) -> MVHProd   ('HUnit mv) st  a
    MVHNode  :: !(MVHProd pra st a) -> !(MVHProd   prb   st b ) -> MVHProd  ('HPair pra prb) st (a,b)

then you write something like these

type family ProductOfPrimVect ::  ( a : *)->   HProd Prim.Vector (PrimProductRep a )
type family PrimProductRep :: * -> * --- this would be like a ghc generic rep, but only for product types,
--- and should match the tuple associativity of how unboxed would be paired up in this rep 
unboxedExplode :: Unboxed.Vector a -> VHProd (ProductOfPrimVect a) (ProductRep a)

then what a person using simd could do is "pick out" the right Prim vectors that are packed up in the underlying unboxed vector and do the SOA operations thereof. no constructor barrier

i was actually working out a plan that would lay the ground work for this in primitive, but well, the past year made being motivated to do the supportring work a tad hard motivation wise .

theres def some fuzzy bits in this sketch, but the point being, subject to having a strong contract about what Prim instances are legal if you dont wrap them in some sort of new type that annotates "what layout youre choosing" for composite instances, you can actually make this a pretty save to expose "unsafe" rep to build SOA primops.

please ask what parts of this are vague (and do look at the HPair.hs and Pair .hs files in the associated repo i linked)

edit: to be clear: this explode would need to be part of writing instances for unboxed vectors, but theres ways to make that work without hurting / breaking users

edit2: or we just export in a .internal all the constructors for Unboxed vector data family instances and users can do this themselves (do we do that yet?)

Boarders commented 3 years ago

@lehins Sorry if I am being daft but I don't see how the point constructor enters into the construction of an unboxed vector whatsoever. Why is that relevant here?

lehins commented 3 years ago

@cartazio I have no idea what you are proposing here. Some day we'll get SIMD support in GHC then you can submit a PR with this functionality that you are suggesting.

Sorry @Boarders I currently don't have much time to explain this in more detail, I guess the best suggestion I have for you that could clear it up is for you to try implementing the example I presented using either the SIMD primops instructions that use LLVM or intrinsic from gcc on the C side over FFI then you will see the problem right away. Until I tried implementing it myself it was hard for me to see the problem too.

One way or another this is not the ticket that discusses SIMD, it is about calling Unboxed vectors default. So if anyone would like to discuss it further here is a gitter that can be used for such discussions: https://gitter.im/haskell/vector Once we do get proper support for SIMD in GHC, please open a new ticket about it.

nh2 commented 3 years ago

Has slightly lower memory overhead than Storable, because ForeignPtr besides the pointer actual Addr# also carries around the MutableByteArray# in a sum type. This can only be noticeable for very small vectors, which Storable shouldn't be used for anyways.

For the curious, I just measured the memory overhead of Unboxed vs Storable, and found:

Data.Vector.Storable.Vector Int has 64 Bytes overhead
Data.Vector.Unboxed.Vector Int has 48 Bytes overhead.

haskell / vector