nessos / Streams

A lightweight F#/C# library for efficient functional-style pipelines on streams of data.
http://nessos.github.io/Streams/

Adding SIMD support #48

Open jackmott opened 7 years ago

jackmott commented 7 years ago

I've been going through the Streams code trying to figure out how to add SIMD support. For example, this SIMD-enhanced fold performs very well compared to an inlined version of the core library fold:

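```fsharp
open System.Numerics

type SIMD =
    /// Folds whole Vector<'T> chunks with `folder`, then reduces lanes and tail with `combiner`.
    /// `start` is broadcast into every lane, so it should be the operation's identity (e.g. 0.0 for a sum).
    static member inline SIMDFold folder combiner (start: 'T) (values: 'T[]) =
        let width = Vector<'T>.Count
        let mutable i = 0
        let mutable v = Vector<'T>(start)
        // Vectorized pass: `<=` so the last full chunk is not skipped.
        while i <= values.Length - width do
            v <- folder v (Vector<'T>(values, i))
            i <- i + width
        // Horizontal reduction of the accumulator lanes.
        let mutable result = start
        for lane in 0 .. width - 1 do
            result <- combiner result v.[lane]
        // Scalar pass over tail elements that did not fill a whole vector.
        while i < values.Length do
            result <- combiner result values.[i]
            i <- i + 1
        result
```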

Adding SIMD support to Streams raises a few considerations:

Ideally, given an array of floats called `values`, we would want to be able to write something like:

```fsharp
values
|> SIMDStream.simdMap (fun e -> e * e)                      // operates on Vector<float>s
|> SIMDStream.map (fun e -> if e < 5.0 then 0.0 else 3.0)   // scalar operation on floats
|> SIMDStream.simdSum
```

This way, people could mix and match SIMD operations with scalar ones, as is sometimes necessary.

I'm going through the Streams code trying to see how this could be done, and it isn't entirely clear. Some parts of the composed function would need to operate on a whole `Vector<'T>`, while others would need to iterate `Vector<'T>.Count` times to operate on the individual elements of the array.
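To make the mixing concrete, here is a rough sketch of what the proposed pipeline might look like if fused into a single loop by hand. It is an illustration under stated assumptions, not part of Streams: `fusedPipeline` is a hypothetical name, and the scalar stage is handled by copying each vector's lanes into a scratch array and reloading them, which is one way (not necessarily the fastest) to interleave SIMD and scalar steps.

```fsharp
open System.Numerics

/// Hypothetical hand-fused equivalent of:
///   simdMap (fun e -> e * e) |> map (fun e -> if e < 5.0 then 0.0 else 3.0) |> simdSum
let fusedPipeline (values: float[]) =
    let width = Vector<float>.Count
    let lanes = Array.zeroCreate<float> width   // scratch buffer for one vector's lanes
    let mutable acc = Vector<float>.Zero        // vector accumulator for the final sum
    let mutable i = 0
    while i <= values.Length - width do
        // SIMD stage: square a whole chunk at once.
        let chunk = Vector<float>(values, i)
        let squared = chunk * chunk
        // Scalar stage: copy the lanes out and process them one by one.
        squared.CopyTo(lanes)
        for lane in 0 .. width - 1 do
            lanes.[lane] <- if lanes.[lane] < 5.0 then 0.0 else 3.0
        // Back to SIMD: reload the lanes and accumulate.
        acc <- acc + Vector<float>(lanes)
        i <- i + width
    // Horizontal sum of the accumulator lanes.
    let mutable total = 0.0
    for lane in 0 .. width - 1 do
        total <- total + acc.[lane]
    // Scalar fallback for tail elements that do not fill a whole vector.
    while i < values.Length do
        let sq = values.[i] * values.[i]
        total <- total + (if sq < 5.0 then 0.0 else 3.0)
        i <- i + 1
    total
```

The copy out and back in for the scalar stage is the kind of cost a generic stream combinator would have to hide, and every SIMD stage also needs a scalar path for the leftover tail.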

If there is interest in this, I'd love to contribute, but I need some guidance, as I don't fully understand how the streams work yet.

palladin commented 7 years ago

Hi Jack, it is certainly exciting to have vectorized streams, but with the current design I don't think it is possible. Something like `array |> Stream.ofArray |> Stream.filter (fun ...) |> Stream.simdfold` is definitely problematic. Plus, we would need perfect stream fusion, or else the virtual calls will dominate performance-wise. One possible direction for vectorized loops is a new Streams library targeting perfect stream fusion, compiled with .NET Native in the hope that the C++ backend will vectorize our loops.
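The `filter` problem can be illustrated with a deliberately simplified push-based stream model. This is an assumption for illustration, not the actual Streams implementation: `PushStream`, `ofArray`, `filter` and `simdSum` below are hypothetical. Because `filter` forwards surviving elements one at a time through a continuation, the SIMD consumer never receives contiguous chunks; it has to re-buffer elements into a vector itself, and every element still costs an indirect call.

```fsharp
open System.Numerics

/// Hypothetical push stream: the producer calls the consumer once per element.
type PushStream<'T> = ('T -> unit) -> unit

let ofArray (xs: 'T[]) : PushStream<'T> =
    fun k -> for x in xs do k x

let filter (pred: 'T -> bool) (s: PushStream<'T>) : PushStream<'T> =
    fun k -> s (fun x -> if pred x then k x)

/// Hypothetical SIMD sum: it only ever receives single elements, so it must
/// re-buffer them into a full vector before it can use a vector add.
let simdSum (s: PushStream<float>) : float =
    let width = Vector<float>.Count
    let buffer = Array.zeroCreate<float> width
    let filled = ref 0                    // ref cells: closures cannot capture `let mutable`
    let acc = ref (Vector<float>.Zero)
    s (fun x ->
        buffer.[filled.Value] <- x
        filled.Value <- filled.Value + 1
        if filled.Value = width then
            acc.Value <- acc.Value + Vector<float>(buffer)
            filled.Value <- 0)
    // Horizontal sum of the vector accumulator, plus the partially filled buffer.
    let mutable total = 0.0
    for lane in 0 .. width - 1 do
        total <- total + acc.Value.[lane]
    for j in 0 .. filled.Value - 1 do
        total <- total + buffer.[j]
    total
```

Unless the stages are fused into one loop, the per-element continuation calls remain and tend to outweigh the gain from the vector add, which is the point about perfect stream fusion made above.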

jackmott commented 7 years ago

I see. It is too bad that RyuJIT doesn't do any automatic vectorization; it would be quite nice to get the same kind of auto-vectorization that C compilers do, but at runtime, so it could target the available instruction sets.