vhirtham / GDL

Game Development Library

HorizontalSum and Transpose headers #42

Closed vhirtham closed 4 years ago

vhirtham commented 4 years ago

Add a TransposePerLane function. It can call the standard transpose functions, but the test needs to make sure that both lanes are transposed the same way. This can be used to calculate multiple horizontal sums faster by calculating the sum per lane first.
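A rough scalar sketch of what such a test could check, not the GDL implementation. `Reg8`, `TransposePerLaneReference` and `CheckLanesTransposedIdentically` are hypothetical names, and the 256-bit register is only modelled as an array of two 4-float lanes:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical scalar model of a 256-bit register: two 128-bit lanes of four floats each.
using Reg8 = std::array<float, 8>;

// Reference behaviour: transpose the 4x4 block inside each lane independently.
// Element (row r, column c) of lane l is stored at in[r][4 * l + c].
std::array<Reg8, 4> TransposePerLaneReference(const std::array<Reg8, 4>& in)
{
    std::array<Reg8, 4> out{};
    for (std::size_t lane = 0; lane < 2; ++lane)
        for (std::size_t r = 0; r < 4; ++r)
            for (std::size_t c = 0; c < 4; ++c)
                out[c][4 * lane + r] = in[r][4 * lane + c];
    return out;
}

// Fills lane 1 with the values of lane 0 plus a constant offset. If both lanes
// are permuted identically, the offset is still the only difference afterwards.
void CheckLanesTransposedIdentically()
{
    constexpr float offset = 100.0f;
    std::array<Reg8, 4> in{};
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
        {
            in[r][c] = static_cast<float>(4 * r + c);
            in[r][4 + c] = in[r][c] + offset;
        }

    const std::array<Reg8, 4> out = TransposePerLaneReference(in);
    for (std::size_t r = 0; r < 4; ++r)
        for (std::size_t c = 0; c < 4; ++c)
        {
            assert(out[r][c] == static_cast<float>(4 * c + r)); // lane 0 is transposed
            assert(out[r][4 + c] == out[r][c] + offset);        // lane 1 matches lane 0
        }
}
```

A SIMD implementation could be run through the same check by replacing the call to `TransposePerLaneReference`.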

Keep in mind: For rectangular matrices with more rows than columns, the horizontal sum can often be calculated faster than transposing and adding the results.

Actually, transposing first and then adding the results is slower than a "swizzle -> add -> swizzle -> add -> ..." approach. The number of additions stays the same, but the interleaved approach needs fewer swizzle operations, and the saving grows with increasing matrix size.
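To illustrate the comparison for the 4x4 case, here is a minimal sketch using plain SSE intrinsics. It assumes an x86 target, and the function names are made up for this example rather than taken from GDL. In this sketch both variants use 3 additions, but the interleaved one uses 6 shuffle-type instructions where `_MM_TRANSPOSE4_PS` typically expands to 8:

```cpp
#include <cassert>
#include <xmmintrin.h> // SSE intrinsics; assumes an x86 target

// Variant A: transpose first, then add the four transposed registers.
// Result lanes: (sum(a), sum(b), sum(c), sum(d)).
inline __m128 HorizontalSumsViaTranspose(__m128 a, __m128 b, __m128 c, __m128 d)
{
    _MM_TRANSPOSE4_PS(a, b, c, d); // typically 8 shuffle-type instructions
    return _mm_add_ps(_mm_add_ps(a, b), _mm_add_ps(c, d)); // 3 additions
}

// Variant B: swizzle -> add -> swizzle -> add.
// Same result lanes, 6 shuffle-type instructions and 3 additions.
inline __m128 HorizontalSumsInterleaved(__m128 a, __m128 b, __m128 c, __m128 d)
{
    // (a0 b0 a1 b1) + (a2 b2 a3 b3) = (a0+a2, b0+b2, a1+a3, b1+b3)
    const __m128 ab = _mm_add_ps(_mm_unpacklo_ps(a, b), _mm_unpackhi_ps(a, b));
    const __m128 cd = _mm_add_ps(_mm_unpacklo_ps(c, d), _mm_unpackhi_ps(c, d));
    // movelh(ab, cd) = (ab0, ab1, cd0, cd1), movehl(cd, ab) = (ab2, ab3, cd2, cd3)
    return _mm_add_ps(_mm_movelh_ps(ab, cd), _mm_movehl_ps(cd, ab));
}

// Usage example: both variants agree on values that are exact in float.
inline void CheckVariantsAgree()
{
    const __m128 a = _mm_setr_ps(1.0f, 2.0f, 3.0f, 4.0f);
    const __m128 b = _mm_setr_ps(10.0f, 20.0f, 30.0f, 40.0f);
    const __m128 c = _mm_setr_ps(100.0f, 200.0f, 300.0f, 400.0f);
    const __m128 d = _mm_setr_ps(0.5f, 0.25f, 0.125f, 0.125f);

    float viaTranspose[4];
    float interleaved[4];
    _mm_storeu_ps(viaTranspose, HorizontalSumsViaTranspose(a, b, c, d));
    _mm_storeu_ps(interleaved, HorizontalSumsInterleaved(a, b, c, d));

    const float expected[4] = {10.0f, 100.0f, 1000.0f, 1.0f};
    for (int i = 0; i < 4; ++i)
    {
        assert(viaTranspose[i] == expected[i]);
        assert(interleaved[i] == expected[i]);
    }
}
```

The two variants add the elements in a different order, so with arbitrary inputs their results may differ by floating-point rounding.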