Closed by PatDubbie 1 month ago
IMO this is probably not worth it. For simple operations like this, a for loop is as fast as you can go because LLVM will be able to vectorize the naive versions perfectly, and all of the time will be spent moving memory around.
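For reference, the kind of naive loop being described might look like the sketch below. It is written in Rust purely as an assumption (the thread does not name the language), and `add4` is a hypothetical name, not a function from the library. With optimizations enabled, a fixed-length loop like this is the sort of code LLVM can typically turn into SIMD instructions on its own, though that depends on the target and is worth checking in the generated assembly.

```rust
/// Adds two 4-component vectors element by element.
/// `add4` is a hypothetical helper used only for illustration.
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    // Fixed trip count and no aliasing between inputs and output:
    // a straightforward candidate for auto-vectorization.
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}
```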
If I'm mistaken, please point me in the right direction: with proper alignment, wouldn't it be possible to take advantage of SIMD for exactly these operations? Or was there a conscious choice not to make these structs 16-byte aligned?
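To make the alignment question concrete, here is a minimal sketch of what forcing 16-byte alignment looks like and what it costs. It assumes Rust, and `Vec3`, `Vec3Aligned`, `Vec4`, `Vec4Aligned`, and `layout` are hypothetical names, not the library's actual types. One possible reason to avoid the attribute is visible in the 3-component case, where the aligned struct grows from 12 to 16 bytes because of padding.

```rust
use std::mem::{align_of, size_of};

// Plain layouts: alignment follows the f32 fields (4 bytes).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Vec3 { pub x: f32, pub y: f32, pub z: f32 }

#[repr(C)]
#[derive(Clone, Copy)]
pub struct Vec4 { pub x: f32, pub y: f32, pub z: f32, pub w: f32 }

// Forced 16-byte alignment, as aligned SIMD loads and stores would want.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct Vec3Aligned { pub x: f32, pub y: f32, pub z: f32 }

#[repr(C, align(16))]
#[derive(Clone, Copy)]
pub struct Vec4Aligned { pub x: f32, pub y: f32, pub z: f32, pub w: f32 }

/// Returns (size in bytes, alignment in bytes) for a type.
pub fn layout<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

Under these definitions, `layout::<Vec3>()` is `(12, 4)` while `layout::<Vec3Aligned>()` is `(16, 16)`; the 4-component struct keeps its 16-byte size either way and only its alignment changes.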
added in 0.19.2
It seems there are no implementations of the binary operators (add/sub/mul/div) between scalars and vectors/matrices? I may be missing something.
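As an illustration of what is being asked for, scalar/vector operators could be sketched as below. This assumes Rust operator traits and a hypothetical `Vec3` type; it is not the library's API, and only multiplication is shown (add, sub, and div would follow the same pattern). Both operand orders need their own `impl`, since `v * s` and `s * v` are distinct trait implementations.

```rust
use std::ops::Mul;

/// Hypothetical 3-component vector, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Vec3 { pub x: f32, pub y: f32, pub z: f32 }

// vector * scalar
impl Mul<f32> for Vec3 {
    type Output = Vec3;
    fn mul(self, s: f32) -> Vec3 {
        Vec3 { x: self.x * s, y: self.y * s, z: self.z * s }
    }
}

// scalar * vector (delegates to the impl above)
impl Mul<Vec3> for f32 {
    type Output = Vec3;
    fn mul(self, v: Vec3) -> Vec3 {
        v * self
    }
}
```

With these in place, `Vec3 { x: 1.0, y: 2.0, z: 3.0 } * 2.0` and `2.0 * Vec3 { x: 1.0, y: 2.0, z: 3.0 }` both evaluate to the same vector.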