Do not unconditionally use _mm_malloc / _mm_free, which are x86/x64 specific
Do not unconditionally include xmmintrin.h, which is only available on x86/x64 targets
Properly split the namespace declaration between header and implementation. Otherwise the #include directives end up inside the namespace block, which actually puts everything they declare into that namespace and leads to much confusion on Apple Clang
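
For illustration only, here is a minimal sketch of the pattern these changes aim for. It is not the code from this change: the namespace `example`, the macro `EXAMPLE_HAS_MM_MALLOC`, and both function names are hypothetical, and the preprocessor check is an assumption about which compiler macros indicate SSE support. All includes sit at file scope, before the namespace is opened, and the intrinsics path is guarded with a plain malloc-based fallback.

```cpp
// All includes live at file scope, outside any namespace block.
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Assumption: these macros are a reasonable signal that SSE is available.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>  // only pulled in on x86/x64 targets
#  define EXAMPLE_HAS_MM_MALLOC 1
#else
#  define EXAMPLE_HAS_MM_MALLOC 0
#endif

namespace example {  // hypothetical namespace, opened only after the includes

// Allocates `size` bytes aligned to `alignment` (must be a power of two).
inline void* aligned_alloc_portable(std::size_t size, std::size_t alignment) {
#if EXAMPLE_HAS_MM_MALLOC
    return _mm_malloc(size, alignment);
#else
    // Fallback: over-allocate, round up, and stash the raw pointer just
    // before the aligned address so it can be recovered on free.
    if (alignment < sizeof(void*)) alignment = sizeof(void*);
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (raw == nullptr) return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
#endif
}

// Releases memory obtained from aligned_alloc_portable.
inline void aligned_free_portable(void* p) {
#if EXAMPLE_HAS_MM_MALLOC
    _mm_free(p);
#else
    if (p != nullptr) std::free(static_cast<void**>(p)[-1]);
#endif
}

}  // namespace example
```

With this layout the implementation file would reopen `namespace example { ... }` after its own includes rather than including headers from inside an already open namespace.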