[x] Replace normal approx with SSE version in SSE matrix classes. Use >= or <= where necessary, since < or > might fail at the last register if rows % numRegisterentries != 0. Reason: all unused register elements should be 0, and unused elements should never contribute to the comparison.
[x] MAYBE add static functions to set default parameters. Postponed: not necessary at the moment.
[x] Add exceptions in ctor of InsideTolerance class and test them (Didn't want to start a new issue)
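The >= / <= remark in the first item can be illustrated with a minimal sketch. It assumes a per-lane tolerance register whose padding lane is 0 (for example because the tolerance is scaled by the element's magnitude); `AllInsideTolerance` is a hypothetical helper name, not a function of the matrix classes. In a zero padding lane the difference and the bound are both 0, so the strict form rejects the register while the inclusive form accepts it.

```cpp
#include <immintrin.h>

// Hypothetical helper (assumption, not the project's API): returns true if
// a - b lies within [-tol, tol] in every lane. `inclusive` selects <= / >=
// instead of < / >.
inline bool AllInsideTolerance(__m128 a, __m128 b, __m128 tol, bool inclusive)
{
    const __m128 diff = _mm_sub_ps(a, b);
    const __m128 negTol = _mm_sub_ps(_mm_setzero_ps(), tol);
    const __m128 inside = inclusive
        ? _mm_and_ps(_mm_cmple_ps(diff, tol), _mm_cmpge_ps(diff, negTol))
        : _mm_and_ps(_mm_cmplt_ps(diff, tol), _mm_cmpgt_ps(diff, negTol));
    // _mm_movemask_ps collects the top bit of each lane; 0xF means all 4 passed.
    return _mm_movemask_ps(inside) == 0xF;
}
```

With three used rows and one zero padding lane, a strict comparison reports a mismatch only because of the padding lane (0 < 0 is false), whereas the inclusive comparison lets the unused element pass without affecting the result.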
Bitmasks for a register-wise abs can be used in a way similar to this approach.
Example code for SSE comparison:
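#include <immintrin.h>
#include <iostream>

// U16 and U32: unsigned integer typedefs assumed to be provided by the project.
__m128 a = _mm_set_ps(1.f, 2.f, 3.f, 4.f);
__m128 b = _mm_set_ps(1.f, 2.f, 3.f, 4.f);
// Each 32-bit lane of the result is all ones if equal, all zeros otherwise.
__m128i res = _mm_castps_si128(_mm_cmpeq_ps(a, b));
// Copy the lanes to an array; indexing __m128i directly is not portable.
alignas(16) U32 lanes[4];
_mm_store_si128(reinterpret_cast<__m128i*>(lanes), res);
for (U32 i = 0; i < 4; ++i)
    std::cout << "element " << i << " equal: " << (lanes[i] != 0) << std::endl;
// _mm_movemask_epi8 yields one bit per byte, so 16 bits for a 128-bit register.
U16 mask = static_cast<U16>(_mm_movemask_epi8(res));
std::cout << "mask: " << mask << std::endl;
if (mask == 0xffff)
    std::cout << "All values equal" << std::endl;
else if (mask != 0)
    std::cout << "Some values equal" << std::endl;
else
    std::cout << "No values equal" << std::endl;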
std::cout << std::endl << std::endl << "Test for __m256" << std::endl;
__m256 c = _mm256_set_ps(1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f);
__m256 d = _mm256_set_ps(1.f, 2.f, 3.f, 4.f, 5.f, 6.f, 7.f, 8.f);
__m256i res256 = _mm256_castps_si256(_mm256_cmp_ps(c, d, _CMP_EQ_OS));
// Copy the lanes to an array; indexing __m256i directly is not portable.
alignas(32) U32 lanes256[8];
_mm256_store_si256(reinterpret_cast<__m256i*>(lanes256), res256);
for (U32 i = 0; i < 8; ++i)
    std::cout << "element " << i << " equal: " << (lanes256[i] != 0) << std::endl;
// _mm256_movemask_epi8 requires AVX2 and yields one bit per byte, so 32 bits.
U32 mask256 = static_cast<U32>(_mm256_movemask_epi8(res256));
std::cout << "mask: " << mask256 << std::endl;
if (mask256 == 0xffffffff)
    std::cout << "All values equal" << std::endl;
else if (mask256 != 0)
    std::cout << "Some values equal" << std::endl;
else
    std::cout << "No values equal" << std::endl;
source: https://stackoverflow.com/questions/6042399/how-to-compare-m128-types
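The bitmask idea for a register-wise abs mentioned above might look like the following sketch. `AbsPs` and `AllEqual` are hypothetical helper names chosen for illustration; the sketch assumes SSE2 and IEEE 754 single precision, where clearing the top bit of each lane removes the sign.

```cpp
#include <immintrin.h>

// Hypothetical helper: per-lane absolute value by masking out the sign bit.
inline __m128 AbsPs(__m128 v)
{
    // 0x7fffffff keeps exponent and mantissa and clears the sign bit.
    const __m128 mask = _mm_castsi128_ps(_mm_set1_epi32(0x7fffffff));
    return _mm_and_ps(v, mask);
}

// Hypothetical helper: true if all four lanes compare equal.
// _mm_movemask_ps gives one bit per float lane, so 0xF means "all equal".
inline bool AllEqual(__m128 a, __m128 b)
{
    return _mm_movemask_ps(_mm_cmpeq_ps(a, b)) == 0xF;
}
```

Compared with `_mm_movemask_epi8`, `_mm_movemask_ps` returns only 4 bits for a `__m128`, which avoids the integer cast and makes the "all lanes" check a comparison against 0xF.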