The max16 macro function used in ssw.c uses _mm_extract_epi16 to extract the maximum number.
However, we only want the low 8 bits, and there's no guarantee that the neighbouring 8-bit lane is all zero. Using epi16 can make the extracted value look larger than the real maximum, which may cause more overflows and result in some performance loss.
I think it's better to change it to use _mm_extract_epi8.
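To show what I mean, here is a minimal sketch of the same kind of byte-wise reduction. It is not the actual ssw.c code: the function names `hmax_epu8_raw16` and `hmax_epu8` are made up for this example, and I'm assuming the macro reduces with shifts and `_mm_max_epu8` before extracting. Note that `_mm_extract_epi8` needs SSE4.1, so the sketch masks the low byte instead, which stays within SSE2 and should give the same value.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Hypothetical helper: reduce 16 unsigned bytes so that byte 0 holds the
 * maximum, leaving the other bytes with partial results. */
static __m128i reduce_max_epu8(__m128i v)
{
    v = _mm_max_epu8(v, _mm_srli_si128(v, 8));
    v = _mm_max_epu8(v, _mm_srli_si128(v, 4));
    v = _mm_max_epu8(v, _mm_srli_si128(v, 2));
    v = _mm_max_epu8(v, _mm_srli_si128(v, 1));
    return v;
}

/* Extracts the low 16-bit lane as-is: byte 0 is the maximum, but byte 1
 * still carries a partial maximum and is generally not zero. */
static int hmax_epu8_raw16(const uint8_t *p)
{
    __m128i v = reduce_max_epu8(_mm_loadu_si128((const __m128i *)p));
    return _mm_extract_epi16(v, 0);
}

/* Keeps only byte 0. With SSE4.1 available, _mm_extract_epi8(v, 0)
 * would be the direct way to read the same byte. */
static uint8_t hmax_epu8(const uint8_t *p)
{
    __m128i v = reduce_max_epu8(_mm_loadu_si128((const __m128i *)p));
    return (uint8_t)(_mm_extract_epi16(v, 0) & 0xff);
}
```

For a 16-byte input whose largest element is 9, `hmax_epu8` returns 9, while `hmax_epu8_raw16` returns a value above 255 because the upper byte of the lane is not cleared.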