kunpengcompute / AvxToNeon

Encapsulate the frequently used AVX instructions as independent modules to reduce repeated development workload.
Apache License 2.0
114 stars 41 forks

Fix the '-0.0' issue of blendv_ps and blendv_pd. #7

Closed: lowintelligence closed this 4 years ago

lowintelligence commented 4 years ago

For '_mm256_blendv_ps' and '_mm256_blendv_pd', each element of the result is selected by the highest bit (the sign bit) of the corresponding mask element, even though the mask is passed as a float or double SIMD register. For '-0.0', whose bit pattern in the register is 0x8000..00, the sign bit is set, so the element from the 'b' input should be selected. The previous implementation used 'vcgeq_f32' and 'vcgeq_f64' to compute the selector flags. These floating-point comparisons treat '-0.0' as equal to '0.0' rather than less than it, so the element from 'a' was selected and an incorrect result was returned. This fix converts the comparison from floating-point intrinsics to integer ones, so the '-0.0' case is handled correctly.
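To make the failure concrete, here is a minimal scalar model in C of one lane of the blend. All function names are hypothetical, and the "old" selector is modeled from the description above (a floating-point "mask >= 0.0" test that picks 'a'), not copied from the repository. The NEON helper at the end is likewise an illustrative sketch that is compiled only when __ARM_NEON is defined; the actual AvxToNeon code may differ.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Reference semantics of one lane of _mm256_blendv_ps: take b when the
   sign bit of the mask is set, otherwise take a. */
float blend_ref(float a, float b, float mask)
{
    uint32_t bits;
    memcpy(&bits, &mask, sizeof bits); /* well-defined bit cast */
    return (bits & 0x80000000u) ? b : a;
}

/* Scalar model of the old selector (assumed from the PR text): a
   floating-point "mask >= 0" test. IEEE 754 treats -0.0 as equal to +0.0,
   so -0.0f >= 0.0f is true and a is wrongly chosen. */
float blend_float_cmp(float a, float b, float mask)
{
    return (mask >= 0.0f) ? a : b;
}

/* Scalar model of the fix: reinterpret the mask as a signed integer.
   -0.0f has the bit pattern 0x80000000, which is negative as int32_t. */
float blend_int_cmp_f32(float a, float b, float mask)
{
    int32_t bits;
    memcpy(&bits, &mask, sizeof bits);
    return (bits < 0) ? b : a;
}

/* Same idea for the _pd case: -0.0 is 0x8000000000000000 as int64_t. */
double blend_int_cmp_f64(double a, double b, double mask)
{
    int64_t bits;
    memcpy(&bits, &mask, sizeof bits);
    return (bits < 0) ? b : a;
}

#if defined(__ARM_NEON)
/* Hypothetical NEON helper (not the actual AvxToNeon code): an integer
   "less than zero" test gives all-ones lanes wherever the sign bit is set,
   including lanes holding -0.0f. */
float32x4_t blendv_f32x4_sketch(float32x4_t a, float32x4_t b, float32x4_t mask)
{
    uint32x4_t sel = vcltq_s32(vreinterpretq_s32_f32(mask), vdupq_n_s32(0));
    return vbslq_f32(sel, b, a); /* bits of b where sel is set, else a */
}
#endif
```

With a = 1.0f and b = 2.0f, a mask of -0.0f makes blend_ref and blend_int_cmp_f32 return 2.0f while blend_float_cmp returns 1.0f; for ordinary masks such as -1.0f or 1.0f all three agree. The integer comparison works because it looks only at the stored bits, which is exactly what the AVX blend instructions do.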