FP64 tan_u10 is inaccurate for small values

The current Sleef_tan_u10 function has a maximum error slightly above 1.0 ulp for some inputs with |x| < 0x1p-1021.

The entry points for which the error has been reproduced are:

Sleef_tan_u10
Sleef_tand1_u10
Sleef_tand1_u10purec
Sleef_tand1_u10purecfma
Sleef_tand2_u10
Sleef_tand2_u10sse2
Sleef_tand2_u10sse4
Sleef_tand2_u10avx2128
Sleef_tand4_u10avx2
Sleef_tand8_u10avx512f

The attached tan-test.zip contains a reproducer.

Output of the reproducer:

tan_0(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_1(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_2(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_3(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_4(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_5(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_6(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_7(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_8(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
tan_9(0x0.0000000b91e71p-1022) = 0x0.0000000b91e70p-1022
MPFR faithful results = 0x0.0000000b91e71p-1022, 0x0.0000000b91e72p-1022

For this example, Sleef_tan_u10 returns the input rounded down by one ulp, whereas the infinitely precise result lies very slightly above x, so the error is very slightly greater than 1.0 ulp.
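The error bound can be sketched from the Taylor series of tan. This is only an illustration, not part of the reproducer: the symbol u is introduced here for the spacing of doubles in this range, which is 2^-1074, and x is taken as positive.

```latex
\tan x = x + \frac{x^{3}}{3} + O(x^{5}),
\qquad
0 < \tan x - x \ll u = 2^{-1074}
\quad \text{for } 0 < x < 2^{-1021}.

% A returned value of x - u therefore misses the exact result by more than one ulp:
\tan x - (x - u) = u + (\tan x - x) > u .
```

The faithful results are therefore x and x + u, which matches the two values listed on the MPFR line of the output.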

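For small values, the algorithm employed by Sleef_tan_u10 reduces to return 2 * (0.5 * x);. For any |x| < 0x1p-1021 whose least-significant mantissa bit is set, 0.5 * x is not representable, so that bit is lost to rounding. Under round-to-nearest-even, whenever that rounding goes toward zero, as in the example above, the result is one ulp smaller in magnitude than x and is unfaithful.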
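The rounding behaviour can be checked without SLEEF. The following is a minimal sketch, assuming the default round-to-nearest-even mode with no flush-to-zero. The names halve_then_double and report are hypothetical helpers written for this illustration, and the function only mimics the 2 * (0.5 * x) expression quoted above, not the actual SLEEF source.

```c
#include <stdio.h>

/* Hypothetical stand-in for the small-argument path described in the report.
   It is NOT taken from the SLEEF source. */
static double halve_then_double(double x) {
    /* volatile forces the intermediate to be stored as a 64-bit double,
       so the halved value is rounded in the subnormal range. */
    volatile double half = 0.5 * x;
    return 2.0 * half;
}

/* Print an input next to the value produced by the stand-in. */
static void report(double x) {
    printf("x = %a  ->  2 * (0.5 * x) = %a\n", x, halve_then_double(x));
}
```

Calling report(0x0.0000000b91e71p-1022) from a main function shows the input from the reproducer coming back with its last bit cleared. Calling it with 0x0.0000000b91e73p-1022 shows a case where the tie rounds away from zero instead, so the result lands one ulp above the input.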
