Closed sharkfox closed 6 years ago
Enrico, hi, thanks for reporting. I will investigate and fix it for sure.
Hi Victoria, thanks for your investigation. Do you have any updates on that? I reviewed the function again and didn’t get the trick when doing the comparisons with the magic pattern. I would have expected a simple range check to do the clipping instead.
HI, Enrico, please be patient. WIP. I hope to commit an updated function this week.
Okay, don't worry. Take your time. I was just wondering about the status and whether the behavior itself was acknowledged or not.
Please see the single line fix for your problem in today's commit. As for the problem itself - 0) yes, it could be seen on various compilers 1)while I don't understand its exact reason I assume it is related with the cast to float->compare ps functions sequence. For AVX instructions set compare ps functions are implemented totally different - they have some special NAN treatment and probably the SSE compare versions are very sensitive to that in advance . 2) I couldn't guarantee all functions are this problem free, will continue my investigation and fixes there. 3)As for the algorithm itself - range check is not enough because we need to round towards zero.
Thanks for your patch. I'm currently checking this against my test cases. At first glance I see some trouble with corner cases. I'll get back to you after a deeper investigation...
I've set up a few test cases to check the SSE implementation against a Cortex-A9 target. I figured that the corner cases are not treated correctly. Are you sure your constant is right? It seems to cause that large U32 vectors will be truncated to SINT32_MAX which is too restrictive. Also the vcvt_s32_f32 function seems to have an off-by-one error and a problem with numbers that exceed the range. Please find my results below.
vcvt_u32_f32({-6000000000.0f, 6000000000.0f}) = { 0, 2147483647} // should be { 0, 4294967295}
vcvt_u32_f32({-3000000000.0f, 3000000000.0f}) = { 0, 2147483647} // should be { 0, 3000000000}
vcvt_s32_f32({-6000000000.0f, 6000000000.0f}) = {-2147483648, -2147483648} // should be {-2147483648, 2147483647}
vcvt_s32_f32({ 6000000000.0f, 0.0f}) = {-2147483648, 0} // should be { 2147483647, 0}
You are absolutely right. My fix is stupid, working on a new one. The only thing that justifies me is that it looks like x86 vcvt_s32_f32 native function gives wrong result @ corner cases by itself. To keep you posted on my progress there. Thanks!
Hi, sharkfox, could you please check the latest version. It gives right results for all corner cases except for some very small precision issue I believe we could tolerate.
Hi Victoria, thanks for your update. Looks better now. Indeed, the vcvtq_u32_f32 shows an off-by-one error for some (big) values which is not perfect but probably tolerable as you wrote.
However, it seems that the limits.h file has to be included. I could not compile otherwise as the UINT_MAX declaration was missing in case of vs2015. Did you get this by accident in your test environment? The rest was fine.
Hi Victoria,
I noticed that the vcvtq_u32_f32 function throws a floating pointing exception when running in VS2015 which seems to be strange to me. Please find the demo code attached.
It seems that inside the function a comparison with a constant is done. But when looking at it using the debugger the pattern is NaN and this causes an exception to happen if you have them enabled like I do in my program. It would be cleaner if the implementation does not throw the exception as it always interrupts the program regardless of the input to the function.
Could you please have a look at this?
Best regards, Enrico
vcvt-exception.zip