atondwal closed 1 year ago
In practice, it looks like this is too slow for some users. For the 64-bit version we're fine with the performance hit of the additional select, but for the 32-bit version maybe we should just clamp it? What do you think, @jakevdp?
I think @rmlarsen is taking a look
I probably won't get to it this week, though.
FWIW: The original bug report is incorrect. The erf implementation is indeed very accurate, but fails to clamp the output to [-1, 1].
I'll send a CL with the clamping.
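For readers following along, here is a rough sketch of what clamping the output means. It is plain Python, not the actual CL. The names `clamp_unit`, `clamped_erf`, and `overshooting_erf` are hypothetical, and `overshooting_erf` is just a stand-in for an approximation whose rounding error pushes the result slightly outside [-1, 1].

```python
import math


def clamp_unit(y: float) -> float:
    """Clamp y to the closed interval [-1, 1], passing NaN through unchanged."""
    if math.isnan(y):
        return y
    return max(-1.0, min(1.0, y))


def overshooting_erf(x: float) -> float:
    """Hypothetical approximation: accurate, but can land just outside [-1, 1]."""
    return math.erf(x) * (1.0 + 1e-7)


def clamped_erf(x: float, raw_erf=math.erf) -> float:
    """Evaluate raw_erf and clamp the result to the valid range of erf."""
    return clamp_unit(raw_erf(x))


if __name__ == "__main__":
    print(overshooting_erf(5.0) > 1.0)          # True: unclamped result overshoots
    print(clamped_erf(5.0, overshooting_erf))   # 1.0
    print(clamped_erf(-5.0, overshooting_erf))  # -1.0
```

The clamp only changes results that are already out of range, so accuracy inside (-1, 1) is untouched.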
Thanks for updating the bug, @atondwal. FYI: I'm working on another change that will slightly improve the performance and accuracy of erf().
See openxla/stablehlo#1238