tensorflow / mlir-hlo

398 stars 70 forks source link

32-bit erf is inaccurate for x > 1 #62

Closed atondwal closed 1 year ago

atondwal commented 1 year ago

see openxla/stablehlo#1238

atondwal commented 1 year ago

fixed by https://github.com/tensorflow/mlir-hlo/commit/3ad5e21dfce9819f30194e33933a6a225eb66f8d

atondwal commented 1 year ago

In practice looks like this is too slow for some users. For the 64-bit version we're fine with the performance hit of the additional select, but for the 32-bit version maybe we should just clamp it? What do you think, @jakevdp?

atondwal commented 1 year ago

I think @rmlarsen is taking a look

rmlarsen commented 1 year ago

I probably won't get to it this week, though.

rmlarsen commented 1 year ago

FWIW: The originally bug report is incorrect. The erc implementation is indeed very accurate, but fails to clamp the output to [-1:1].

rmlarsen commented 1 year ago

I'll send a CL with the clamping.

atondwal commented 1 year ago

(fixed by https://github.com/tensorflow/mlir-hlo/commit/29e975a3c0a469c41fc564c36766217aef9d07a4)

rmlarsen commented 1 year ago

Thanks for updating the bug @atondwal . FYI: I'm working on another change that will slightly improve performance and accuracy of erf().