herbie-fp / herbie

Optimize floating-point expressions for accuracy
https://herbie.uwplse.org
Other
763 stars 32 forks source link

2x is not a '99%' approximation for 1/(1-x) - 1/(1+x) around 0 #742

Open dmcooke opened 6 months ago

dmcooke commented 6 months ago

On the web interface, try the expression '1/(1-x) - 1/(1+x)' for a range [0, 0.5] (I'll call this f(x)). Herbie will find some alternatives, including '2*x', which it claims is '99%' accurate. While it's a good approximation near 0, it's worse over most of the range. The same problem occurs for a range bracketing 0, such as [-0.5, 0.5].

E.g., f(0.5) = 1.3333, and 2*0.5=1, for a relative error of ~30%.

It appears that the very small floating-point numbers are drastically over-represented in the sample, making Herbie care much more about fitting to [1e-308, 1e-10], than to [1e-10, 0.5]. In that regime a low-order Taylor expansion is pretty much always going to win.

I can see how this may be ok for large ranges, eg. [2, 1e308]: a lot more fp values are in [1e10, 1e308] than [2, 1e10], but the range is also much wider. The reciprocal situation is rather different: [1e-308, 1e-10] is literally miniscule.

AFAICT, Herbie isn't attempting to divide the range up like it does for large ranges. E.g. if x<1e-6 then 2*x else f(x) would be a good approximation. (Herbie will divide if I restrict the interval to [1e-9, 0.5], for example.)

Suggestions:

  1. If the interval contains 0, Herbie should probably include a warning that the accuracy may be ... inaccurate.
  2. Perform a second accuracy calculation with points sampled uniformly from the interval (that is, choosing a value in the subinterval [x, x+δ] is as probable as the subinterval [y, y+δ]).

Since I spent some time on this, here's some miscellaneous results:

pavpanchekha commented 6 months ago

Hi @dmcooke, thanks for the bug report. You basically figured out yourself why this is happening: Herbie takes the range [0, 1] to mean "uniform sampling over floating-point-representable values", which includes a lot of points in [1e-308, 1e-10] and comparatively fewer (typically a handful) in [1e-10, 1].

I'm thinking of fixing this by having a "minimum absolute value" field pop up whenever the input range crosses zero; that way it's at least clear to the user that this question is worth thinking about. We've tried mixing uniform and by-representation sampling in the past, and we've found that it's hard to avoid confusing behavior for, for example, wide ranges.

Also thank you for the input and the 2 / (1/x - x) solution, we'll add that to Herbie's test suite. Herbie's not great at rearranging polynomials, so I'm not too surprised it isn't found, but Herbie should at least be able to get 2 x / (x + 1) (x - 1). Sollya support is interesting—it is certainly the state of the art tool for polynomial fitting, much better than the Taylor series used by Herbie—but it's a little hard to see how we can fit it into Herbie's architecture, where we want to not assume we know the right input range (as in, we want to reserve the right add if statements later).

oscardssmith commented 3 months ago

One thing that might be reasonable would be to switch Herbie's semantics to be an equal weighting of uniform sampling of floats and uniform sampling of real numbers converted to their nearest float. I think this is a much more reasonable intuition for what users expect from a "good approximation"

pavpanchekha commented 3 months ago

Yeah, I think using a uniform sampling, or a mix, would help in this case, but I've seen a number of cases where it would instead hurt (basically, if the range is large, like -1e3...1e3, it biases toward only large values). I think the current thinking is:

oscardssmith commented 3 months ago

Right, that's why I was suggesting sampling according to the sum of union of a uniform and exponential distribution. Another potentially good option would be to always sample at the edges of the provided range. That would make sure that even in the absence of enough samples to explore the space well, it's not doing something too egregiously wrong.