google / s2geometry

Computational geometry and spatial indexing on the sphere
http://s2geometry.io/
Apache License 2.0
2.32k stars 308 forks source link

consider using Eriksson’s formula in preference to L’Huilier’s for the area of a spherical triangle #190

Open jrus opened 3 years ago

jrus commented 3 years ago

Currently S2 calculates the signed area E of an oriented spherical triangle with vertices at unit vectors [a, b, c] by first finding the measures of the central angles between the vectors, then applying an angle-measure based formula (the spherical analog of Heron’s formula in the plane). This involves a bunch of expensive circular functions, and has many corners for rounding errors to sneak into the calculation.

If we expand the current computation it might be written:

α = atan2( √[(b × c) · (b × c)], b · c )
β = atan2( √[(c × a) · (c × a)], c · a )
γ = atan2( √[(a × b) · (a × b)], a · b )
σ = ½(α + β + γ)

s = sgn(a · (b × c)) E = 4s atan( √[ tan½σ tan½(σα) tan½(σβ) tan½(σγ) ] )

Instead, consider using a formula from
Euler (1781) “De mensura angulorum solidorum” (non-vector variant)
Van Oosterom & Strackee (1983) “The solid angle of a plane triangle”. IEEE transactions on Biomedical Engineering 2: 125–126. Eriksson (1990) “On the Measure of Solid Angles”. Mathematics Magazine, 63(3), 184–187

E = 2 atan2(a · (b × c), 1 + a·b + b·c + c·a) (or) E = 2 atan2(a · ((ba) × (ca)), (b + a) · (c + a))

Notice that the numerator of the tangent needs to be calculated anyway to get the sign in the current version (it is the standard orientation predicate for a triple of unit vectors). The denominator takes 2 vector-vector additions and 1 dot product.

Overall, we cut the number of circular function evaluations from 8 down to 1, cut the dot products from 7 to 2, cut the cross products from 4 to 1, and cut square roots from 4 to 0.

The second line above avoids loss of significance in the typical case where triangle vertices are close together by computing the differences between vectors up front. If even more worried about possible precision loss for small skinny triangles, we could even replace (ba) × (ca) with (½(b + c) − a) × (cb) or similar, as is done in RobustCrossProduct.


I don’t know that anyone has carefully worked through the error analysis of this formula, or tried to figure out if it needs special touch-ups in extreme situations (e.g. a almost antipodes to b), but I would expect it to be a significantly easier thing to study than the L’Huilier formula, as it sticks to basic floating point arithmetic.

jmr commented 3 years ago

@ericveach

ericveach commented 3 years ago

I had not seen this formula before, and I do think it might be a good substitute for l'Huilier's theorem. The error should be lower in the case of long skinny triangles, for example, as long as the denominator (b + a) · (c + a) is bounded away from zero (i.e., as long as the triangle does not contain nearly-antipodal vertices).

However it's worth pointing out that we will still need to fall back to Girard's formula in order to get good accuracy in all cases. The big advantage of Girard's formula is that even when triangles contain nearly-antipodal points (e.g., A, B, -A + eps) the calculated area is consistent with the direction of the edge (A, -A + eps) as it is defined elsewhere in the library for the purpose of point-in-polygon tests, distance measurement, etc. This is true even when points are exactly antipodal (in which case the direction of the edge is defined using symbolic perturbations).

Deciding when to use one formula vs. the other requires rigorous error analysis. As you say, this should be easier for Eriksson's formula. In fact I have already done this analysis for both l'Huilier's and Girard's formulas a few years ago but have never gotten around to implementing the results. Essentially the algorithm computes conservative, reasonably tight error bounds for both area formulas and then uses the formula with the smaller error bound. (The current code reflects an early attempt at this---it is far from rigorous, but the cases it gets wrong are sufficiently rare that no one seems to be complaining.) Other techniques could also be used to compute the error bounds, such as interval arithmetic, but my experience is that analyzing the error in advance leads not only to faster code but also often insights into how to increase numerical stability.

Anyway, thanks for the suggestion. I will certainly take a closer look at this before making any further changes such as implementing the error bounds I mentioned above.

jrus commented 3 years ago

Trying to define areas for a triangle with approximately antipodal points (to near the limits of floating point precision) is almost arbitrary, as such a triangle degenerates to a lune with one nearly unspecified side, and tiny perturbations can get you anything from nothing up to a full hemisphere. People should really try to steer clear of triangles that get anywhere close to that. But if you want to get fancy you could do the arithmetic in this formula (up until the arctan) in exact arithmetic along the lines of http://www.cs.cmu.edu/~quake/robust.html. I’m not sure how the expense compares to the trigonometric version – it might even still come out cheaper.

But sure, if you have a prevailing convention for how to treat those triangles, stick to that. :-)

donhatch commented 2 years ago

Trying to define areas for a triangle with approximately antipodal points (to near the limits of floating point precision) is almost arbitrary, as such a triangle degenerates to a [lune] (https://en.wikipedia.org/wiki/Spherical_lune) with one nearly unspecified side, and tiny perturbations can get you anything from nothing up to a full hemisphere. People should really try to steer clear of triangles that get anywhere close to that. But if you want to get fancy you could do the arithmetic in this formula (up until the arctan) in exact arithmetic

When I first read this, I thought it was sensible advice, but, after a bit more thought, it seems to me that such triangles, and their computed areas, can be useful, even in the case of two antipodal or almost-antipodal points, even without resorting to exact arithmetic. The case I'm thinking of is when computing the area of a spherical polygon, by naively triangulating it and then adding up the (signed) areas of the spherical triangles. It may be that an internal edge of the triangulation (that is, an edge that is not an edge of the polygon, but is shared by two triangles of the triangulation) connects two antipodal or close-to-antipodal vertices. While it's true that the area of either of these two neighboring triangles is unstable (with respect to perturbations in the vertex positions), as long as the algorithm is careful to choose the exact same geodesic in both cases, the instability will cancel out and the final polygon area will be stable.

smcallis commented 2 years ago

We might consider a new method called ARPIST that was recently published too: https://arxiv.org/pdf/2201.00261.pdf

I think I've found a set of points that has huge relative error with the current code:

a0 = {-1.705424004316021258e-1, -8.242696197922716461e-1, 5.399026611737816062e-1}
b0 = {-1.705424004985861830e-1, -8.242696201211552332e-1, 5.399026606505162862e-1}
c0 = {-1.706078902524902630e-1, -8.246067103826443256e-1, 5.393669631850918078e-1}

GetArea returns 3.199981553556970766e-14 for those points, but I wrote it up in Mathematica so I could compute with arbitrary precision and with 100 digits of precision I get a real answer of 2.7487195801484825115e-21 for a relative error of a whopping 1.16e7.

The problem is that s = (sa+sb+sc)/2 == sa to 29 digits so s-sa loses all precision.

In[173]:= Angle[a2, b2]
Out[173]= 0.00063631365922543841765708066696380870407870313930649230103797588359

In[174]:= Sides[Angle[b2, c2], Angle[c2, a2], Angle[a2, b2]]
Out[174]= 0.00063631365922543841765708066708380489347145190593679637819439430796

I've attached a cleaned up copy of the notebook I was working in for reference. area_notebook.txt

(or maybe not ARPIST since we probably want to do a boundary integral as has been pointed out to me)

smcallis commented 2 years ago

The issue was actually the opposite here, we weren't cancelling enough digits. Using atan2(|AxB|, A*B) to get the edge lengths is ill behaved when the vectors are nearly parallel. Kahan provides a better formula here for two unit vectors A and B: 2*atan2(|A+B|,|A-B|) which computes the sides to almost full precision, letting s-sa fully cancel and fall back to the Girard formula (which snaps the area to 0).

jrus commented 2 years ago

Trying to define areas for a triangle with approximately antipodal points (to near the limits of floating point precision) is almost arbitrary, as such a triangle degenerates to a lune with one nearly unspecified side, and tiny perturbations can get you anything from nothing up to a full hemisphere. People should really try to steer clear of triangles that get anywhere close to that. But if you want to get fancy you could do the arithmetic in this formula (up until the arctan) in exact arithmetic

When I first read this, I thought it was sensible advice, but, after a bit more thought it seems to me that such triangles, and their computed areas, can be useful, even in the case of two antipodal or almost-antipodal points, even without resorting to exact arithmetic. The case I'm thinking of is when computing the area of a spherical polygon, by naively triangulating it and then adding up the (signed) areas of the spherical triangles.

I would guess coming up with a provably not-too-bad way to compute the areas of arbitrary spherical polygons to be very difficult if you want to support nearly antipodal points, though I haven’t thought too much about it. This “area of triangles in naive triangulation” method is certain to result in huge relative errors for the trickiest edge cases, even if every partial area computation is accurate to full double precision, unless (or maybe even despite) calculating the sum of partial areas exactly. If the triangulation can include 2 nearly antipodal points where you need to add/subtract values close to a full hemisphere area, then precision is sharply constrained: At the scale of the earth a hemisphere area is 2.55e14 square meters, which makes the least significant double precision bit on order 0.1 square meters. Then again, for most examples where your polygon triangulation involves some edges of length ~20e6 meters, maybe getting area accurate to 0.1 square meters is good enough.

jrus commented 2 years ago

I think I've found a set of points that has huge relative error with the current code:

a0 = {-1.705424004316021258e-1, -8.242696197922716461e-1, 5.399026611737816062e-1}
b0 = {-1.705424004985861830e-1, -8.242696201211552332e-1, 5.399026606505162862e-1}
c0 = {-1.706078902524902630e-1, -8.246067103826443256e-1, 5.393669631850918078e-1}

GetArea returns 3.199981553556970766e-14 for those points, but I wrote it up in Mathematica so I could compute with arbitrary precision and with 100 digits of precision I get a real answer of 2.7487195801484825115e-21 for a relative error of a whopping 1.16e7.

Something is a bit confusing here. In your attached text file the computed values are supposedly for a2, b2, c2, which is different than the a0, b0, c0 above:

a2 = -1.705424004316021258e-1, -8.242696197922716461e-1, +5.399026611737816062e-1;
b2 = -1.706078905422188652e-1, -8.246067119418969416e-1, +5.393669607095969987e-1;
c2 = -1.705800600596222294e-1, -8.244634596153025408e-1, +5.395947061167500891e-1;

But anyway, this is a nearly flat triangle with long side of ~ 4 km and area of something less than 0.1 square millimeters (at Earth scale). If you want to guarantee you can always get areas this small correct with low relative error (or even always the correct sign), you’re going to have to do some higher precision arithmetic.

But for what it’s worth, if we do E = 2 atan2(c₂ · ((½(a₂ + b₂) − c₂) × (b₂ − a₂)), (b₂ + c₂) · (a₂ + c₂)) using double precision arithmetic and those provided coordinates, the result is -2.7507004044118907e-21. (Results will be worse if you compute the points in a different order, because c₂ falls between a₂ and b₂.)


Edit: when I use these double precision numbers as input and 200-digit arithmetic, with the formula E = 2 atan(a · (b × c) / (1 + a·b + b·c + c·a)), the result I get is -2.7505340276401658e-21. So the result above was accurate to about 4.5 digits.

You have to be very careful with such skinny triangles though. If instead of using those binary numbers (e.g. starting with -3072218764138533/2^54) as input, I literally use the decimal numbers (e.g. -0.1705424004316021258), then with 200-digit arithmetic the result is -2.7487195801484826e-21, which is only accurate to 2 digits, despite being an extremely tiny perturbation.

And the provided inputs here have an unusual number of digits printed (19) for double-precision numbers; if I instead use the decimal number with the fewest digits which unambiguously evaluates to the same double-precision number (a perturbation of less than half an ulp), the result of the calculation is -1.32445639646312e-21, which is wrong in the first digit. If you pick the right example, a tiny perturbation will give you a sign flip.

smcallis commented 2 years ago

Sorry yes I provided the wrong points initially, it's the _x_2 points that were giving me trouble. It's definitely way at the boundaries of precision. We're actually not necessarily interested in the exactly correct area just "less wrong" in this case. The error was large enough that it flipped the area of an entire polygon that triangle was part of and gave an area ~= to the entire world. Giving up and saying "too small" and clamping to zero would have been better in that case.

It was extra grievous because the case we had was right on the boundary of the triangle being completely co-linear. The rounding error converting from ExactFloat to double in GetIntersection was just enough to flip the sign of that triangle, so we got different results on X86 (used the ExactFloat path) and ARM (used the quad double path).

You can see the situation here. I shifted and rotated the geometry down with GMP arbitrary precision floats: image (5)

0-3-X is the full triangle. The point X is the intersection point, you can see the rounding error pushes it across the edge, flips that triangle sign, which flips the polygon sign. Like I said, wayyyyy out in the weeds here.

That's excellent that Eriksson's formula seems to handle this case better though, I think we'll still want to integrate that change as well.

donhatch commented 2 years ago

Here's an easy example in which neither l'Huilier (side length based) nor Girard (spherical excess based) algorithms can possibly yield any significant digits:

Consider a spherical isosceles triangle with base 2e-4 and altitude 1e-13.

Eriksson's formula does fine.

jrus commented 2 years ago

Eriksson's formula seems to handle this case better though, I think we'll still want to integrate that change as well.

Well to be fair, note this is a trickier form of the Eriksson formula where we transformed
abc = a ∧ (½b + ½ca) ∧ (cb)
and then permuted the points so that a was between b and c; if we swap the order of the input points we still get a dramatic (relative) error:

E(a, b, c)   => -1.2545624995979123e-21 // about 1/2 of the correct result
E(b, c, a)   => -7.27767299913974e-22   // about 1/4 of the correct result
E(c, a, b)   => -2.7507004044118907e-21 // correct to 4 digits
exact result => -2.7505340276401658e-21

To guarantee a good result with the Eriksson formula (or whatever other formula) it’s necessary to either (a) check the magnitude of the numerator and denominator before taking the arctan and if one or the other is smaller than some threshold recompute in higher precision, or possibly (b) check the lengths of the triangle sides and permute the points so that long side is bc, and possibly also zero out results that are too small. And with nearly or exactly antipodal points the decisions are even trickier: should antipodal points return NaN? throw an error? pick some arbitrary triangle size between [–2π, 2π]?

On the upside both numerator and denominator in the Eriksson formula involve only multiplication and addition and are easy to reason about and amenable to (if desired) doing higher precision or even exact arithmetic, without inordinately much trouble and with the extra computational expense only necessary in rare cases.


Edit: When a, b, c are close to each-other, taking differences first is more likely to be exact.

It works better to instead implement this as either: abc = ½( a ∧ ((ba) + (ca)) ∧ (cb) )

E(a, b, c)   => -2.749454111371401e-21
E(b, c, a)   => -2.750359016940436e-21
E(c, a, b)   => -2.7507004044118907e-21
exact result => -2.7505340276401658e-21

Or simply as: abc = a ∧ (ba) ∧ (ca)

E(a, b, c)   => -2.7503591873344045e-21
E(b, c, a)   => -2.7499128624218675e-21
E(c, a, b)   => -2.7507004044118907e-21
exact result => -2.7505340276401658e-21

But in the case where b is almost antipodes to c, it would probably be better to do:
abc = ½( a ∧ (b + c) ∧ (cb) )

I’m not sure if there’s any convenient transformation which works equally well when any pair of points might be close or might be nearly antipodal.


One final consideration at extreme edge cases: often a · a ≠ 1, b · b ≠ 1, and/or c · c ≠ 1, because of the inherently limited precision in constructing points lying on a sphere using a 3-space representation. Then e.g.
1 + a · b + b · c + c · a = (a + b) · (a + c)
is not an exact equality, and in general formulas for area won’t be exactly right without clarification about what precise point on the sphere [x, y, z] is supposed to represent. Is it [x, y, z]/√(x²+y²+z²)?

In that case the exact value of E above should instead be -2.7505340276401654e-21.

smcallis commented 2 years ago

There's Euler's (other) formula as well which does no subtractions or tangents:

cos E = (cos α + cos β + cos c)/(4cos(α/2)cos(β/2)*cos(γ/2)) ref (page 23)

Since α, β, and γ are always positive there's never any cancellation in the numerator and for small sides the denominator approaches 4. The scale factors 4 and 2 are exact as well so this feels like it should be really well behaved, though I suppose it will have a singularity at antipodal points.

jrus commented 2 years ago

$$\cos E = \dfrac{1 + \cos α + \cos β + \cos γ}{4 \cos \tfrac12 α \ \cos \tfrac12β \ \cos \tfrac12γ}$$

The problem with this formula is that as vertices get closer together, each of the component parts cos(α) (i.e. dot product b · c between vertex unit-vectors), etc. just reduce to 1, so the whole formula loses precision as the triangle gets smaller, and at some point you just get cos E = 4/4.

smcallis commented 2 years ago

Yeah I suppose if we think of it as $\cos (x) = 1 - \dfrac{x^2}{2}$ for small x, then it's not very friendly at all precision wise.

donhatch commented 2 years ago

That could be fixed by replacing each cos that occurs in the formula with cos x = 1 - vers x = 1 - 2 sin^2(x/2) throughout. The 1's all cancel and the result is something that stays small and should be accurate when the vertices are close together, but it doesn't end up looking very tidy.

smcallis commented 2 years ago

May be OBE anyways if we move to your parallel transport method =D

jrus commented 2 years ago

One other rational formula that might be helpful is for the half-tangent of excess $\varepsilon = \tan\tfrac12E$ in terms of the half-tangents of the three sides $a$, $b$, $c$, (note the different meanings for letters than in previous comments in this thread)

$$ \varepsilon^2 = \frac {(-a + b + c + abc)(a - b + c + abc)(a + b - c + abc)(a + b + c - abc)} {(2 + a^2 + b^2 + c^2 - a^2b^2c^2)^2} $$

If starting from unit vectors $v_a$, $v_b$ for two vertices, then

$$ c = \left|\frac{v_a \wedge v_b}{1 + v_a \cdot v_b} \right| = \left|\frac{v_a - v_b}{v_a + v_b}\right| $$

and likewise for $a$ and $b$.

This is probably the nearest spherical analog of the planar Heron's formula, and if the arithmetic is done carefully (cf. Kahan “Miscalculating Area and Angles of a Needle-like Triangle”) should also be better behaved than l’Huilier’s formula.

It might be more effective to rewrite the numerator polynomial in terms of $a^2$, $b^2$, $c^2$,

$$ -a^4 b^4 c^4 + 2 a^4 b^2 c^2 + 2 a^2 b^4 c^2 + 2 a^2 b^2 c^4 + 8 a^2 b^2 c^2 - a^4 - b^4 - c^4 + 2 a^2 b^2 + 2 a^2 c^2 + 2 b^2 c^2 $$

and not bother taking square roots. I’m not sure what the best way is to factor this expression for numerical robustness.