jmert / AssociatedLegendrePolynomials.jl

A library for calculating the Associated Legendre polynomials
https://jmert.github.io/AssociatedLegendrePolynomials.jl/
MIT License

High-multipole / elevation underflows? #27

Open marius311 opened 3 years ago

marius311 commented 3 years ago

I'm finding that at high multipoles and high elevation, the Legendre polynomials seem to underflow at some point, so the result becomes all zeros?

using Legendre, PyPlot
θ = deg2rad(160)
matshow(log.(abs.(λlm(0:3000, 0:3000, cos(θ)))))

[image: matshow output of log.(abs.(λlm)) from the snippet above]

Am I doing anything wrong here? If not, is there any way to get this to work?

marius311 commented 3 years ago

Here's a workaround, FWIW:

using Quadmath
matshow(Float64.(log.(abs.(λlm(0:3000, 0:3000, Float128(cos(θ)))))))

although as a total non-expert I wonder if there's a different way to rewrite things in this package that prevents underflows even with Float64 (which is a lot faster than Quadmath)?
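
For reference, a similar workaround using Base's arbitrary-precision BigFloat should also avoid the underflow, assuming λlm accepts BigFloat arguments (the recurrences appear to be generic over the element type, though I haven't verified this); BigFloat's huge exponent range is what matters here, but it will likely be even slower than Quadmath:

using Legendre, PyPlot
θ = deg2rad(160)
# hypothetical alternative: compute in BigFloat, convert back for plotting
matshow(Float64.(log.(abs.(λlm(0:3000, 0:3000, BigFloat(cos(θ)))))))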

jmert commented 3 years ago

Yes, this is a known issue that I haven't gotten around to sorting out — it's because the values for ℓ==m for large m and θ near the poles are very small (and hardly grow for a good number of ℓ), and the exponent underflows the Float64 range. The library underlying Healpix's first spherical harmonic transform (libpsht) makes mention of carrying around an extra exponent factor manually to extend the dynamic range (see paragraph 2 of §3.1.1 in https://arxiv.org/pdf/1010.2084.pdf), but that's not something I've tried replicating. Given Julia's type system, I'd considered that outside the scope of this library since the extended exponent could be another number type, kind of like DoubleFloats except the exponent has extra bits rather than the mantissa.
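
To make that libpsht trick concrete, here is a rough sketch of what "carrying around an extra exponent factor" could look like; this is purely hypothetical (the names ExtFloat, renorm, mul, and the 2^600 scale are made up for illustration) and is not how this package or libpsht is actually implemented:

# value is represented as x * 2^(600 * e), with x kept in a safe magnitude window
struct ExtFloat
    x::Float64   # native-precision mantissa
    e::Int       # count of 2^600 factors scaled out
end

const SCALE = 2.0^600

# rescale whenever the Float64 part drifts toward under- or overflow
function renorm(x::Float64, e::Int)
    ax = abs(x)
    if ax != 0.0 && ax < 2.0^-300
        return ExtFloat(x * SCALE, e - 1)
    elseif ax > 2.0^300
        return ExtFloat(x / SCALE, e + 1)
    else
        return ExtFloat(x, e)
    end
end

# multiply by an ordinary Float64 recurrence coefficient
mul(a::ExtFloat, c::Float64) = renorm(a.x * c, a.e)

# convert back; true under-/overflow happens only at the very end
tofloat(a::ExtFloat) = a.x * SCALE^a.e

The idea being that all arithmetic stays in native Float64 and the rescale branch is taken only rarely, instead of paying the extended-precision cost on every operation.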

marius311 commented 3 years ago

Thanks for that reference, that rings a bell. I tend to agree with

Given Julia's type system, I'd considered that outside the scope of this library since the extended exponent could be another number type, kind of like DoubleFloats except the exponent has extra bits rather than the mantissa.

although, to make sure I'm following: do you think that by extending only the exponent, not the mantissa, you could beat the performance of Float128? Currently it's really slow:

julia> @time λlm(0:3000, 0:3000, Float64(cos(θ)));
  0.107317 seconds (8 allocations: 68.711 MiB)

julia> @time λlm(0:3000, 0:3000, Float128(cos(θ)));
  9.012393 seconds (9 allocations: 137.421 MiB, 0.28% gc time)
marius311 commented 3 years ago

Oh, I should actually try DoubleFloats; Quadmath isn't the de facto standard I thought it was.

jmert commented 3 years ago

If I'm not mistaken, Quadmath dispatches to an external C library, so the number type is opaque to Julia and the compiler can't optimize through it. I think that's one source of potential slow-down. The second is that, yes, full Float128 calculations, which extend both the mantissa and the exponent, will probably require more operations for normalization and the like than something which mixes native on-CPU math when numbers have exponents in the Float64 range with specialized bit-twiddling only for the extended-exponent operations. (But again, that's just a guess without actually seeing any concrete evidence.)

jmert commented 3 years ago

Oh, I should actually try DoubleFloats, Quadmath isn't the defacto standard I thought it was.

If I remember correctly, I've tried DoubleFloats and it doesn't help because you only gain a small number of orders of magnitude by extending the subnormal range (due to the large number of bits in the mantissa); it still has the same 11 bits of exponent as Float64. I think this algorithm really needs an "ExtendedFloats" package which extends the exponent range dramatically.
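
For reference (my own check, not from the discussion above): Double64 is a pair of Float64s, so its exponent range is the same as Float64's. The only thing below floatmin is the subnormal range, roughly 16 extra orders of magnitude, whereas Float128's 15-bit exponent reaches down to about 3e-4932:

julia> floatmin(Float64)    # smallest positive normal Float64
2.2250738585072014e-308

julia> nextfloat(0.0)       # smallest positive subnormal Float64
5.0e-324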

jmert commented 3 years ago

All of that being said, an alternative proposal does exist in the more recent paper that libsharp2's recurrence is based on (https://www.jstage.jst.go.jp/article/jmsj/96/2/96_2018-019/_article) which I haven't had time to fully digest. See §4. The increasing ℓ recurrence (what I call the 2-term recurrence in the documentation here) is very different from what I've implemented, so it's not immediately obvious how to make use of what's presented (especially because a goal here has been to support any arbitrary normalization via its recurrence coefficients).

marius311 commented 3 years ago

Yeah, here's the full timing comparison, and I confirm what you remember: DoubleFloats does not give enough extra exponent range to improve upon the plot in my OP:

julia> @time λlm(0:3000, 0:3000, Float64(cos(θ)));
  0.107317 seconds (8 allocations: 68.711 MiB)

julia> @time λlm(0:3000, 0:3000, Float128(cos(θ)));
  9.012393 seconds (9 allocations: 137.421 MiB, 0.28% gc time)

julia> @time λlm(0:3000, 0:3000, Double64(cos(θ)));
  2.675618 seconds (13 allocations: 137.421 MiB, 1.43% gc time)
milankl commented 2 years ago

Is this problem related?

julia> AssociatedLegendrePolynomials.λlm(600,500,1/sqrt(2))
3.485220526612233e-21

julia> AssociatedLegendrePolynomials.λlm(600,500,Float32(1/sqrt(2)))
6.120498f9

I've realised this because higher-resolution SpeedyWeather.jl simulations suddenly blow up when using Float32, which I traced back to re- or precomputing the Legendre polynomials in Float32/64.

milankl commented 2 years ago

Okay, I don't think the issue I posted is exactly the same, as it's not about underflow but overflow. I can recreate @marius311's issue, and have now similarly created

[image: matshow of the Legendre polynomial values for x = 1/sqrt(2)]

for x = 1/sqrt(2). Around l = lmax, m ≈ 0.7*l, the Legendre polynomials seem to blow up (>1e100) with too little precision (for larger lmax they actually overflow). So for this overflow-like problem, I've tried to understand when higher precision becomes necessary.

[image: required precision as a function of spectral truncation/l_max]

It seems that Float64 is fine to be used up to T3000, Float32 up to T400, and Float16 up to T31 (T being the largest degree l_max). I agree somewhat with your statement that

I'd considered that outside the scope of this library since the extended exponent could be another number type

if it were just about underflow because the actual Legendre polynomials are too small to be represented (and hence 0 is returned), but I'm not sure I agree that the same statement applies to this overflow problem, since for example l=4000, m=3000, x=1/sqrt(2) should be

julia> λlm(4000,3000,Float128(1/sqrt(2)))
3.59135915094823849558444420818469611e-26

and therefore representable with Float64/32, but using Float64 it's

julia> λlm(4000,3000,Float64(1/sqrt(2)))
2.816428923268404e102

Maybe you have an idea how it's created and how to solve this?

milankl commented 2 years ago

Also note that @marius311's picture from above with Float64 should look like

julia> matshow(log10.(abs.(Float64.(λlm(0:3000, 0:3000, Float128(cos(θ)))))))

(via rounding after calculation in Float128). Meaning it just becomes 0 where numbers are not representable anymore:

[image: matshow of log10.(abs.(λlm)) computed in Float128 and rounded to Float64]

I'd appreciate any ideas on that! Many thanks 😃

jmert commented 2 years ago

Is this problem related?

julia> AssociatedLegendrePolynomials.λlm(600,500,1/sqrt(2))
3.485220526612233e-21

julia> AssociatedLegendrePolynomials.λlm(600,500,Float32(1/sqrt(2)))
6.120498f9

I've realised this because higher-resolution SpeedyWeather.jl simulations suddenly blow up when using Float32, which I traced back to re- or precomputing the Legendre polynomials in Float32/64.

Yes, it's fundamentally a problem related to Float32 having insufficient exponent range to properly calculate the full range of associated Legendre polynomial values along the calculation path. The traversal along the $m = \ell$ diagonals "underflows" in that the true answer is smaller than can be represented, but rather than actually truncating to zero, it instead oscillates between the two smallest nonzero values. When the recursion then takes off for $\ell > m$, the "initial value" is many orders of magnitude too large, so the entire sequence is scaled incorrectly.

julia> λlm(500, 500, Float32(1 / sqrt(2))) == nextfloat(0f0)
true

julia> Λ64 = λlm(0:1000, 0:500, 1 / sqrt(2));

julia> Λ32 = λlm(0:1000, 0:500, Float32(1 / sqrt(2)));

julia> Λ32[501, 501] / Λ64[501, 501]
1.7885525771109464e30

julia> Λ32[801, 501] / Λ64[801, 501]
1.7561145427817653e30

and to visualize the calculation path taken to get to $(\ell, m) = (800, 500)$, the following shows the plot of values for the diagonal up to $\ell = 500$ and then the column of constant $m$ for $500 < \ell \le 1000$ (split by the black line):

using PyPlot
using AssociatedLegendrePolynomials

Λ64 = λlm(0:1000, 0:500, 1 / sqrt(2))
Λ32 = λlm(0:1000, 0:500, Float32(1 / sqrt(2)))

# values along the (ℓ, ℓ) diagonal up to ℓ = 500, then down the m = 500 column
x64 = [Λ64[ii, ii] for ii in 1:501]
append!(x64, [Λ64[ii, 501] for ii in 502:1001])
x32 = [Λ32[ii, ii] for ii in 1:501]
append!(x32, [Λ32[ii, 501] for ii in 502:1001])

clf()
plot(x32, label = "Float32")
plot(x64, label = "Float64")
legend()
axvline(501.5, color = "k")
ylim(-1e32, 1e32)
yscale("symlog", linthresh = 1e-80)
title(L"$λ_\ell^m(1/\sqrt{2})$ on $(\ell,\ell)$ for $0\leq\ell\leq500$ and $(\ell,500)$ for $500<\ell\leq800$")
xlabel(L"multipole $\ell$")

[image: legendre_semiunderflow — plot produced by the code above]

milankl commented 2 years ago

Thanks Justin for the explanations. For the time being, we'll use Float64 for precalculating the polynomials. If you ever find ways to reduce under-/overflows with Float32, please let me know.
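
For what it's worth, a minimal sketch of that precomputation pattern (hypothetical, just to illustrate the workaround): run the recurrence in Float64 and only round to Float32 for storage:

using AssociatedLegendrePolynomials
θ = deg2rad(160)
# recurrence runs in Float64 (cos(θ) is Float64); storage is Float32
Λ32 = Float32.(λlm(0:400, 0:400, cos(θ)))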

Maybe two naive questions, probably asking for a friend: