mzechmeister / GLS

Generalised Lomb-Scargle periodogram
MIT License
33 stars 5 forks source link

NaN response from prob with norm lnL #8

Open sczesla opened 5 years ago

sczesla commented 5 years ago

Hi,

I get a NaN response from prob here. Is the calculation in line 652/3 correct?

Stefan

import gls as ptpp

N = 100
time = np.random.uniform(54000., 56000., N)
flux = 0.15 * np.sin(2. * np.pi * time / 10.)
# Add some noise
flux += 0.5 * np.random.normal(0, 1, time.size)
error = 0.5 * np.ones(time.size)

gls = ptpp.Gls((time, flux, error), verbose=True, norm="lnL")
mzechmeister commented 5 years ago

Indeed, I also can get nan, e.g.:

>>> gls.prob(-10)
nan

https://github.com/mzechmeister/GLS/blob/a1ee47b4779cb53bc998f99304644d83d36b56ba/python/gls.py#L651-L654

A nan can arise when you pass a positive value for lnL (here Pn) or not negative enough. This can lead to a negative value for 1-p in the last line and rising it to some power causes the nan. But lnL must be negative and indeed all values of gls.power should be negative.

I wanted to show that the same happens with with norm ZK and powers larger than one. Indeed, it does, but to my surprise now an error message occurs:

>>> gls = ptpp.Gls((time, flux, error))
>>> gls.prob(1.2)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "gls.py", line 654, in prob
    return (1-p) ** ((self.N-3)/2)
ValueError: negative number cannot be raised to a fractional power

It turns out that for lnL the values are converted into np.float64 and the error can be surpressed also for ZKin this way:

>>> gls.prob(np.float64(1.2))
nan

A wrong input could be catched, but the code would be blown up with a of lot error handling for all the norms and methods (prob, probInv, etc.).

sczesla commented 5 years ago

OK.

What confused me was that it happened with quite legitimate periodograms. I used this kind of code.

maxPower = gls.pmax
print("maxPower: ", maxPower)
print("stats: ", gls.stats(maxPower))

Problem is pmax, which is derived from self.p in _peakPeriodogram (not subject to later renormalization). self.power is the renormalized quantity. Also with pmax = max(gls.power), I get a weird result along the lines of

stats: {'Pn': -103.12840147567547, 'Prob': 7246890160.376161, 'FAP': nan}

mzechmeister commented 5 years ago

I cannot confirm this. I get the same result

>>> gls = ptpp.Gls((time, flux, error))
>>> gls.stats(gls.pmax)
{'FAP': 0.2661578990397514, 'Pn': 0.09955558972770197, 'Prob': 0.006182440735281105}
>>> gls = ptpp.Gls((time, flux, error), norm="lnL")
>>> gls.stats(max(gls.power))
{'FAP': 0.2661578990397514, 'Pn': -71.34146478841117, 'Prob': 0.006182440735281105}

But I agree that the naming is confusing and the handling is not safe. A suggestion is to rename p to _p (to indicate that it is an internal array) and power to p (P should be associated with for period). I think this is more intuitive. The code will be not fully backwards compatible.