Scope of Jaynes analysis/Bayesian formalism

Your reference to Jaynes (1987) implies that he discussed the Lomb-Scargle periodogram. I don't think he knew about this particular algorithm, but was remarking more generally on least-squares fits to sinusoids.

Also, I wonder if you considered and rejected a fully Bayesian formalism for the problem you address in this paper? In http://arxiv.org/abs/math/0111127 I tried to demonstrate a generic link between frequentist statistics such as the periodogram and the Bayesian posterior for a corresponding quantity (in the spirit of Larry Bretthorst's power spectrum analysis: Bretthorst, G. Larry, 1988, Bayesian Spectrum Analysis and Parameter Estimation, Lecture Notes in Statistics, Springer-Verlag, No. 48; http://bayes.wustl.edu/). I don't know how universal it is, but a connection of the form

                       P( a ) ~ exp[ - C(a) / error variance ]

seems to pop up a lot. C(a) can be an auto- or cross-correlation function; or a power spectrum or a cross-power spectrum, corresponding to a being a time lag or the frequency of a sinusoidal component. Then the expedient of simply multiplying posteriors (for independent quantities) might be useful in the multi band context. If nothing else this might lead to a more rigorous "Occam factor" regularization.

Thanks @jscargle – yes, the sentence about the Jaynes paper is a bit misleading. We'll correct that.

I'd thought briefly about the Bayesian point-of-view for the multiband method, but I avoided discussing it for a couple reasons:

As your paper makes clear, the Bayesian and frequentist results for periodograms are usually related by a simple monotonic function, so going from one to the other is easy.
I think a fully-Bayesian treatment of the periodogram would involve marginalization over nuisance parameters in the model, and in some cases you want the nuisance parameter to be the period! Unfortunately, since the posterior is so highly multi-modal with varying period, I don't know of any Bayesian approach which can properly handle the problem in higher than a couple dimensions (perhaps nested sampling, but that's still a long-shot). I thought that rather than doing a half-Bayesian job which amounts to not much more than proposing some improper priors that lead to the exponent of the frequentist result, I'd stick to classical statistics here.

That said, a Bayesian approach to the multiband sinusoid-fitting problem would likely end up with a couple features:

Priors would probably take the form of some constraint on the deviation of amplitudes and/or phases within each band. Mathematically, the result would be equivalent to a nonlinear regularization of the model, and likely be similar in spirit to the ad hoc nonlinear regularization proposed in Long (2014), which we discuss briefly in the appendix. The main disadvantage is that the nonlinear optimization at each frequency is much more computationally complex (though Long points to tricks that can be used to overcome this)
Alternatively, you could construct a prior which would be precisely equivalent to our linear regularization, namely a wide Gaussian prior centered on zero for the band terms, but then you have an ad-hoc prior over parameters in a statistical rather than physical model, which to me is not much more satisfying than the statistically motivated (but admittedly still ad hoc) regularization we use.

For a truly Bayesian approach to period finding, I think a CARMA or similar model is a much better avenue (see e.g. Kelly et al 2014)

Good comments.

In the restricted context of a single sinusoid don't you think Larry Bretthorst's treatment is pretty compelling? Sure, you have to choose what parameters you take as "nuisance" and there are many possibilities, but Larry treats the most useful case(s) IMHO.

Yes, in this context Bayes and frequentist are related one-to-one in a monotonic fashion. But, if only cosmetically, the exponentiation can emphasize the "correct" mode at the expense of other (smaller) local maxima. More important if you have a good handle on the observational errors the Bayesian expression gives the full posterior probability (as opposed to a single global maximum).

You make excellent points regarding uncertainty in priors making the marginalized distribution ad hoc in much the same way as regularizations. (This is especially clear for regularization via the approximate BIC -- Bayesian Information Criterion. I wonder what the count of regularization schemes is? AIC, BIC, MDL, … ) Alas, well-justified priors are rare; but when then are available the so-called Occam factor in the marginal posterior avoids ad hocery.

… on a different topic. In the discussion of Fig. 3 you state: "Second, notice that as the number of terms is increased, the general \background" level of the periodogram increases. This is due to the fact that the periodogram power is inversely related to the chi-squared at each frequency. A more flexible higher-order model can better fit the data at all periods, not just the true period. Thus in general the observed power of a higher-order Fourier model will be everywhere higher than the power of a lower-order Fourier model." This seems counterintuitive to me. Adding harmonic components to the model in the manner of eq. (14) makes the frequency, omega, represent both omega itself and its harmonics n omega, n > 1. You can see this in the right-hand panels of Fig.3: as n goes from 1 to 2 to 3, the peak power at the true fundamental Po increases -- power from the harmonics is incorporated via eq. (14) into the fundamental. How can one interpret the rise (from total insignificance) of the power at the first harmonic, 2 Po? And why doesn't a better-fitting higher order model move power from the background continuum into the harmonics? -- the reverse of what you state. I am sure you are correct, but I guess I don't understand your "inversely related to … " argument.

Jeff – thanks for all the comments! I really appreciate your close read of the paper and taking the time to discuss.

A couple responses:

I think you're right that Bretthorst's treatment is compelling in terms of fitting a single sinusoid to data. But the problem is that P(omega|data, M) with M=(a single sinusoid fits our data) is not exactly what we're after. What we're after is P(omega|data, M2) where M2=(our data has a period omega), and we need to be very careful about assuming one maps to the other! For example, in typical ground-based data, the strong Fourier power in the survey window function causes all sorts of issues, to the point where I don't in practice trust the top peak in the single-sinusoid fit. So I don't trust the Bayesian posterior here to reflect the information I'm interested in, and its propensity to diminish secondary peaks seems a shortcoming rather than an advantage.

In other words, while the Bayesian result might give you the correct posterior given your assumption that a single sinusoid fits your data, that assumption is indisputably wrong! Nevertheless, a wrong model can still be a useful model. I chose in the paper to focus on the chi^2 because I think it's a less complicated discussion: in particular, we found that the true period is often among the top 5 peaks. This is easy to justify in the chi^2 view, but I think much more difficult to justify in the Bayesian picture where those peaks (given our flawed model) are suppressed by orders of magnitude!

Regarding the multi-term fits: I was thinking of them in terms of nested linear models with N parameters. At each frequency, as N increases, the chi^2 decreases. The periodogram is P = 1 - K chi^2 for some constant K, so increasing N increases the power. And this is true regardless of frequency, so the periodogram of a higher-order model is everywhere greater than the periodogram for a lower-order model.

If there's any way I can make that more clear in the discussion, I'd be open to suggestions.

jakevdp / multiband_LS

Scope of Jaynes analysis/Bayesian formalism #1