@davidwhogg I'm addressing your comments below:
[x] @rodluger Title: There are many ways to map stellar surfaces, of which this is only one. So I would go with "Mapping stellar surfaces with light curves I: Information and degeneracies"
The paper series is more general than that: I'm going to tackle Doppler imaging in paper 3 (or 4)!
[x] @rodluger Abstract: I would add to the abstract some conclusions. Right now it is very vague. Not sure what yet, it would take some analysis. However nothing is mentioned about what kinds of degeneracies exist or anything quantitative.
Added some concrete conclusions. Let me know what you think.
[x] @rodluger Abstract: I am not sure you are using the word "ill-posed" correctly. I think you mean "admits of many solutions". Also I don't think it is true that any inference is necessarily prior-dominated. For instance, there are inferences you can do that don't involve priors at all, and there are some things about the surface you can know well. I think you mean that there are many possible surface maps for any light curve, and that if you want a specific map or a pdf over maps you have to break these degeneracies with regularization or priors. I'm just concerned that the current language is a little sloppy.
_Good points. I do think ill-posed is correct, though, no? A well-posed problem is one that (among other things) has a unique solution. I can definitely change the language concerning "prior-dominated", and I can clarify what I mean by ill-posed._
[x] @rodluger Abstract: Ensemble methods only break some of these degeneracies, right? And I don't think they really do, I think they just provide better priors, if you are concerned with inferring a single star surface. So if the ensemble method is used to regularize or break the degeneracies for one star, then it doesn't break the degeneracy, it provides the prior. If the ensemble method is used to get mean maps or distributions over maps, then it is solving a different problem. Anyway, again I think the language here is a little imprecise / misleading. In my view the problem is always degenerate--admitting of many solutions--priors can be used to regularize that and make posterior pdfs compact. Ensemble methods provide good data-driven ways to create individual-star priors. But they don't change the fundamental point that there are degeneracies. And aren't there degeneracies even once you have chosen your prior? It depends on the meaning of the word "degeneracy". If you take a narrow meaning, then anything (even numerical precision or a regularization) breaks it. If you take a wide meaning, then priors don't even break it. Once again, language issues. These issues might thread through the paper too? Audit for degeneracy.
[x] @rodluger I think there is clarity in the paper when you define the flux operator. That could be mentioned in the abstract. The flux operator goes from surface to lightcurve and it has limited rank. That's clear and unambiguous and doesn't take on overloaded meanings.
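To make "limited rank" concrete, here is a toy sketch. It is not the paper's spherical-harmonic flux operator. It assumes an equatorial ring of N surface cells on a star seen equator-on, with no limb darkening, observed at N evenly spaced rotation phases. Each visible cell is weighted by its projected area, max(cos Δλ, 0). The numerical rank of this hypothetical matrix is the number of independent degrees of freedom in the light curve. Adding an odd longitudinal harmonic of order 3 or higher to the map leaves the light curve unchanged.

```python
import numpy as np

# Toy setup (assumption): N equatorial cells observed at N evenly spaced phases
N = 20
lam = 2 * np.pi * np.arange(N) / N   # cell longitudes, reused as rotation phases
dlam = 2 * np.pi / N

# Hypothetical flux operator: row i is the light-curve point at phase lam[i];
# each visible cell contributes in proportion to its projected area cos(lam_j - lam_i)
A = np.maximum(np.cos(lam[None, :] - lam[:, None]), 0.0) * dlam

# Independent degrees of freedom in the light curve = numerical rank of A
rank = np.linalg.matrix_rank(A, tol=1e-10)

# Two different maps that differ by an order-3 longitudinal harmonic...
rng = np.random.default_rng(42)
x = 1.0 + 0.1 * rng.standard_normal(N)
x_alt = x + 0.3 * np.cos(3 * lam)

# ...produce the same light curve, because cos(3 * lam) lies in the null space of A
flux_diff = np.max(np.abs(A @ x_alt - A @ x))
print(rank, flux_diff)
```

In this toy the rank is 12 out of 20 cells, so 8 directions in map space are invisible to the light curve. That statement is about the operator alone. Whether a prior fills those directions in is a separate question.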
[x] @rodluger Figure 2: Say in the caption what the coordinate system is. The Ylms are fixed in that coordinate system.
[x] @rodluger Figure 3 (and text): I don't like the expression "number of light curve signals". I think maybe you mean "number of independent degrees of freedom in the light curve" or something like that?
[x] @rodluger Equation (2): It is unconventional to use V.T as the third factor. I don't mind, but it's unconventional! Also in the following text: When you make S non-square, you have a big part of V that is degenerate, and subject to arbitrary rotations, so the decomposition USV is non-unique. If you make S square and V rectangular, the decomposition becomes unique and well defined. Without loss of anything. I think? Maybe you use the rest of V in your prior though... if so, preview that for your reader early.
_That's the convention used by Wikipedia and scipy (the matrices are all real, so V^h = V^T). As for the second point, yes, I do use the rest of V to define the null space operator, so I think I need to use the non-unique decomposition. I see now that the thing I define as V_o is not unique, but I think that its inner product with itself is unique, since that's what defines the null space operator. Let's chat more about this._
[x] @rodluger Equation (7): I don't love the horizontal and vertical bars inside the matrices. They aren't needed are they? You might want a "where" statement after (7) explaining that those are block matrices, made as 2x2 ensembles of other matrices. Also, it is easy to get lost on the operators, so it might be a good idea to occasionally remind the reader what the size is of the various things.
_They are needed for expressions like the one below. Without the bars, it is hard to compactly express that S_o is rectangular._
[x] @rodluger Figure 4: I have no idea how to show this, but the figures are made edge-on but the lightcurves are made at 60 deg. By the way, are you exceedingly clear in the text what you mean by inclination of 60 deg? So it wouldn't be confused with 30 deg? You could make a figure or an inset that shows the setup for the standard problem you are doing, showing a 3-d image of the sphere and the line of sight and the coordinate system on the sphere...?
I added an inset showing the geometry.
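This note returns to the Equation (2) and (7) threads above. The sketch below uses numpy and a random, hypothetical rank-deficient operator, not the paper's. It checks three things. First, numpy's `svd`, like `scipy.linalg.svd`, returns `Vh = V^T` for real input. Second, the null-space block of V can be rotated arbitrarily while the projector built from it (V_o V_o^T) stays fixed. Third, that same projector also follows from the thin, square-S decomposition as I - V_r V_r^T. The names `V_row` and `V_null` are illustrative only. Equating the paper's V_o with the null-space block is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, r = 8, 12, 5   # hypothetical sizes: N data points, M map coefficients, rank r

# Hypothetical rank-deficient flux operator (N x M)
A = rng.standard_normal((N, r)) @ rng.standard_normal((r, M))

# Full SVD; numpy (like scipy.linalg.svd) returns Vh = V^T for real input
U, s, Vh = np.linalg.svd(A)          # U: N x N, s: (N,), Vh: M x M
V = Vh.T
rank = int(np.sum(s > 1e-10 * s[0]))

# Rectangular N x M S with the singular values on its leading diagonal
S = np.zeros((N, M))
S[:N, :N] = np.diag(s)
assert np.allclose(U @ S @ Vh, A)

V_row = V[:, :rank]    # spans the part of the map the data constrain
V_null = V[:, rank:]   # spans the null space; any rotation of it is an equally valid basis

# Rotate the null-space basis by a random orthogonal matrix Q
Q, _ = np.linalg.qr(rng.standard_normal((M - rank, M - rank)))
V_null_rot = V_null @ Q

P = V_null @ V_null.T                 # null space projector
P_rot = V_null_rot @ V_null_rot.T     # same projector from the rotated basis
P_thin = np.eye(M) - V_row @ V_row.T  # same projector from the thin SVD alone
```

If this holds for the paper's operator, the null space operator could be defined from the unique thin decomposition, without ever writing down the non-unique V_o.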
[x] @dfm Figure 5: I don't love the jump immediately to posterior shrinkage. I think this plot could be made with no reference to prior or posterior, right? It is a statement about the likelihood function, and could be expressed as such? If so, then the problem becomes one of information theory, rather than one of Bayesian inference (or appeals to both communities). Also, this figure is prior-dependent, because if I make my prior very narrow, you won't get any shrinkage here. But there is something you can say that is not at all prior-dependent, which is about the likelihood function. Indeed, I think this IS a plot only based on the likelihood function, so it is good to make that clear in its labeling and caption.
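The Figure 5 point can be checked without any prior. For a linear model with Gaussian noise, the likelihood alone defines the Fisher information matrix F = A^T C^{-1} A. This minimal sketch assumes a random, hypothetical rank-4 operator and white noise with a known sigma. The number of directions the data constrain is the rank of F, and no prior enters it. Along null-space directions, the posterior variance equals the prior variance for any prior width.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, r = 50, 10, 4   # hypothetical: 50 flux points, 10 map coefficients, rank 4
A = rng.standard_normal((N, r)) @ rng.standard_normal((r, M))
sigma = 0.1           # assumed white-noise level per flux point

# Fisher information of the Gaussian likelihood, F = A^T C^{-1} A with C = sigma^2 I.
# No prior enters here.
F = A.T @ A / sigma**2
eigvals, eigvecs = np.linalg.eigh(F)
tol = 1e-8 * eigvals.max()
n_constrained = int(np.sum(eigvals > tol))   # directions the data constrain
null_dirs = eigvecs[:, eigvals <= tol]       # directions the data say nothing about

# Only now bring in a Gaussian prior N(0, prior_var * I) and look at the posterior
variance_ratio = {}
for prior_var in (0.01, 1.0, 100.0):
    post_cov = np.linalg.inv(F + np.eye(M) / prior_var)
    # posterior / prior variance along each null-space direction
    variance_ratio[prior_var] = np.diag(null_dirs.T @ post_cov @ null_dirs) / prior_var
```

Plotting the spectrum of F, or the rank as a function of degree, would be one prior-free way to label such a figure.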
[x] @dfm Equation (16): As you know I think you should talk about the KLD (Kullback-Leibler divergence) here. You don't have to USE it but you should mention it if you are going to describe posterior shrinkage. But the thing you have there is close to the relevant information-theoretic quantity. Look up the Fisher information: it is the inverse variance, essentially. If you re-wrote this as a change in the inverse variance, then you can express your result directly in terms of information. And I think the statistic that you use is indeed a Fisher information difference, divided by a Fisher information, so maybe it can be written as a difference in information theory...? But probably only if you do it in a way that's prior-insensitive. Which I think you can if you look at how the posterior variance depends on the likelihood width. At the very least, the conversions between the Fisher information, the S statistic, and the KLD should be given in the text. I can help with that if you want these conversions (though not necessarily today!).
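This is a one-dimensional sketch of the conversions requested above. It assumes a scalar linear-Gaussian model: a Gaussian prior and a Gaussian likelihood with known variance. It also assumes that the S statistic of Equation (16) is the usual posterior shrinkage, 1 - var_post / var_prior. That identification is an assumption, not taken from the paper. Gaussian precisions add, so S = F / (F + 1/var_prior), which is exactly a Fisher information difference divided by a Fisher information. Averaged over data drawn from the prior predictive, KL(posterior || prior) is the mutual information, 0.5 ln(var_prior / var_post) = -0.5 ln(1 - S).

```python
import numpy as np

def information_summary(prior_var, like_var):
    """1D linear-Gaussian toy: prior variance prior_var, likelihood variance like_var.

    Returns (Fisher information, posterior shrinkage S, expected KL divergence in nats).
    """
    fisher = 1.0 / like_var               # Fisher information: depends on the data model only
    post_prec = 1.0 / prior_var + fisher  # Gaussian precisions add
    post_var = 1.0 / post_prec
    shrinkage = 1.0 - post_var / prior_var  # equals fisher / post_prec
    # KL(posterior || prior) averaged over data drawn from the prior predictive;
    # for this model it equals the mutual information 0.5 * ln(prior_var / post_var)
    expected_kld = 0.5 * np.log(prior_var / post_var)
    return fisher, shrinkage, expected_kld
```

For example, `information_summary(1.0, 1.0)` gives F = 1, S = 0.5, and an expected KLD of 0.5 ln 2 nats. Shrinking `prior_var` toward zero drives S (and the KLD) to zero while F is unchanged. This is the prior dependence noted in the Figure 5 comment.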
[x] @rodluger Somewhere: How much do you discuss rotation period? How much do things depend on this? I guess in the limit of infinite SNR and a non-variable star, you always know the rotation period exactly. But it might be worth saying something about that? And what kinds of losses are expected from learning the rotation period from the same data that you are using for reconstruction? Just as a discussion point. Maybe the point that you are assuming that you know rotation periods should appear in the abstract? And also that you have long light curves? Not necessarily, of course.
[x] @rodluger Discussion: I am a big believer that discussion sections are the most important part of a paper (after the title and abstract) so I would recommend taking some time on that. In the discussion make sure you discuss the limitations very explicitly like:
[x] Assuming you know exactly the rotation period.
[x] Assuming you have very high SNR
[x] Assuming the star doesn't vary at all with time
[x] Assuming that you can find ensembles of stars with identical surface maps.
[x] @rodluger Indeed, I'm a believer in listing all assumptions in an early section, with names, and then coming back to them for discussion about how they work in the Real World (tm) in the discussion at the end. If you want feedback on all that, I'd be happy to give it, it is my favorite part of writing. I think a method and ideas are well explained when you can explain what they can't do just as clearly as you explain what they can do. So it's important.