herjy opened 4 years ago
It hints again at an issue with band-limitedness. In particular, it looks like the edges create some issues.
Note, I get the same results with Gaussians generated from scarlet.GaussianPSF
Good example. This is the same problem as in #215. As we have said there, it is unlikely that errors in the diff kernel itself have enough power to do that. With this new test, it's indeed more likely that undersampling is responsible.
Here's what I see. Both positivity and monotonicity become important in the outskirts and can change pixel values there even if the gradients are band-limited (which they are). Thus, their artifacts look similar. But it seems that both have a step within the significant region of the source. I have no idea why, because the proxes don't do anything there.
Normalization and CenterOn don't do anything for that bright a source (and in the former case don't affect the band limit). So, their effects should be much weaker, if any. What we see are minor checkerboard patterns at the edges of the box, plus some ringing towards the center. These could come from backpropagating noise. Whenever you have some power at the edge, Gibbs will kick in and create artifacts.
So, in both cases I think the proxes cause the solutions to move past the band limit set by the model kernel. Can you check the spectrum of the solutions and the model kernel?
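For concreteness, here's a minimal sketch of the check I have in mind, assuming plain numpy arrays (the names in the commented usage, `source.morph` and `model_psf_kernel`, are placeholders, not the actual scarlet attributes):

```python
import numpy as np

def radial_power_spectrum(image):
    """Azimuthally averaged power spectrum |FFT(image)|^2 vs integer radial frequency."""
    power = np.abs(np.fft.fftshift(np.fft.fft2(image)))**2
    ny, nx = image.shape
    y, x = np.indices(image.shape)
    r = np.hypot(y - ny // 2, x - nx // 2).astype(int)
    # mean power in each integer-radius bin
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)

# compare where the solution has power relative to the model kernel, e.g.:
# ps_solution = radial_power_spectrum(source.morph)      # placeholder name
# ps_kernel = radial_power_spectrum(model_psf_kernel)    # this sets the band limit
```

If the solution's spectrum has power beyond where the model kernel's spectrum has dropped off, the proxes have pushed it past the band limit.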
I would try two separate things, with the same setup as above:
Also, what was the initial guess for this model?
Sorry, I missed your posts. Already tried the boxes, nothing changed. Convolutions are already in real space. Spectrum and initial guess look fine (no ringing in the morph).
Another idea: can you initialize both spectrum and morphology at the truth (i.e. the shape convolved with the model PSF)? Then let the optimizer run. It should converge within a few steps, but if there is some error in the code, or the noise is too strong, it can walk away from the true minimum.
Initialisations at the truth don't converge, which is super weird:
Though I checked that the sources projected into the image plane fit the data perfectly (i.e. the forward path is clear). Actually the final fit to the data is decent and it shows exactly the same power spectrum as the true source. However, it has tiny sharp features that produce the rings even if there are no constraints. (With constraints the sharp features are there, but organised as a ring.)
(The image above is the modeled source minus the true source; below are the residuals.)
This actually makes no sense with regard to the optimisation problem. Indeed, the signal we reconstruct is supposed to be convolved with the model PSF and should be band-limited. However, we have nothing that guarantees it.
These tests are performed on Gaussian profiles generated with scarlet, not galsim. I just realised that I forgot to turn on inverse-variance weighting. Now that it is on, the results are better but still show some structures that come from these strange edges:
In particular, this is what happens to the power spectrum, when it shouldn't:
Not so fast. I think this is to be expected. The optimizer will make a step in the direction of the gradient, which is non-zero because the best-fit solution to the noisy image isn't the true solution. The amplitudes of these steps are set by us: for ExtendedSource, each pixel can vary by as much as 10% of the average pixel value. So, yes, the solver will move to some location by making a step away from the truth, and then attempt to recover the best solution, which, looking at the loss curve, seems to work.
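For reference, a sketch of that convention (the array name is a stand-in, not the actual code):

```python
import numpy as np

# hedged sketch of the step-size convention described above: per-iteration
# updates to the morphology are scaled to ~10% of the mean pixel value
morph = np.ones((21, 21))        # stand-in for a source's current morphology
step = 0.1 * np.mean(morph)      # max per-pixel change per iteration
```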
However, it cannot simply get rid of pixel power because we don't like it. It's an almost degenerate solution, which you can see when you compare the rendered model with the observation. The fluctuations in the model of the last example are correlated, but they also seem to have sharp edges. Did you use positivity here?
Also, the comparison of the power needs to be made wrt the model PSF, not the true source. Since our model PSFs are very compact, I'm not sure how much excess power there really is in our model.
The important thing for me is that there are no large scale artifacts in this example. The residuals are correlated but don't ring. This to me suggests that our forward path and the gradients are clean, and that the optimizer works as expected when started very close to the optimum.
So, what remains to be determined is:
Given the very first example from above, which used only CenterOn (irrelevant here), I am guessing that it's a combination of 1 and 2. When started at some half-good guess, the gradients (band-limited) will move that solution, and the proxes will chop off what they don't like (not band-limited). Now even the forward path isn't clean anymore, which will lead to artifacts if we use FFTs. But even without them, it's possible that the optimizer is now on a trajectory that amplifies some combination of noise pattern and penalty function, to the detriment of the gradient updates. If this is the failure mode, we could apply the proxes on every second iteration to break the cycle, as in the sketch below.
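A minimal sketch of that intermittent-prox idea in a generic proximal-gradient loop (the toy quadratic loss and positivity prox are placeholders, not scarlet's operators):

```python
import numpy as np

x = np.array([1.0, -0.5, 2.0])          # some half-good initial guess
target = np.array([0.8, 0.2, 1.5])
grad = lambda x: x - target             # gradient of 0.5*||x - target||^2
prox = lambda x: np.maximum(x, 0)       # e.g. a positivity projection
step = 0.5

for it in range(20):
    x = x - step * grad(x)              # gradient update (band-limited)
    if it % 2 == 0:                     # apply the proxes every second iteration
        x = prox(x)                     # constraint projection (not band-limited)
```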
In the last example: no constraint -> no positivity. In the power spectrum, the true source is convolved with the model PSF.
Looking at my last example, the ringing depends on the resolution. I doubled the size of the PSF (i.e. went from Euclid to Rubin) and saw noticeably more rings.
I don't see how the optimizer can converge to these solutions: the 2-norm isn't minimised, and it rings coherently, which seems to only make things worse, not better.
Could you add the power spectrum of the model PSF? I believe you're right that the model has power where it shouldn't, but comparing it to the true galaxy (even convolved) doesn't confirm that. We expect small-scale power from the noise; the question is whether we get some contribution that isn't band-limited by the model PSF.
You doubled the size of which PSF: observed or model?
Here's the full plot:
I increased the size of the observed PSF.
So, we believe now that the rings are not artifacts (of some undersampling effect) but local minima of the objective function, which might become traps if the initial guess is quite far off and the step sizes are too small. This statement is based on tests with relatively large errors on the overall amplitude of the SED (but correct morphology), where the rings don't vanish even after 1000 iterations. It also makes sense that this would occur for bright galaxies, where the local minima would be more pronounced and not wiped out by noise.
The practical solution to this problem is better initialization (coming soon), but that's not always possible. In the absence of that, we could allow for larger steps if needed (this is the topic of #211); adopt a variable step-size schedule (which is a common thing in neural-net training but not normally used with Adam); or use other trickery to optimize more robustly.
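For illustration, such a schedule could be as simple as the following (a hypothetical helper, not anything currently in scarlet):

```python
def step_size(it, base_step, decay=0.99):
    """Exponentially decaying step size: larger early steps to escape
    shallow local minima, smaller late steps so noise and penalties
    cannot sustain a cyclic pattern."""
    return base_step * decay**it

# e.g. after 500 iterations the step has shrunk to ~0.7% of its initial value:
# step = step_size(it, base_step=0.1)
```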
I prefer to fix this problem with better initialization and maybe #211.
Removing the PSF "corrections", i.e. these lines (1) & (2), makes things a lot better, but it is still not perfect. The initialisations are now wrong by a few points (absolute, not relative; I'm assuming this depends on the PSF norm), but lower than the final value (instead of higher), which leads to slightly better results; the ringing is still there, though. This is concerning because initialisation in crowded areas, even if improved, might not give a good enough result and could lead to artefacts.
On the bright-ish side (and this is a very optimistic way of phrasing things), the effect is much worse on LRO, despite the initialisation yielding the same values as for Observations, which might explain why the performance of joint processing seemed worse at the bright end in previous tests.
The first line you linked needs to stay in place because otherwise the return value isn't the spectrum of a point source. The second line is indeed problematic for very extended sources. I'm thinking about a way to make these work.
The first one is just as bad for extended sources. It's taking a point measurement where there should be an integration. Typically, for a test where PSFs and sources are Gaussians, with max at 1 for the source and sum at 1 for the PSF, the first line takes the spectrum from 1 to 46...
Sure, no question about that, but read the docstring. The return value of this method is the SED of a point source! This needs to be stated (and obeyed) because otherwise the spectrum is completely ill-defined when the PSF has width variations across bands.
I have an idea for this problem. In weak-lensing land we use moment-based estimates like the galaxy resolution R = 1 - T_psf / T_gal as an indicator how close a source is to the PSF (width). T is the trace of the second moment matrix, and T_gal is measured from the observed=convolved image.
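In its simplest (unweighted-moment) form, that measurement looks like this; in practice weak lensing uses adaptive/weighted moments, but the idea is the same:

```python
import numpy as np

def moment_trace(image):
    """T = Q_xx + Q_yy, the trace of the (unweighted) second-moment matrix."""
    y, x = np.indices(image.shape)
    flux = image.sum()
    yc, xc = (image * y).sum() / flux, (image * x).sum() / flux
    q_xx = (image * (x - xc)**2).sum() / flux
    q_yy = (image * (y - yc)**2).sum() / flux
    return q_xx + q_yy

# R -> 0 for point sources, R -> 1 for very extended galaxies:
# R = 1 - moment_trace(psf_image) / moment_trace(observed_image)
```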
get_pixel_spectrum has two options: one that doesn't correct for the observed PSF, and one that does, assuming that the source is a point source. For very extended galaxies this correction is not necessary because the peak height, which we see in the peak pixel, is unaffected by the PSF (think: convolution of a constant gives the same constant). So, we should interpolate between these two regimes as a function of R.
If we allow ourselves to assume a Gaussian shape for the convolved source, then the computation of the peak height is analytic in the moments of the PSF and the moments of the source (either convolved or preconvolved). That's the only option that avoids a forward convolution for every source and every observation. This would give us the correct factor to interpret the spectrum of the peak pixel even for extended galaxies.
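To spell out the Gaussian bookkeeping (valid only under that round-Gaussian assumption): a round Gaussian with flux F and second-moment trace T peaks at F/(pi*T), and traces add under convolution (T_obs = T_int + T_psf), so the ratio of observed to intrinsic peak height is T_int/T_obs = 1 - T_psf/T_obs, which is exactly R:

```python
def peak_suppression(T_psf, T_obs):
    """Observed / intrinsic peak height for a round Gaussian; equals R.

    -> 1 for very extended sources (peak unaffected by the PSF),
    -> 0 for point sources (where the point-source correction takes over).
    """
    return 1.0 - T_psf / T_obs
```

So for Gaussians the interpolation between the two regimes falls out of R directly.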
The latest changes to initialisation did not improve on this. I still observe artefacts for sources below 20 mags.
Are you running set_spectra_to_match?
Yup! Actually, going down to mag 19 seems ok. But below that, artefacts ring a lot, especially for well-resolved sources. The cutoff is not as sharp as it used to be, from the experiments I've made, but the problem is still there. I'll run my sims and see what happens, but I am afraid that the bright end will still be affected.
I ran some tests late last week and I think that undersampling may still be a part of this. The discussion is a bit lengthy, so we can discuss in our meeting tomorrow and I'll update this issue after that (if necessary).
Then it isn't the mechanism we identified. Remy's report indicated that the mismatch in the spectrum amplitude is driving the ringing. This mismatch is now absent. We also concluded that FFT artifacts (e.g. from undersampling the models) are orders of magnitude too small to explain our findings. Let's talk about this tomorrow.
And for some reason, this is worse for multi-resolution. This is sort of understandable if the issue comes from how the likelihood is explored, but still, YUCK! The shape of the feature is very strange, and it is concerning that the optimizer does not pick up on it (nvm the colour): Multi-resolution:
Single resolution:
What's the difference between the top and bottom row in each case?
And how is the second set not also multi-resolution? There are clearly different numbers of pixels involved?
Top 2 rows are multi-resolution. Bottom rows are two independent single-resolution runs (Euclid and Rubin).
This is a different version of the earlier problem: massive mismatch of the initial model. In this case it happened because the morphology of the model in multi-resolution is determined from the high-res channel, where source 2 is essentially absent.
But the mechanism that prevents the low-resolution channel from fitting is interesting: the subgradients from the proxes null the gradient update. The residuals make for a gradient that wants to decrease the model in the center and increase it in a ring around the center. Because the model is flat, this leads to a center that is lower than its perimeter, which monotonicity will undo. The result is a flat pancake whose mean amplitude is sort of correct to fit the model. A toy demonstration of this mechanism is below.
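Here's a toy 1D version (with a simplified "decay from the center" projection, not scarlet's actual monotonicity operator):

```python
import numpy as np

def monotonic_proj(x):
    """Force the profile to be non-increasing away from the central pixel."""
    out = x.copy()
    c = len(x) // 2
    for i in range(c + 1, len(x)):       # enforce decay to the right
        out[i] = min(out[i], out[i - 1])
    for i in range(c - 1, -1, -1):       # and to the left
        out[i] = min(out[i], out[i + 1])
    return out

model = np.ones(11)                      # flat model
grad = np.zeros(11)
grad[5] = 0.2                            # gradient wants to lower the center...
grad[[3, 7]] = -0.2                      # ...and raise a ring around it
print(monotonic_proj(model - grad))      # -> flat profile at 0.8 everywhere
```

The gradient step carves a dip at the center, and the projection then flattens the entire profile down to the dipped value: the pancake.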
I believe this is the true origin of the problems we keep seeing. I suggest we remedy this by switching the proxes off occasionally. We have experimented with random perturbations of the gradients, or with omissions of gradient updates, all meant to prevent spurious features and cyclic update dependencies. I suggest that we modify Blend.fit to bypass the proximal mapping about 10% of the time, along the lines of the sketch below.
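A hedged sketch of what that could look like (same toy loss and prox as in the earlier sketch, not the actual Blend.fit internals):

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, -0.5, 2.0])
target = np.array([0.8, 0.2, 1.5])
grad = lambda x: x - target             # toy quadratic loss gradient
prox = lambda x: np.maximum(x, 0)       # toy positivity projection
step = 0.5

for it in range(100):
    x = x - step * grad(x)              # gradient update as usual
    if rng.random() > 0.1:              # randomly bypass the prox ~10% of the time
        x = prox(x)                     # so the prox cannot lock in a cycle
```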
What we need to determine is:
I will take a stab at this
I recently noticed a significant ringing in the residuals of fits to very bright galaxies simulated with galsim. I initially thought that mono was not the culprit, but it seems that the proxes are making things a lot worse. Switching ALL constraints off makes things a lot better, but I'm not convinced that there isn't something wrong. Then each constraint, except for CenterOnConstraint, does something awful.
No constraint & CenterOnConstraint:
Positivity only:
Normalization("max") only:
monotonic only: