Lite - Githubissues

fred3m commented 2 years ago

Creates an autogradless version of scarlet along with a choice of optimizer, new initialization schemes, and other utility functions for simple use cases when the models are simple and the gradients are known.

esheldon commented 2 years ago

Sounds cool! Are there going to be unit tests for "lite" to assess the performance compared to "heavy" scarlet?

fred3m commented 2 years ago

Yep, I ran it on all of the usual scarlet tests to compare the results. I can post some plots tomorrow.

esheldon commented 2 years ago

Sounds good. However, am I right that the unit tests currently do not cover lite?

fred3m commented 2 years ago

That's correct.

esheldon commented 2 years ago

Couple more errors in displaying

  File "/home/esheldon/git/scarlet/scarlet/display.py", line 241, in show_observation
    center_ = observation.get_pixel(center)
AttributeError: 'LiteObservation' object has no attribute 'get_pixel'

Also the LiteBlend is not iterable, so this example from the documentation no longer works

    for k, src in enumerate(blend):
        if hasattr(src, "center"):
            y, x = src.center
            ax[0].text(x, y, k, color="w")
            ax[1].text(x, y, k, color="w")
            ax[2].text(x, y, k, color="w")

esheldon commented 2 years ago

I noticed that "skipped" is no longer returned, are sources no longer skipped?

esheldon commented 2 years ago

Here is a comparison with and without scarlet.lite.weight_sources. It looks like the overall normalization is off

Note these images are produced by metacalibration, so there is significant correlated noise.

e_rel = 1.0e-2 No weight_sources tol2 With weight_sources tol2-reweight

Same but for tolerance e_rel=1.0e-4 No weight sources tol4 With weight_sources tol4-reweight

fred3m commented 2 years ago

I noticed that "skipped" is no longer returned, are sources no longer skipped?

Lite sources have an is_null property, so skipped can easily be obtained by [src for src in sources if src.is_null], so it didn't seem necessary to me to keep track of it like we did before. Note: the skipped return value was included because we didn't always have a scarlet.NullSource, so it was necessary to re-insert sources after deblending to keep source indices the same before and after running scarlet. We really could/should have removed it in scarlet main once NullSource was created.

esheldon commented 2 years ago

Should I be ignoring sources that are is_null ?

fred3m commented 2 years ago

Couple more errors in displaying

Thanks for pointing these out. As Peter said, this isn't meant as a drop in replacement and the more I think about it the more I feel like scarlet.lite.display should be used instead of trying to mangle the data types to work with display in main scarlet. So I'll probably re-introduce similar functions in scarlet.lite.display so that the easy solution will be to use from scarlet.lite import display, which should make code calling the display functions remain valid.

fred3m commented 2 years ago

@esheldon I'm curious about your residuals when weighting sources. Is this because of the correlated noise that is added? The weighted templates have zero residuals with the input (by design), so any bias in those "models" is due to improper normalization (zero point?) of the input data, right?

esheldon commented 2 years ago

Thanks for pointing these out. As Peter said, this isn't meant as a drop in replacement and the more I think about it the more I feel like scarlet.lite.display should be used instead of trying to mangle the data types to work with display in main scarlet. So I'll probably re-introduce similar functions in scarlet.lite.display so that the easy solution will be to use from scarlet.lite import display, which should make code calling the display functions remain valid.

I see, thanks. In that case, I suppose some of the compatibility changes could be backed out if you like.

esheldon commented 2 years ago

@esheldon I'm curious about your residuals when weighting sources. Is this because of the correlated noise that is added? The weighted templates have zero residuals with the input (by design), so any bias in those "models" is due to improper normalization (zero point?) of the input data, right?

the input data have correlated noise, but zero background. How would the correlated noise cause an overall change in normalization?

fred3m commented 2 years ago

Should I be ignoring sources that are is_null ?

In the sense that any source that is_null doesn't have any components (that's the definition of the is_null property) so it will return a zero model.

fred3m commented 2 years ago

@esheldon I'm curious about your residuals when weighting sources. Is this because of the correlated noise that is added? The weighted templates have zero residuals with the input (by design), so any bias in those "models" is due to improper normalization (zero point?) of the input data, right?

the input data have correlated noise, but zero background. How would the correlated noise cause an overall change in normalization?

I'm not sure, I'm just trying to understand where your "residual" value comes from. When I calculate residuals I get all zeros, since the input data and output model with all of the sources made by redistributing the flux are the same.

esheldon commented 2 years ago

Here is an example without correlated noise that still shows the big residuals

No reweighting, e_rel=1.0e-4 tol4

With reweighting, e_rel=1.0e-4 tol4-reweight

esheldon commented 2 years ago

It could be a bug in how I'm running the code. Here is what I'm doing for the measurement and visualization

https://gist.github.com/esheldon/2161c619f4a86b797408bada6222f320

esheldon commented 2 years ago

Notice how the model has some low frequency "noise" in it for the reweighted case but not the regular case.

esheldon commented 2 years ago

Views of the models with and without reweighting.

No reweighting

tol4-model

With reweighting

tol4-model-reweight

esheldon commented 2 years ago

Note it looks like the show_sources code is not passing on use_flux when show_rendered=True. You can see this in the "Model Source 1 Rendered" panel above which does not show the low frequency noise

line 84 model_ = src.get_model(bbox=bbox)

fred3m commented 2 years ago

Ohhhhhh, I see the problem. When you re-weight the images they are already convolved, so the extra convolution that you do (observation.render(model)) performs an extra convolution, hence the change in sign. You should only render the model in the observed frame if flux is not used.

esheldon commented 2 years ago

How should I be rendering the models?

esheldon commented 2 years ago

Oh, I didn't realize that get_model returned a plain array. I guess you are saying I should not then call render.

And I see now why you prefer the name "convolve" to "render". Based on the name I assumed that the model had not been realized in an array before calling render.

fred3m commented 2 years ago

source.flux is already rendered. To do the reweighting it convolves the model for each source and uses the ratio of source.get_model templates for each source to redistribute the flux in the full model, storing that in source.flux. To do it all in one step you can change line model = blend.get_model(use_flux=reweight, convolve=not reweight), which will perform a convolution if not re-distributing the flux and just return the re-weighted models if you are. With this replacement you can just remove line 8 entirely.

fred3m commented 2 years ago

And I see now why you prefer the name "convolve" to "render". Based on the name I assumed that the model had not been realized in an array before calling render.

Yes, this is one of the big differences in scarlet lite, where rendering only involves convolving the model. In main scarlet, because the model can be much more complicated, involving different resolution, image orientations, and spectral components, rendering was designed to be a much more complicated feature.

fred3m commented 2 years ago

Ok, I think that I addressed everything.

herjy commented 2 years ago

Here are my comments on the multiscale_deblending notebook:

I don't find the part where the loss is computed from the wavelets that compelling. Everything this does in the end is change the hyperspace inn which the gradient lives, but still follows the same optimisation. Overcompletness might contain better solutions, but I'm mostly worried that it contains more local minima, and we see indeed a solution of the same caliber as the direct space loss, but with different blending artefacts.
On the StarletConvolution, I don't understand why you do this, but I recognise that it works really well. It seems to me that you convolve by an extra spline to smooth things out, but where does that come from and what am I missing?
In the result of the deblending, I strongly suggest you plot the direct space residuals. I did it on my notebook, I can push it if you want. I notice that the low frequency blending artefacts are indeed gone, that's epic! There are some higher frequency features left in central parts of complicated galaxies though. I think this is because monotonicity in wavelet space is not perfectly adapted as I have mentioned in other conversations. One way to improve upon this though might be by properly taking into account the wavelet norm of each ban when drawing the monotonicity mask. Indeed, for a given source, the variance in each wavelet level will depend on the psf, the variance in the image, the wavelet norm at each scale and the spectra.

fred3m commented 2 years ago

I agree that computing the loss with wavelets isn't compelling. I think I didn't totally make the purpose of this notebook clear, and I should at least explain that more. I wanted to justify the architecture of scarlet lite by demonstrating how easy it is to change parts of it, like the blend or source class, to quickly test new algorithms and ideas, which is something that takes a lot more work in standard scarlet. So I agree that (as we can see from the result) there isn't a benefit in this case when using a different loss function, but it gives users an example of what is possible with a minimal amount of coding.
As you mention, this is mainly done to keep the morphologies smooth and also to keep them compact. Notice that the loss function is calculated using the gen2 wavelets but the models are just summed together without the extra spline convolution that is used in the reconstruction (so without the full bspline convolution they are gen1 starlets). So instead of using the fast separable spline wavelet algorithm, I use the full convolution to make sure that the morphology is band limited appropriately, and the extra convolution allows the model to be matched with the gen2 wavelets.
Showing the direct space residual is a good idea and I'll include that in the notebook. I'm not totally convinced that the central residuals are due to a lack of monotonicity in the wavelets. I've thought about this and looked into it in multiple ways, and residuals at the center of objects seems to be an artifact of the deconvolution that is done to go from observed space to model space. For example, if I use Starck et al's wavelet deconvolution algorithm you see the same thing, rings that don't realy exist in the model space but can exist because of the degeneracy of the deconvolved solution, and poorly modeled centers unless you give it a lot of iterations. So the poorly modeled centers could just as easily be an artifact of an effective second deconvolution (in addition to the one with the PSF in each band) that makes it harder to model the center. You and Peter may be right about non-monotonicity in the higher wavelet scales, I'm just not convinced yet.

herjy commented 2 years ago

My bad. Indeed that's a great demonstration of this new modularity, but maybe it should be in another place.
I did not pay attention to the reconstruction, so know I understand why what you do is correct, but I'm not completely clear as to why you don't just let the reconstruction happen in full gen2. It should do the same non? Anyways, now I understand much better and also why it works. All good.
I think the attached image illustrate the issue of monotonicity in wavelets. If you look at the central region of the brightest galaxy, you see a dip all around it in the highest frequency scale and the south western spiral arm is not detected as a source, so it can't be modeled properly and creates some nasty artefacts
.

fred3m commented 2 years ago

Ok. I think I'll move the first example into the scarlet lite notebook and just remove the this notebook from the scarlet docs. @pmelchior, is there anything else that you'd like to see me do before merging?

pmelchior commented 2 years ago

Let' me go through your responses in code. I like many things in the PR, but I have two bigger conceptual problems with lite, which I like to address:

The relation between lite and main is not clear. The way it's done in this PR will merge a quite distinct branch into main, so that one could think of using lite classes in main methods (although they at least have names like Lite...), or even worse main classes in lite methods (because they don't carry lite in their names). The tutorial notebook that demonstrates the use of lite as part of the main docs makes this confusion even worse. Some detection methods and or constraints could/should be used in main, but I can't tell which ones and how exactly.
From your earlier comment:

I wanted to justify the architecture of scarlet lite by demonstrating how easy it is to change parts of it, like the blend or source class, to quickly test new algorithms and ideas, which is something that takes a lot more work in standard scarlet.

That, to me, is a very different argument for lite than the one we started from, namely getting autograd out of the code and streamline it for bulk processing. There's a lot of scope creep because of this decision. Moreover, the users who care about speed in bulk processing are not usually the ones who test new algorithms. In fact, I'd recommend that main remains more experimental (as autodiff makes it more capable in general) compared to lite.

fred3m commented 2 years ago

Just to address your comments from Friday:

There were very few changes to scarlet main in this PR. The lite notebook does use some methods (namely detection methods) that we don't use anywhere else, but those are from a completely separate effort than the lite effort, just improvements needed for other things that I'm working on. Including them in the lite notebook allows us to make sure that they don't bitrot.
That was always part of my desire to move to lite. In addition to the runtime and memory issues with autograd, the code in the main branch uses a very different architecture that has made testing and developing new features cumbersome. The main problem for me is that the class hierarchy is basically all designed for initialization, since once the models are initialized nearly all of the source classes perform exactly the same. So while stripping out autograd I kept in mind all of the things that I have on my future plate to test: different optimization algorithms, wavelet updates, hierarchical cataloging, etc. Because of the way that we have organized the code, each one of those would require a completely new "branch" of scarlet, with custom classes from the top to bottom in order to take advantage of our current initialization.

Ideally I would like to see the architecture of scarlet match the lite branch, but I don't have the time to put in the effort required to do that. But the architecture needed to change anyway, since a lot of the design that went into scarlet main was to handle things like rendering models and matching observations in both spatial and spectral dimensions, which are both trivial in the lite module. I think that it's clear now that the main branch doesn't need anything from the lite branch, however lite does make use of classes and functions from scarlet main. I understand your argument about liking autograd to calculate analytic gradients but TBH I'm not sure that it's necessary. We don't use any methods that we couldn't write gradients for ourselves, but if we really wanted to be lazy we could implement a grad method for a LiteComponent that imports autograd and automatically calculates the gradients for a class (I have a notebook somewhere that does that and it's fairly trivial to implement, and also allows users to use pytorch or jax).

To me the difference between lite and main is that main does not make the assumption that the input observations have been spectrally and spatially aligned prior to running scarlet while lite does. In that sense we could rename scarlet.lite branch to scarlet.aligned or scarlet.reprojected (to use the nomenclature common in astronomy) to make it clear where one branch should be used vs the other.

pmelchior commented 2 years ago

I'm OK with the changes and this PR overall. But the style of lite, and its restriction to the aligned case (that surveys generally have at least for coadds) suggests to me that lite becomes its own package (scarlet-lite or so). Thinking this through, there are two options:

scarlet-lite imports, and thus depends on, scarlet for e.g. its initialization and constraint implementation
scarlet-lite forks from the merged code after this PR and strips out everything it doesn't need to form a standalone implementation of the factorized model for aligned data.

From a maintenance perspective I'd prefer option 2, even if there's some code duplication. It also has the advantage that scarlet can carry out the planned transition to pytorch to streamline and speed up the multi-resolution operations. At this point, we'd have to maintain CPU and GPU code in one repo (e.g. for FFTs, padding, monotonicity, etc), which doesn't sound like fun to me. A CPU-only scarlet-lite would be cleaner. And lite is likely to have a more stable API, which makes it easier for Rubin to work with it.

pmelchior / scarlet

Lite #257