pmelchior / scarlet

hyperspectral galaxy modeling and deblending
MIT License

Deblend failure when weights=0 for one or more images? #245

Closed · entaylor closed this issue 3 years ago

entaylor commented 3 years ago

I'm running into an issue when trying to deblend a set of images, where the data are incomplete in one or more images.

What I have done is to set the values of both images and weights to zero where no good data exist.
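Concretely, the masking step looks roughly like this (a toy sketch; the arrays and the mask here are just stand-ins for the real data):

```python
import numpy as np

# toy stand-in for the real stack: 3 bands of 50x50 pixels
images = np.random.normal(size=(3, 50, 50))
weights = np.ones_like(images)

# hypothetical mask of pixels with no good data (here: the left half of band 0)
bad = np.zeros(images.shape, dtype=bool)
bad[0, :, :25] = True

images[bad] = 0.0
weights[bad] = 0.0
```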
[Attached image: missing_data_example]

I am uploading an image to illustrate a problem case. From left to right, the columns show: the imaging data; the weights associated with the imaging; and the rendered models after deblending, obtained as `observation.render(blend.get_model())`.

Two things happen. First, the fluxes associated with the images where data are missing blow up to very large numbers. This is sort of understandable, in that those fluxes are unconstrained on the interval of positive numbers. The second, and bigger, problem is that the fluxes associated with images that do have data are driven to zero (~1e-20). This shouldn't happen, because those fluxes can be well constrained.

Superficially, it seems that masking bad/missing data by setting weights to zero isn't fully respected when a large fraction of one or more images is bad/missing, with the bad behaviour somehow tied to the overall normalisation of the spectrum.

I've also attached the same case, but excluding the first two images (where data are missing), to show the 'good' solution. I could include in my pipeline a check for missing data (though I'm not sure what the right condition would be precisely: the centre pixel? Some fraction of the bounding box?), but I feel like the code should be able to cope with this case.

[Attached image: missing_data_example_noproblem]
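For reference, the kind of pipeline check I had in mind would be something along these lines (only a sketch; the box convention and the 20% threshold are arbitrary choices of mine, not anything from scarlet):

```python
import numpy as np

def bands_with_missing_data(weights, box, max_bad_fraction=0.2):
    """Return band indices whose weights are mostly zero inside a source's box.

    weights: (bands, Ny, Nx); box: (ymin, ymax, xmin, xmax) in pixel coordinates.
    """
    ymin, ymax, xmin, xmax = box
    cutout = weights[:, ymin:ymax, xmin:xmax]
    bad_fraction = (cutout == 0).mean(axis=(1, 2))
    return np.flatnonzero(bad_fraction > max_bad_fraction)
```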

fred3m commented 3 years ago

Thanks for passing this along; this isn't something we've come across before, and we do run this on images with zero-weighted regions. I did a test run just now using three different configurations: the first was the full image with the default weights; the second had zero weights and zero image values in the bluest band over `[:60, 40:]` (the lower right corner); and the last had zero weights and image values in the reddest band over `[:60, 40:]`.
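Roughly, the modified configurations looked like this (a sketch, assuming `images` and `weights` are numpy arrays of shape `(bands, Ny, Nx)` with band 0 the bluest and band -1 the reddest):

```python
# configuration 2: zero weights and image values in the bluest band, lower right corner
images_blue, weights_blue = images.copy(), weights.copy()
images_blue[0, :60, 40:] = 0
weights_blue[0, :60, 40:] = 0

# configuration 3: the same region, but in the reddest band
images_red, weights_red = images.copy(), weights.copy()
images_red[-1, :60, 40:] = 0
weights_red[-1, :60, 40:] = 0
```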

Original

[Screenshot: deblending result with the full image and default weights]

Blue zeros

[Screenshot: result with zero weights and image values in the bluest band]

Red zeros

[Screenshot: result with zero weights and image values in the reddest band]

scarlet seems to be able to deblend bright objects with significant flux in all bands even with missing data (see source 0), and even faint sources if the majority of their flux is in one of the bands with data at the location of the peak. The struggle seems to occur when a faint source has most of its flux in a band that is missing data, and it looks like this has to do with initialization.

To test this idea I initialized all of my models using the image and weights with no data in the reddest band. Then I set the SED for all of the faint sources in that band to zero:

```python
# bad_band is the index of the band with missing data
for source in sources:
    # leave multi-component sources alone; zero the initial SED of everything else
    if isinstance(source, scarlet.MultiExtendedSource):
        continue
    sed = source.parameters[0]
    sed[bad_band] = 0
```

Then I ran `blend.fit` as normal and obtained the following, which looks roughly correct:

[Screenshot: result after zeroing the faint sources' SEDs in the missing band]

So it looks like the issue is that when a faint source is initialized with the majority of its flux in a band with missing data, there can be non-zero initial flux in the missing band that can never be removed. The optimizer then makes a futile attempt to suppress that flux with an ever-increasing negative gradient update, which drives the flux in all bands down to the minimum allowable value of 1e-20 (used to prevent divergence in the gradient update).

This is something that our initial models should correct for, and we should definitely schedule an update to do so. In the meantime I would recommend implementing the above, which seems stable (but I would add a flag to note that there is missing data in a band or bands).
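A simple way to set such a flag would be to look for bands whose weights are zero over a sizeable fraction of the frame (again only a sketch; the 10% threshold is arbitrary):

```python
import numpy as np

# weights: (bands, Ny, Nx); flag bands with a large zero-weight fraction
zero_fraction = (weights == 0).mean(axis=(1, 2))
bad_bands = np.flatnonzero(zero_fraction > 0.1)
has_missing_data = bad_bands.size > 0
```

Any index in `bad_bands` can then play the role of `bad_band` in the loop above.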

entaylor commented 3 years ago

hashtag-dirtyspacenews! : )

Thanks for this. I'll add the workaround/check for now, and stay tuned for updates.

pmelchior commented 3 years ago

This appears to be an issue with the `set_spectrum` method that solves for the best spectrum amplitude given the initial morphology: https://github.com/pmelchior/scarlet/blob/92b247d415d2e958b7683ff61b5b2f7328b8e389/scarlet/initialization.py#L569-L574

This solver will go haywire if the weights are zero everywhere the morphology is non-zero. We need to catch that.
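The check could look something like this (a sketch, not the actual scarlet code; it assumes the weights and the initial morphology have already been cut to the same box):

```python
import numpy as np

def degenerate_bands(weights, morph):
    """Bands in which every pixel on the source's footprint has zero weight."""
    support = morph > 0                              # (Ny, Nx) footprint of the source
    covered = (weights[:, support] > 0).any(axis=1)  # any usable pixel per band?
    return np.flatnonzero(~covered)                  # weights: (bands, Ny, Nx)
```

For those bands the amplitude is unconstrained by the data, so the initializer should fall back to a zero (or default) spectrum entry rather than solving for it.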