pmelchior / scarlet2

Scarlet, all new and shiny
MIT License
15 stars 4 forks source link

In built initializations #33

Closed SampsonML closed 7 months ago

SampsonML commented 9 months ago

Functions for Scarlet2 default initialisations for the morphology and spectrum are included here: https://github.com/pmelchior/scarlet2/blob/initializations/scarlet2/initialization.py

To run may simply modify your fit routine as so adding two lines

init_spec = scarlet2.initialization.init_spectrum(obs, center)
init_morph = scarlet2.initialization.init_morphology(obs, center)

So he new routine looks like,

with Scene(model_frame) as scene:
    for center in centers:
        # initialize the prior
        prior = nn.ScorePrior(
            model=prior_model, transform=transform, model_size=model_size
        )
        # initialize the spctrum and morphology with Scarlet2
        init_spec = scarlet2.initialization.init_spectrum(obs, center)
        init_morph = scarlet2.initialization.init_morphology(obs, center)

        # construct the source
        Source(
            center,
            ArraySpectrum(
                Parameter(init_spec, constraint=constraints.positive, stepsize=5e-2)
            ),
            ArrayMorphology(Parameter(init_morph, prior=prior, stepsize=5e-3)),
        )

# now fit the model
scene_fit = scene.fit(obs, max_iter=220, e_rel=1e-4)  
renders = obs.render(scene_fit())

Currently, fitting for the boxsize is not done very nicely, so brainstorming and comments welcome. This is also the speed bottleneck in the morphology initialisations too.

Overall this works well producing some reasonable initialisations especially given our more robust optimisation scheme. inits renders

SampsonML commented 9 months ago

Note, this adds dependencies for dm_pix and jaxopt both pip installable.

SampsonML commented 8 months ago

@pmelchior I will change the box size routine based on the scarlet1 methods, can you link where this happens in the scarlet1 repo

pmelchior commented 8 months ago

Sure. Here's the relevant bit from the initialization of SingleExtendedSource: https://github.com/pmelchior/scarlet/blob/45187fdccbe8b8c8d7551a84572d7243b24bc8cb/scarlet/source.py#L425-L439

The main steps are:

  1. Create a detection image for every observation. It's the inverse-variance weighted sum of all the bands for this observation. This image and its combined variance image was stored inside of Observation, so that one doesn't have to repeat this for every source in the same scene.
  2. Run the monotonicity operator on the detection image, which prevents multimodal morphologies from neighbors. It also stops the model at 0. We don't have this operator, so we need to do something else instead. More below.
  3. Define a bounding box for which all pixels are above threshold = factor * std_detect. This step considers the noise level explicitly. Factors above 1 were OK because we could resize the boxes in scarlet, should the model ever grow to hit the edge of the box. We don't have that resizing ability either (not yet, at least) because that would require re-jitting make_step in the fitting code.

So, what can be done in scarlet2? With a prior, the morphology initialization matters less than it used to with proximal optimization. Your tests have shown that a circular Gaussian with the size of the PSF would be fine in most cases, just not for the very largest galaxies. This new Gaussian fitting method goes beyond that considerably, but we may not need that. If we could avoid a fitter at this step, it would speed things up and remove a procedure that might fail, esp in crowded scenes. Plus, you still need a robust solution for the boxsizes.

Let's simplify this: The two inits that are needed are spectrum and morphology. If we assume everything is a point source, then we have the correct spectrum initialization here, but it needs to be done with correct_psf=True, so I suggest directly porting this method.

For the morphology, let's assume the shape of the model_frame PSF. This is a narrow GaussianMorphology (see e.g. the implementation of GaussianPSF), and is already being used for the computation of the spectrum. So, this would make the two initialization consistent and correct if the source is either really a point source or only marginally resolved. Robert would call them the "faint fuzzies". That's the bread-and-butter case, so being robust for them is important. Plus, GaussianMorphology has its own box size computation (+-10 sigma). This initialization is what we called CompactSource in scarlet1.

But what about the larger/brighter sources? I hope that we'll get substantially better initializations by another route, namely joint survey detections, but until then I am thinking of a weird heuristic.

  1. Start with the compact initialization as above.
  2. Cut the box at the specified boxsize of GaussianMorphology.
  3. Compute the average spectrum over all pixels in the observation data along the inner edge of that box.
  4. If that average spectrum is below noise_thresh * noise_rms (more that below) in all channels, stop: box size is fine.
  5. Compute the correlation coefficient of the average spectrum with the peak spectrum (from the spectrum initialization).
  6. If that correlation is less than corr_thresh, stop: it either gets pretty noisy or there's another object along the edge that is changing the colors.
  7. If you're still here, create new GaussianMorphology with sigma increased by some amount, maybe a factor 1.5 or 2. Go back to step 2.

This method keeps the Gaussian circular and allows it to grow. It does not require the detection image, but it does need to local noise level of the observation. Here's the methods from scarlet.Observation, which we should port:

    @property
    def noise_rms(self):
        if not hasattr(self, "_noise_rms"):
            import numpy.ma as ma

            self._noise_rms = 1 / np.sqrt(ma.masked_equal(self.weights, 0))
            ma.set_fill_value(self._noise_rms, np.inf)

        return self._noise_rms

If we want to do the same, we need to use object.__setattr__ because Observation is now a dataclass instance, so one cannot do a belated assignment of a variable like it's done above.

SampsonML commented 8 months ago

This is now updated, and as far as I can tell I have implemented the above, however do check to confirm the logic is correct.

It works ok...but perhaps my threshold values are way off for noise or correlation. Will look into more soon.

SampsonML commented 8 months ago

Added a pseudo monotonicity check the codes working much nicer now here : https://github.com/pmelchior/scarlet2/blob/2a34a456753bb2af39972079a4a7ec058772f3b5/scarlet2/initialization.py#L118

However one minor issue. The score function required the morphology array to be roughly [0,1]. Using the scarlet1 and GaussianMorphology methods the morphology array was definately not normalised, this is fine I did the normalisation myself when returning the array. However, I realise my port from scarlet1 for init_spectrum must be working on taking spectrum values with this non-normalised morphology array somewhere?

Having the output morph array must be [0,1] for score func, so asking best ideas to have the init_spectrum give correct spectrum estimates for this normalised morphology array

SampsonML commented 7 months ago

Discussion moved to pull request #47