fred3m opened this issue 5 years ago
Code in the pixelcnn branch seems to be working, at least for single-band observations and without specifying PSFs:
Nice!
@EiffL I see that you branched off of adam, which is the right place. I guess we need to talk about centering of galaxies, though, because your pixelcnn assumes that galaxies are in the middle of the image, right?
So what we did with Fred is to extract a centered postage stamp from the frame, and that's what's fed to the pixelcnn. The position of the postage stamp is updated at each iteration based on the current center estimate.
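Schematically, the extraction is something like this (a rough numpy sketch, not the branch code; the helper name and stamp size are made up):

```python
import numpy as np

def extract_centered_stamp(frame_image, center, stamp_size=64):
    # Round the current center estimate (y, x) to the nearest pixel and cut a
    # stamp_size x stamp_size postage stamp, clipped at the frame edges.
    cy, cx = int(round(center[0])), int(round(center[1]))
    half = stamp_size // 2
    y0, x0 = max(cy - half, 0), max(cx - half, 0)
    return frame_image[y0:y0 + stamp_size, x0:x0 + stamp_size]

# at every iteration, re-extract the stamp at the updated center estimate:
# stamp = extract_centered_stamp(frame_image, current_center)
```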
This being said, we thought of a few other problems.
Because the training set postage stamps are of even size, there might be a one-pixel offset between the prior and the extended source postage stamp. This might induce a small systematic bias if not accounted for.
Because of the way the prior is trained, i.e. on standardized 1s exposure HST images, I think the value of the SED in the i-band should be set to 1 (adjusted for different instruments and exposure times); that's the only form of normalization that should be required.
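Schematically, that convention would amount to something like this (hypothetical names, just to illustrate):

```python
import numpy as np

def normalize_sed_to_iband(sed, morph, i_band):
    # Pin the i-band amplitude to 1 and absorb the scale into the morphology,
    # so the morphology stays in the standardized units the prior was trained on.
    scale = sed[i_band]
    return sed / scale, morph * scale
```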
Good that Fred and you caught up. Shifting is solved then.
I think we can make sure that the sizes are always 64x64.
As for the normalization, we could do that too as part of the preparation of the image for the prior, but it's not that simple to automate. Our SEDs are total fluxes in observed units, so the conversion to 1s of HST i-band is not trivial in general. I know that you want to have the relative flux preserved because morphology can change with flux, but have you tested that assumption? It would be much more straightforward to normalize the peak intensity to one.
Hum... I don't think normalizing the peak intensity is practical, because it would change the level of the background noise in the images. But if we don't particularly constrain the normalization of either the SED or the morphology, it should in principle still work: the prior will be breaking the degeneracy. Now, the only problem is to ensure that the first-guess solution for the morphology is in the right ballpark in terms of fluxes to be compatible with the prior.
I'll make a few tests to see how all this behaves in practice.
We also need to do a more thorough investigation of how to calculate our gradient step sizes. While ADAM isn't too sensitive to the choice of step_sizes, in the image shown above we were initially getting nonsense results because there was a difference of 4 orders of magnitude between the pixelCNN gradients and the likelihood gradients. So we just scaled the pixelCNN gradients down to make it work, but there is no guarantee that the step-size ratio will be the same for all images; in fact it will almost certainly be different depending on the spectral norms of A and S.
Adam expects the step sizes in data units, i.e. for the morphology something like 0.01 should be fine, and for the SED something of order 1% of the expected amplitude. Less to be on the safer side, more to be faster.
But Fred's point is that the Lipschitz constant of the prior term is 10^4 larger (or smaller, I don't remember which way it goes) than that of the likelihood term. Hence, the effective step size of an Adam update is 10^4 smaller than without the prior, and the effect of the likelihood term is very small. Without scaling the gradient of the prior, we would probably have had to run for 10^6 iterations to converge to the same solution we got in 100 iterations.
There might be a principled way of scaling the two gradient terms; otherwise, this heuristic of scaling the prior to have gradients of similar amplitude might work well in practice...
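Something like the following, assuming both gradients are available as plain arrays (a sketch of the heuristic, not the branch code):

```python
import numpy as np

def combine_gradients(grad_likelihood, grad_prior, eps=1e-12):
    # Rescale the prior gradient so its overall amplitude matches the
    # likelihood gradient before the sum is handed to Adam.
    scale = np.linalg.norm(grad_likelihood) / (np.linalg.norm(grad_prior) + eps)
    return grad_likelihood + scale * grad_prior
```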
This would point to an improper definition of the likelihood or the priors. When inverse-variance weights are set, the likelihoods should be proper, so you need to check the prior normalization.
None of that is related to the step size in Adam, which is externally set and independent of the amplitude of the gradients. That is the clever thing about it.
In other words, you don't need Lipschitz constants for the prior; simply computing the gradient is sufficient.
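For reference, a minimal sketch of a properly weighted Gaussian negative log-likelihood and its gradient, assuming per-pixel inverse-variance weights; if the prior is likewise a proper log-probability, the two gradients can simply be added:

```python
import numpy as np

def neg_log_likelihood(model, data, weights):
    # weights = 1 / sigma^2 per pixel (inverse variance)
    return 0.5 * np.sum(weights * (model - data) ** 2)

def grad_neg_log_likelihood(model, data, weights):
    return weights * (model - data)
```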
I have relocated development to a new branch, u/EiffL/pixelcnn, where I'm changing the prototype we built last week with Fred a little bit. And it's now based on the latest version of adam.
Just ran a test, seems to work nicely :-)
Thanks, this looks pretty good. Do you have the trained pixelCNN for us to play with?
I'm not (yet?) a fan of putting tf code into scarlet's main classes. I'd rather see it contained in its respective Prior class implementation. This way we don't force users to have it. I'm also not sure the get_centered_ROI method should be in MorphParameter instead of MorphPrior, where it's functionally needed.
In terms of creating a batch: could the get_centered_ROI method be written in tf and be part of the mini-batch processing?
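Something along these lines, perhaps (a sketch only; the function name and stamp size are made up, and the offset conventions of tf.image.extract_glimpse should be double-checked):

```python
import tensorflow as tf

def get_centered_roi_batch(images, centers, stamp_size=64):
    # images: [batch, H, W, C]; centers: [batch, 2] current (y, x) center estimates.
    # extract_glimpse cuts one window per batch element, so the ROI extraction
    # can live inside the mini-batch graph.
    return tf.image.extract_glimpse(
        images,
        size=[stamp_size, stamp_size],
        offsets=tf.cast(centers, tf.float32),
        centered=False,
        normalized=False)
```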
I have packaged one of the trained pixelCNNs I've been using and hosted it on the scarlet-pixelcnn repo. Here is how one would use this stuff:
```python
import numpy as np
from astropy.io import fits
import scarlet
from scarlet.prior import MorphologyPrior

# trained pixelCNN module hosted on the scarlet-pixelcnn repo
module_path = 'https://github.com/pmelchior/scarlet-pixelcnn/blob/master/modules/pixelcnn_out.tar.gz?raw=true'

im = fits.getdata('blend.fits')
im1 = fits.getdata('gal1.fits')
im2 = fits.getdata('gal2.fits')

prior = MorphologyPrior(module_path)
frame = scarlet.Frame((1, 65, 65))
# weights are inverse variances, constant across the stamp
weights = im.reshape((1, 65, 65)) * 0 + 1. / 0.01147
observation = scarlet.Observation(im.reshape((1, 65, 65)), weights=weights).match(frame)

sources = [scarlet.ExtendedSource(frame, (32, 32), observation, bg_rms=np.array([0.03]),
                                  prior=prior, symmetric=False, monotonic=False),
           scarlet.ExtendedSource(frame, (32, 42), observation, bg_rms=np.array([0.03]),
                                  prior=prior, symmetric=False, monotonic=False)]

blend = scarlet.Blend(sources, observation)
blend.fit(500, e_rel=1e-6, step_size=1e-3)
```
where gal1.fits, gal2.fits, and blend.fits are hosted here: https://github.com/pmelchior/scarlet-pixelcnn
One thing to note is that if I don't downscale the strength of the prior, it now works (instead of giving garbage like we were seeing last week), but the images are over-regularized; it actually decreases the fit to the data. Still need to make sure we are properly scaling all the terms.
Concerning the Tensorflow stuff, the problem is that it makes more sense to have the blend object take care of Tensorflow sessions, just like it's already blend.fit() which takes care of computing the gradients with autograd. In principle you could have several priors; they would each define a different tensorflow function, but the execution environment is handled by a different class. Note that in TF2 things may be different, since there are no sessions to keep track of. But anyway, the way I've written it, the tensorflow module only gets imported if needed. Under "normal operations" the import is hidden.
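To illustrate the containment (made-up class name, not the actual scarlet.prior API), the import can stay hidden until the prior is actually used:

```python
class LazyTFPrior:
    """Sketch of a prior that only imports tensorflow_hub when it is used."""

    def __init__(self, module_path):
        self.module_path = module_path
        self._module = None

    def _load(self):
        # the heavy dependencies are imported here, not at scarlet import time
        import tensorflow_hub as hub
        self._module = hub.Module(self.module_path)

    def grad(self, morph):
        if self._module is None:
            self._load()
        # evaluate d log p(morph) / d morph with the loaded module (omitted)
        raise NotImplementedError("sketch only")
```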
As for get_centered_ROI, yes I think it's fine to move it elsewhere if you want, maybe more as a function than a class member if it's not associated with the MorphParameter class. I don't have strong feelings about it :-)
Small addition to the previous post: I have added inverse-variance weights in the Observation and removed from the code the arbitrary scaling factor we had on the prior term.
I was originally confused because I thought the bg_rms term in ExtendedSource would end up defining the variance in the l2 loss function; it turns out it's the weights argument in Observation that plays this role.
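In other words (a tiny sketch, assuming Gaussian background noise), the weights passed to Observation are per-pixel inverse variances, not the rms itself:

```python
import numpy as np

bg_rms = 0.03                                    # per-band background rms (example value)
variance = bg_rms ** 2
weights = np.full((1, 65, 65), 1.0 / variance)   # inverse-variance weights for Observation(..., weights=...)
```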
Why not define bg_rms in Observation, and then have each source use that to initialize itself? Is there a use case where different sources might have different bg_rms values that would also be different from the likelihood variance?
In any case, here are the results with this setting, and each component compared to the true input galaxy:
You're doing it correctly now. ExtendedSource has an implementation that predates Observation, so we haven't closed the loop. One concern is that weights is an optional argument while bg_rms is required for the source initialization, but we can raise an exception.
Would you open an issue in scarlet to make that adjustment? Simply link to this thread.
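A possible shape for that check (hypothetical sketch for the issue to be opened):

```python
def check_noise_info(bg_rms=None, weights=None):
    # fail early if a source needs bg_rms but neither bg_rms nor
    # observation weights were provided
    if bg_rms is None and weights is None:
        raise ValueError("ExtendedSource needs bg_rms or an Observation with weights")
```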
Besides the weights vs bg_rms discussion, which we'll address in #129, and the packaging and dependencies, which we'll improve later on, there are important changes we need.
The noisy models are gone, but I'm still seeing artifacts in the models.
If I set e_rel=1e-7, the artifacts go away.
Residuals look good.
I just realized, however, that the code ran for the max of 500 iterations and took 2 minutes.
@esheldon Great that you got it to work. Another important factor here is the step size used with the optimizer. At 1e-3 the optimization is unstable; at 1e-4 it works much better. On my laptop I get convergence (e_rel=1e-7) at step_size=1e-4 in 20s and ~300 iterations. Still not exactly "fast", but there are a bunch of amortized inference techniques we could be using to speed this up...
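Relative to the snippet above, that just means:

```python
# same setup as before, only with a tighter tolerance and a smaller Adam step size
blend.fit(500, e_rel=1e-7, step_size=1e-4)
```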
@pmelchior Is there a maximum number of iterations that would be a deal breaker? I.e., can we decouple the optimization speed from the quality of the inverse problem? I would say we can, but if you disagree, we can look at the unrolled optimization procedures that people use in medical imaging nowadays for fast inverse problem solving.
@pmelchior Concerning handling different sizes of observations, it's already in place: the code extracts postage stamps around the sources, with sizes that match the prior. As for flux normalization, can we adopt some LSST scale? idk what that may be. The "good thing" about these standardized HST fluxes is that the pixel values are already <1, hence quite amenable to neural network optimization.
@EiffL I think that the maximum number of iterations and maximum run time depend a lot on how this scales with the number of pixels and the number of objects in the blend.
Do you know how this scales with pixel resolution (pix/arcsec)? If we create these templates at LSST resolution (~0.2 arcsec/pixel) but convolve them with a PSF that is narrow compared to LSST seeing (sigma ~0.8-0.9), that should make a huge difference, right? Naively I would think that the network speeds things up linearly with the number of pixels in a template/postage stamp, but I wouldn't be surprised if the relationship is more complicated.
To give an idea of the target run time: the goal initially requested by LSST was 10ms/band/source in the blend (so ~100ms for a two-object blend in 5 bands), although I think we've successfully been able to convince them to allow more time for deblending. I imagine that 10s per object will be way too high, but if it runs at least 20 times faster when trained on LSST/HSC-resolution templates then we should be ok. However, if this works as well on blends with a large number of sources across multiple bands as it looks like it might, then given the difficulty of deblending in LSST we might be able to convince the powers that be that the extra time is worth it.
@pmelchior will have to speak for joint processing and what their expectations of run time are but he is on vacation for the next two weeks.
Thanks for this @fred3m. So I think the pixel resolution is not going to make a lot of difference; there is a big overhead in running the tensorflow session, which is probably taking more time than computing the gradients themselves.
This being said, at lower resolution, with fewer details, the optimization problem will be much faster.
Ok, I'll keep the 100ms target in mind :-) It's probably achievable, but maybe not right away...
Almost done addressing the issue linked above. Demo notebook of training a pixelCNN from COSMOS images at HSC resolution using distributed cloud TPU training :-) https://colab.research.google.com/drive/1p-DOHO6aU1ToPG4lca_pjHyHMt7sxs5l
So, after a weekend of experimentation on multiband images here is a list of things that are not yet perfect:
Relative normalization of SED and morphology: the default morph_max strategy ended up being the most robust, but it artificially fixes the flux of the morphology component to 1 and ends up biasing the morphology (see the sketch after this list). I tried removing the normalization altogether, or fixing the SED in one band; neither gave good results.
Scaling of prior to likelihood: in principle, when using the weights, the normalization should be proper. But it turns out that, without tuning, the likelihood wins over the prior, leading to ugly results; artificially lowering the weights gives increasingly better results. This is also tied to the previous point.
Initialization of sources when doing deconvolution: I have trained a prior with an HST-like effective PSF, which means that when applied to HSC images, the recovered model is effectively deconvolved. The standard initialization strategy in this case leads to a grossly overestimated source footprint, and this makes the deblending with the pixelcnn prior more difficult. The best strategy I have found is to first run standard scarlet, to get a deconvolved first guess, and then use the scarlet solution as a starting point for pixelcnn deblending.
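For reference, here is roughly what the morph_max normalization mentioned in the first item amounts to (a hypothetical helper mirroring the described behavior, not the scarlet code):

```python
import numpy as np

def morph_max_normalize(sed, morph):
    # fix the peak of the morphology to 1 and push the scale into the SED
    scale = morph.max()
    return sed * scale, morph / scale
```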
Here is an example of first guess using scarlet:
And updated after minimization with pixelcnn prior:
In this example, I used morph_max SED normalization, and I downweight the likelihood by an arbitrary factor of 0.001.
This looks really nice! When you say you trained on HST, so that applying it to HSC effectively deconvolves: you still explicitly account for the convolution in the inverse problem? It looks like there is some ringing in model 1. Any idea where this comes from? Also, I wonder if scarlet could be run without symmetry for the initialization. It would be faster and would probably still get us where we need to be.
Yes yes yes, and I need your help @herjy :-) I tried to build the training set at the HSC pixel resolution, with a standardized effective PSF, so that all galaxies in my training set have the same PSF. But it's not working well; I have aliasing.
Here is the notebook I used to build an effective PSF for the training set: https://github.com/ml4astro/galaxy2galaxy/blob/u/EiffL/pixelcnn/notebooks/BuildingCOSMOS_PSF.ipynb
Then the images for the training set are generated using GalSim here: https://github.com/ml4astro/galaxy2galaxy/blob/01cffaa59da229e93f9e839a54cf18443f8b914c/galaxy2galaxy/data_generators/cosmos.py#L101
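For reproducing this, a rough GalSim sketch of drawing one training stamp at the HSC pixel scale with a single standardized effective PSF (the PSF width, stamp size, and catalog index are assumptions, not the galaxy2galaxy settings):

```python
import galsim

cat = galsim.COSMOSCatalog()                      # requires the COSMOS training sample on disk
std_psf = galsim.Gaussian(fwhm=0.2)               # hypothetical standardized effective PSF

gal = cat.makeGalaxy(0, gal_type='real')          # RealGalaxy: the original HST PSF is deconvolved internally
obj = galsim.Convolve(gal, std_psf)
stamp = obj.drawImage(nx=64, ny=64, scale=0.168)  # HSC pixel scale, ~0.168 arcsec/pixel
```

If the standardized PSF is much narrower than the original HST PSF, the reconvolved profile is effectively under-resolved at drawing time, which could be one plausible source of the aliasing mentioned above.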
Using @EiffL's code we will create a source that uses a PixelCNN to constrain morphologies.