keflavich / brick-jwst-2221

Data reduction & analysis for the Brick JWST 2221 data

Removing Stars with cloudcoverr #9

Open keflavich opened 1 year ago

keflavich commented 1 year ago

@andrew-saydjari can you help us get CloudCovErr going?

These are the code snippets for PSF creation:

Wrapper of crowdsource PSF: https://github.com/keflavich/brick-jwst-2221/blob/main/analysis/crowdsource_catalogs_long.py#L61-L105

Download the relevant PSF from MAST (with tons of infrastructure crap to deal with failed downloads): https://github.com/keflavich/brick-jwst-2221/blob/main/analysis/crowdsource_catalogs_long.py#L276-L303

then loading that PSF into the wrapper: https://github.com/keflavich/brick-jwst-2221/blob/main/analysis/crowdsource_catalogs_long.py#L319-L325

the actual crowdsource call: https://github.com/keflavich/brick-jwst-2221/blob/main/analysis/crowdsource_catalogs_long.py#L433-L435

That whole script could be super compressed; it's giant because of all the diagnostic image-making & parameter-comparison work.

andrew-saydjari commented 11 months ago

Hi @keflavich, as you know from https://github.com/schlafly/crowdsource/issues/17, the main issue on my side was just fixing the PSF wrapper model you were using. Having done that, I now have CloudClean (the ISM-focused version of CloudCovErr) running on a 1000 × 1000 pixel subset of a single filter (F466N) of your image coadd (i.e. a cutout of jw02221-o001_t001_nircam_clear-f466n-merged-reproject_i2d.fits). I have some suggestions for overcoming limitations/next steps at the end.
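For orientation, here is roughly how such a cutout and its inputs could be assembled in Julia; a minimal sketch assuming FITSIO.jl, the standard i2d SCI/ERR extensions, and a simple 1/ERR weight (the variable names match the fit_im call below, but this is not necessarily the exact code that was run):

using FITSIO

f = FITS("jw02221-o001_t001_nircam_clear-f466n-merged-reproject_i2d.fits")
sci = read(f["SCI"])    # coadded image
err = read(f["ERR"])    # per-pixel uncertainty

# 1000 x 1000 cutout; crowdsource takes an inverse-uncertainty weight.
cut = (1:1000, 1:1000)
im_clean = sci[cut...]
wt = 1 ./ err[cut...]
# d_im (the dq argument below) would come from pipeline data-quality flags.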

First, I ran crowdsource. Limitation 1 is that the WebbPSF model is not great, so we over-deblend sources, adding spurious detections in the PSF wings. Not great. Further, if we had a better PSF, we could choose better crowdsource settings to more carefully preserve the ISM. I ended up choosing to just lean into the over-deblending, because the model is only going to be a mask for this purpose, and ran fit_im as below.

resout = py"cs.fit_im"(im_clean,psfmodel_py,weight=wt,dq=d_im,
    ntilex=1, ntiley=1, refit_psf=false,psfderiv=true,fewstars=100,maxstars=320000,verbose=true,threshold=5,
    blendthreshu=0.2,psfvalsharpcutfac=0.7,psfsharpsat=0.7,miniter=2, maxiter=10);

[image: crowdsource fit to the F466N cutout]

Then I made a mask out of this. Ideally, your photometry is good enough that you can just threshold the model image. Here, I had to grow that mask by 1 pixel in every direction. I also applied a threshold on the residuals themselves, because there appear to be a few sources/artifacts that are not being properly masked by the upstream pipelines/how I am using the data products. This is dangerous if you have large-amplitude ISM residuals (sharp filaments), and should be avoided by better masking those other artifacts.

msk_bad = (mod_im .- sky_im) .> 4;           # threshold the star-model image
println(count(msk_bad) / length(msk_bad))    # masked fraction
msk_new = grow_msk2d(msk_bad, rad=1)         # grow mask by 1 pixel
msk_new .|= abs.(clean_im .- mod_im) .> 10   # also mask large residuals
println(count(msk_new) / length(msk_new))

With that mask in hand, I just ran CloudClean.jl's proc_continuous with some reasonable initial settings (I have not tuned them at all). FYI, you need to grab the GitHub version; the public Julia package needs an update pushed to it, and I haven't gotten the time to do that yet. As a note, proc_continuous is a bit simpler algorithmically than proc_discrete, which is the closer CloudCovErr.jl analog. In the discrete version, you visit every star and conditionally infill the missing pixels masked due to the star's bright residuals, centered on the star location. In proc_continuous you just have an image and a mask, and you scan over the image filling in missing pixels as you go. It is really simple, but works well and removes a lot of the coding complexity in CloudCovErr.jl, for almost no loss (especially when you care about the ISM rather than the photometry). If you actually do care about the photometry, CloudCovErr should now be fairly straightforward for you to use (except for dependency installs), and it will provide those corrections.
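A minimal install sketch for the GitHub version (I am assuming the repository URL here):

using Pkg
Pkg.add(url="https://github.com/andrew-saydjari/CloudClean.jl")

The proc_continuous call itself: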

Np = 31                      # patch size (initial, untuned setting)
widx = 151                   # local window width (initial, untuned setting)
slcs = (400:500, 400:500)    # slices used only for the plots below
out = proc_continuous(res_im, msk_new; Np=Np, widx=widx, tilex=1, seed=2021, ftype=64, ndraw=1)

What I am showing in the plot below is the original residuals, then the residuals masked with the mask defined above, then the mean infill (this will look smoothed in the infill regions), then one infill draw (which has statistical noise that should make infilled pixels indistinguishable from real pixels). I think it looks great, except for the saturated star in the top right, around which there are some unusual negative values that I suspect are upstream artifacts. What you should lock in on is the beautiful preservation of the filaments running diagonally across the image.

[image: original residuals, masked residuals, mean infill, and a single infill draw]

And of course, we actually want this in image space, so we can add back the sky model to get our clean picture of the ISM.
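In code that is just one broadcast add; a sketch with hypothetical names (draw_im standing in for one infill draw pulled out of out, and sky_im the crowdsource sky model):

# Hypothetical names: one infill draw plus the crowdsource sky model.
ism_im = draw_im .+ sky_im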

[image: cleaned ISM image with the sky model added back]

My thoughts on things that could be useful/next steps:

  1. Make sure the flags from the JWST pipeline are being used optimally to mask pixels.
  2. Some improvements to the PSF model (Gaussian blurs to simulate the coadd, then rerun crowdsource; see the sketch after this list).
  3. Improve the performance of the JWST PSF model wrapper for crowdsource.
  4. Run on the full single-band image.
  5. Run on each of the relevant bands and combine.
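On point 2, the blur step could be as simple as the following sketch (ImageFiltering.jl assumed; psf_stamp and the blur width are placeholders to be tuned, not values from an actual run):

using ImageFiltering

sigma_pix = 1.0                                             # placeholder blur width
psf_blur = imfilter(psf_stamp, Kernel.gaussian(sigma_pix))  # smooth the PSF stamp
psf_blur ./= sum(psf_blur)                                  # renormalize to unit sum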

The last point is maybe the most interesting. Right now, CloudClean is single-band, but I have thought a lot about this and think I know how to make it multiband. One can imagine that, because the infill draws are independent per band, the recovered image might not have the right color fluctuations in the infilled areas. If that ends up being a problem (or you want me to scout out whether it will be a problem on this test patch), let me know. I would love to see an example of it being a problem and would probably just implement multiband CloudClean.jl to solve the issue.

keflavich commented 11 months ago

For the PSF: maybe it is best to try to derive the PSF from the image itself. But you have seen the data now, and the field is incredibly crowded. Is an empirical PSF likely to be better or worse than a blurred theoretical PSF?

4 & 5 are most interesting to me right now. We have 10 bands covering most of the field, and a key science case is ratio'ing these to measure extinction. I'm happy to send on more data to enable this.

keflavich commented 11 months ago

@andrew-saydjari pointed out that crowdsource was only running 1 iteration in my production runs. This was because the iterations stop once the maximum number of sources (40,000 by default) is reached. The solution is to increase maxstars, as he showed in https://github.com/keflavich/brick-jwst-2221/issues/9#issuecomment-1804973977
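For reference, the relevant change is just the keyword in the fit_im call, e.g. (value copied from Andrew's run above, not a recommendation):

# Raise the source cap well above the expected number of detections.
resout = py"cs.fit_im"(im_clean, psfmodel_py, weight=wt, dq=d_im,
    maxstars=320000, miniter=2, maxiter=10)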