spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
564 stars 167 forks source link

Should resample automatically subtract background? #5923

Closed stscijgbot-jp closed 2 years ago

stscijgbot-jp commented 3 years ago

Issue JP-2024 was created on JIRA by James Davies [X]:

I believe the answer should be "no". But currently resample does subtract off the sky background in calwebb_image3 when it has background measurements available. A scalar sky value gets computed earlier on in the stage3 pipeline via skymatch.

The single-resampled images produced at the end of the stage2 pipeline do not have the background subtracted, as they don't run through skymatch.

The current implementation checks if skymatch has computed a background level for each exposure. If it has, it subtracts it before resampling. So the output resampled images currently have a mean sky background of zero.

I suspect what it should do is subtract off those computed sky backgrounds for each exposure, drizzle into the common output image, and the add back the sky background after drizzling is completed, so that the output drizzled image has the same background as the input images.

But which sky background from which input image should be added back in? The mean of the input sky backgrounds? The minimum? Some weighted mean?

I'm primarily concerned about what should be the default for SPD processing of the associations that we will be producing. I suspect all exposures in these associations will be taken close to the same time and should thus have pretty similar zodiacal light characteristics so pretty similar backgrounds.

stscijgbot-jp commented 3 years ago

Comment by Anton Koekemoer on JIRA:

scheduled for discussion at JWST CalWG meeting 2021-04-13

stscijgbot-jp commented 2 years ago

Comment by Karl Gordon on JIRA:

Not sure this was discussed in the JCalWG context. A few comments from my point-of-view.

The background should not be subtracted by default.

While there are cases where this is fine (e.g., deep fields with lots of "sky" between fairly isolated compact sources), we will also have lots of cases where this is not fine. For example, where there is significant diffuse emission due to gas and/or dust. This will happen quite a lot at MIRI wavelengths and in narrow band NIRCam filters. The backgrounds values should be matched between images, that is all. Subtracting the background to make all the images have zero background on average will remove real and important astrophysical signal an producing images with significant, systematic negative values for a large portion of the mosaic.

stscijgbot-jp commented 2 years ago

Comment by Karl Gordon on JIRA:

To answer the question in the body of this issue. The mean of the input sky backgrounds should be added back in, if I understand the skymatch algorithm correctly.

stscijgbot-jp commented 2 years ago

Comment by Misty Cracraft on JIRA:

Also, if the skymatch step has the parameter 'subtract=False', which is the default, the assumption is that the background should not be subtracted. According to our tests, this subtraction happens even when the parameter is set to false. Is this a bug that should be filed in another ticket, or another small piece of this discussion?

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

The skymatch subtract argument only affects the processing within the skymatch step itself, such that it only computes overall sky values without subtracting them. The values get stored in keywords for use later during resampling. So we would need another argument added to resample to not subtract background (if there isn't one already).

stscijgbot-jp commented 2 years ago

Comment by Misty Cracraft on JIRA:

Ok, thanks for the clarification. I think Mattia is doing some testing to see what that looks like (resample not subtracting). I think the current discussion needs to continue and the best way forward needs to be decided before we figure out if another keyword is added to resample to control subtraction at that stage.

stscijgbot-jp commented 2 years ago

Comment by Mattia Libralato on JIRA:

In "resample_many_to_many" and "resample_many_to_one" in resample.py, there are these lines:

 

blevel = img.meta.background.level
if not img.meta.background.subtracted and blevel is not None:
    data = img.data - blevel
else:
    data = img.data

I think that the issue is the "not". Indeed, I printed img.meta.background.subtracted and blevel before the "if" statement, and the former is False, while the latter is a number. So the "not" makes the first argument true, and it subtracts the bkg.

That said, there is also another issue here. If the blevel is not applied (so "else" case), the bkg is not properly registered and the resulting image (with bkg) is not OK.

 

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

Yes, img.meta.background.subtracted is a record of whether background already got subtracted in the skymatch step. If not, then resample subtracts it, by default. This will need to be updated to allow the subtraction to be switched on or off.

stscijgbot-jp commented 2 years ago

Comment by Karl Gordon on JIRA:

The background matching step is only supposed to match the backgrounds, not subtract the average to get zero. This is what was requested originally. So this is not new work and is a bug. No option is actually needed. Just change the code to not subtract the average background preserving the actual measured background. Note, the background is actually a combination foreground emission (e.g., telescope, zodi) and astronomical backgrounds (e.g, diffuse emission from dust).

https://outerspace.stsci.edu/display/JWSTCC/Vanilla+Image+Background+Matching

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

The background matching step is set, by default, to NOT subtract the background. It only computes and stores the information. So that part is OK, I think. It's the resample step that is then using that background info to automatically subtract background when it's creating an output image. So that's the step that needs fixing (or at least the addition of a step param to toggle the background subtraction on/off).

stscijgbot-jp commented 2 years ago

Comment by Anton Koekemoer on JIRA:

Added as watchers the various others from the Cal WG 2021-04-13 meeting who expressed interest in this (Kevin Volk  Matteo Correnti  Michael Regan   Greg Sloan  Cheryl Pavlovsky  Swara Ravindranath).

First, I agree with Karl Gordon and Howard Bushouse that the actual background matching step (SkyMatch) would be expected to just measure and record the backgrounds and their differences, and not actually subtract it.

But when we get to Resample (as per the title of this ticket "Should resample automatically subtract background?" – it does need to adjust the backgrounds in some way, because drizzling images together without removing their potentially different time-variable backgrounds can lead to photometric issues for sources. There are at least a couple of different philosophies for this:

1) subtract the full background from each exposure, producing the usual astrodrizzle-style behaviour of a drizzled product with a background level centered around 0. This is fine for datasets with mostly empty sky, but can cause oversubtraction of real astrophysical signal with large extended sources.

2) subtract only the differences, or equivalently insert a DC offset into all the exposures, corresponding to some common value – this could be the average background as mentioned, or median, or perhaps the minimum background (measured from among all the exposures), or some other estimator. The average or median background may still be subject to oversubtraction; the minimum background could be more conservative in terms of avoiding oversubtraction of real astrophysical signal though it can be biased, and another estimator may be better.

But the key point is that resample should at least subtract something, to  ensure that time-variable backgrounds are removed from the exposures before drizzling to combine them.

 

stscijgbot-jp commented 2 years ago

Comment by Mattia Libralato on JIRA:

I looked into skymatch and it seems that if the step is run with "skymethod=MATCH", the code does what we want, i.e., it adjusts the background to that of one image (the image with the lowest background if "match_down=TRUE", otherwise it uses the image with the highest background). The final image obtained by running the code with "skymethod=MATCH" and "subtract=TRUE" is identical to that obtained by using "subtract=FALSE". This happens because:

1) with "skymethod=MATCH" and "subtract=TRUE", skymatch calls "apply_sky", which not only stores the value, but also subtracts it:

 

 old_sky = img.sky
 if do_skysub:
     img.image -= sky
 img.sky += sky
 new_sky = img.sky

Then, in resample, on these lines:

blevel = img.meta.background.level
if not img.meta.background.subtracted and blevel is not None:
    data = img.data - blevel
else:
    data = img.data

The code follows the "else" path. So, the drizzle code works with "img.data", which is already adjusted for bkg differences.

2) with "skymethod=MATCH" and "subtract=FALSE", skymatch calls "apply_sky" and only stores the value. Then, in resample the code follows the first path in the "if" above and subtracts "blevel", which is NOT the total background, but just the delta_bkg.

 

If you use "skymethod=GLOBAL+MATCH" (the default method), things get more complicated. First, skymatch works as "skymethod=MATCH" and "subtract=FALSE" (so delta_bkg are stored but not subtracted to the images). Then, it computes the minimum background among all images, and adds it to the delta_bkg measured in the "skymethod=MATCH" part. Then, in the resample step, the code reads the blevel (delta_bkg+min_bkg) and it subtract it from the images. For this reason, even if "subtract=False" in skymatch, the background is subtracted in the resample step. As in the previous example, it doesn't matter if you select "subtract=TRUE" or "subtract=FALSE". 

 

Bottom line: I believe that we just need to use the correct method. If we use "skymethod=MATCH", we register the background and obtain a final drizzled image with a non-zero background. If we use  "skymethod=GLOBAL+MATCH", then everything is subtracted because of how the skymatch routine is designed, i.e., it sums the delta_bkgs and the min_bkg.

 

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

So if we simply change the default value of the "skymethod" param to "MATCH" in the skymatch step, we should get the desired result (the delta bkgs are removed, but the global background remains)? That's easy enough.

But it also seems like there should be a user-settable switch in the resample step to turn off bkg subtraction, when desired, even if we have bkg subtraction turned on by default.

stscijgbot-jp commented 2 years ago

Comment by Michael Regan on JIRA:

Can't this just be done by changing the parameter reference file? Is that an easier change?

stscijgbot-jp commented 2 years ago

Comment by Mattia Libralato on JIRA:

A simple solution would be: if "skymethod=MATCH" is selected, we could automatically switch (i.e., override the input values) to "skymethod=GLOBAL+MATCH" when "subtract=TRUE". This would subtract the background with the current implementation.

For "skymethod=GLOBAL+MATCH" and "subtract=FALSE", I think the easiest fix would be to just switch to "skymethod=MATCH" and "match_down=TRUE". This way (I think) the delta_bkg would be computed wrt the lowest background, and the output image should have this background.

I will test these changes and see if they work.

When there is no overlap between images, should we allow options that rescale the background to the lowest/highest one and subtract it? The net result is, e.g., one are has background~0 and one has background >0 or <0 depending on the background difference. If there is no overlap, would it be better to leave the background as it is if "subtract=FALSE", or remove it using "skymethod=LOCAL" if "subtract=TRUE"?

Note that all these suggestions would work with "GLOBAL+MATCH" or "MATCH". I haven't tested (1) the "LOCAL" option; and (2) the case with many sets of images that overlap in groups (like a custom mosaic with a specific dither pattern that makes images to overlap, but each mosaic position is far enough from the other that the various groups do not overlap). Also, I see that there are different paths when a skygroup (I still have to understand what it is) is provided, which should be tested as well.

stscijgbot-jp commented 2 years ago

Comment by Karl Gordon on JIRA:

In the case of no overlap between images, I would advocate for doing nothing. We are in a pipeline environment and so we shouldn't do something that will work ok for some cases (isolated sources with mostly background), but not others (images with lots of extended emission).

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

Michael Regan Yes, this could be accomplished via either pars-image3pipeline or pars-skymatchstep ref file additions to CRDS, which would set "skymatch.skymethod='MATCH'", but changing the default in the skymatch step code is also a trivial operation. In fact, if folks can come to agreement to make that change to the default "skymethod" value in the next ~24 hours, we can include it yet in B7.9.

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

Not sure how much testing Mattia Libralato has done, but it sounds to me like if we simply change the default value of skymatch.skymethod from "GLOBAL+MATCH" to just "MATCH", it would compute and store values that represent just deltas between each (overlapping) image, at which point we may not need any changes at all to the resample step, because if we let it continue to automatically subtract the "sky" values computed by skymatch, it will be subtracting only the deltas - which you want, in order to get a flat mosaic - but not any absolute/global sky signal. Is that correct?

We could potentially also add a new parameter to the resample step to allow the sky subtraction to be toggled on/off, but if the above scenario works as desired, it sounds like the default should be subtract sky (as it hardwired to do now).

stscijgbot-jp commented 2 years ago

Comment by Karl Gordon on JIRA:

We are planning on discussing this topic based on Mattia Libralato's and Misty Cracraft's testing at the STScI MiRI imaging subgroup meeting this afternoon. I'll post an update afterwards.

stscijgbot-jp commented 2 years ago

Comment by Karl Gordon on JIRA:

We discussed this topic at length at our meeting today and request the default parameters to be:

skymethod="match", match_down="True", subtract="False"

Just to be explicit as possible. I think it is the same as you state above Howard Bushouse, but am not 100% sure.

stscijgbot-jp commented 2 years ago

Comment by Mihai Cara on JIRA:

I did my best to understand the issue here (like what is broken?) but I am confused. I did not participate (at least not in any major way) in the development of current implementation of CR-rejection and resample steps and so I will speak from my drizzlepac experience and what I believe should be happening (or how things should be working).

Both CR-rejection and resample need images to be matched (for real) - not just recorded values. The "deltas" in sky backgrounds should be subtracted from the images when detecting CRs or combining images together. If it is not done during CR detection step - then some images may have most pixels flagged as CR just because they are offset compared to the median image. For resample, this matching is not critical - it will simply co-add all images together with corresponding weights and so users will get "something". For best results images should be matched.

"global" was introduced to produce results similar to old "localmin" method used in the drizzlepac. I agree with everyone else that it should not be the default value but not because there is a problem with CR or resample - "global" should not matter to these two steps. [This is something that I do not understand... Why does it matter with regard to the end result? It still should be correct. Is there something broken?]

NOTE: skymatch has an option to "subtract" background from input images instead of just recording computed value. This is just an option but I do not recommend using it. This option was introduced to match capabilities of the drizzlepac which can accept user-subtracted images. This does not contradict first point above that says images must be matched "for real".

CR-rejection algorithm should have access to the real pixel values in order to estimate variance so that it can decide whether pixel value departure from the median image is within an acceptable range or not (CR). We could subtract background from the images but then CR-rejection step should add it back to estimate these errors (but it should be using background-subtracted images when building the median image). I do not know specifics of how this was implemented in the CR step and so "subtract=True" may not even be a viable option if CR-step cannot handle this situation correctly.

Presence of "global+" should not have any effect if CR-rejection is implemented correctly i.e., it is added back for variance estimation when subtract=False or it is subtracted from the images for median image creation or resample steps if subtract=False. I discussed this with James Davies [X] but I do not know what was decided in the end.

In the end, I agree with proposed by Karl Gordon default parameters:

??skymethod="match", match_down="True", subtract="False"??

except I would have chosen match_down=False to be more conservative when estimating errors from pixel values (i.e., err ~= sqrt(I)).

In the end, I do not understand what this ticket is about: is it about preferences about having sky subtracted from the final resampled image or is there a bug/something wrong in the current code? If it is the later, then switching to "match" (no "global") may mask a deeper problem in the steps but is is the wrong kind of fix.

stscijgbot-jp commented 2 years ago

Comment by Tyler Pauly on JIRA:

Not to distract from Mihai's comments, but the requested changes result in differences to final products in the regression tests, e.g. test_nircam_image_stage3:

AssertionError: 
   fitsdiff: 5.0
   a: /data1/jenkins/workspace/RT/JWST-Developers-Pull-Requests/clone/test_outputs/popen-gw14/test_nircam_image_run_detector1pipeline0/jw42424-o002_t001_nircam_clear-f444w_i2d.fits
   b: /data1/jenkins/workspace/RT/JWST-Developers-Pull-Requests/clone/test_outputs/popen-gw14/test_nircam_image_run_detector1pipeline0/truth/jw42424-o002_t001_nircam_clear-f444w_i2d.fits

Primary HDU:

     Headers contain differences:
       Keyword BKGLEVEL has different values:
          a> 0.0
          b> 0.1508732994059528

  Extension HDU 1 (SCI, 1):

     Data contains differences:
       Data differs at [1565, 4]:
          a> 0.15663332
          b> 0.005760026
       Data differs at [1566, 4]:
          a> 0.12281047
          b> -0.028062837
       Data differs at [1567, 4]:
[...]
  Extension HDU 8 (HDRTAB, 1):

     Data contains differences:
       Column BKGLEVEL data differs in row 0:
          a> 0.0
          b> 0.15087329940595284
       Column BKGLEVEL data differs in row 1:
          a> 0.0
          b> 0.15087329940595284
       Column BKGLEVEL data differs in row 2:
          a> 0.0009531162879104502
          b> 0.1518264156938633
       Column BKGLEVEL data differs in row 3:
          a> 0.0009531162879104502
          b> 0.1518264156938633
       Column BKGLEVEL data differs in row 4:
          a> 0.0012056025542624728
          b> 0.1520789019602153
       Column BKGLEVEL data differs in row 5:
          a> 0.0012056025542624728
          b> 0.1520789019602153

Is this the intended result, that the saved background values are now different?

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

Regarding the changes in the regtest results for calwebb_image3, yes we expect differences in the BKGLEVEL keyword values for all images used in the test, as well as differences in the overall signal level in the final combined i2d product. Using the new skymatch parameter value, it will only compute differences in the sky levels between each image. Not sure which image it uses as the reference in that case (first one in the ASN list?), but you would then expect one of the images to have BKGLEVEL=0.0, while the others have some small non-zero values, both positive and negative. Given the fact that the HDRTAB extension in the i2d product gives the same BKGLEVEL for both images (are there just 2 input images used in the test?), suggests that those images are identical copies of one another and hence the delta between them is exactly zero. If that's the case (identical input images), it's not a very robust test.

stscijgbot-jp commented 2 years ago

Comment by Tyler Pauly on JIRA:

I edited the snippet above - in that test, there are six input images. It appears to be three integrations each with an nrca5 and an nrcb5 entry.

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

Yes, those entries for the new BKGLEVEL values across all 6 images look reasonable. They are 3 pairs of NRCALONG and NRCBLONG exposures, so skymatch groups together the 2 images from modules A and B and derives a common sky value for each pair. That's exactly what we're seeing in the HDRTAB entries: the first pair are now set to zero, and each of the next 2 pairs have common non-zero values.

stscijgbot-jp commented 2 years ago

Comment by Howard Bushouse on JIRA:

Fixed by #6580

stscijgbot-jp commented 2 years ago

Comment by Misty Cracraft on JIRA:

Just adding in a link to the notebook the team used to decide on the method and parameters that Karl stated.

https://jwst-validation-notebooks.stsci.edu/jwst_validation_notebooks/skymatch/jwst_skymatch_miri_test/jwst_skymatch_miri_test.html

 

stscijgbot-jp commented 2 years ago

Comment by Anton Koekemoer on JIRA:

thanks Misty Cracraft  for confirming that these parameters (skymethod="match", match_down="True", subtract="False") produce the behaviour needed to resolve this ticket.