spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/
Other
569 stars 167 forks source link

outlier_detection flagging all non-source pixels #5332

Closed stscijgbot-jp closed 3 years ago

stscijgbot-jp commented 4 years ago

Issue JP-1694 was created on JIRA by Misty Cracraft:

Data are made up of eight simulated images with a starfield of ~50 stars. There are four dither positions and two exposures at each dither. Previously, when this data set was tested through build 7.5, the outlier_detection step was subtracting the sky (background) and leaving the sky averaging 0, with specific background pixels at negative values after the subtraction. All of those negative values were being flagged as outliers. When I re-tested these files through the development version of 7.6 listed above, the outlier detection step is no longer subtracting the background in the output crf files (output of outlier detection). However, now all non-source pixels are being flagged as outliers by the step, which is resulting in a completely invalid combined image.The pixels are being flagged as 17 or 21 which indicates both outlier detected and 'DO_NOT_USE' are being set, which makes combination difficult if all pixels are being rejected.

I used the following parameters in running calwebb_image3.

fwhm=3.762 # Gaussian kernel FWHM of objects expected, default=2.5 minobj=5 # minimum number of objects needed to match positions for a good fit, default=15 snr= 250 # signal to noise threshold, default=5 sigma= 3 # clipping limit, in sigma units, used when performing fit, default=3 fit_geom='shift' # ftype of affine transformation to be considered when fitting catalogs, default='general' use2dhist=False # boolean indicating whether to use 2D histogram to find initial offset, default=True

I have the cal files, .crf files output from outlier detection to show the flagging in the DQ array and the combined image stored in the data location listed above. Only the files pertaining to this test are in that folder.

 

stscijgbot-jp commented 4 years ago

Comment by Misty Cracraft on JIRA:

I have attached a screengrab of the combined image to show the problem.

stscijgbot-jp commented 3 years ago

Comment by Misty Cracraft on JIRA:

As part of re-testing build 7.6, we're running a set of files through the pipeline with all default values as set in the pipeline/config files, etc. and I have completed a test of this data set without tweaking any reference files or default values. That file is labeled starfield_50star4ptdither_76_12_20_combined_i2d.fits and has been placed in the same directory as shown above. This file has the same issues with background pixels getting set to zero in the combined data.

stscijgbot-jp commented 3 years ago

Comment by James Davies on JIRA:

Well this is an interesting bug! Thinking out loud here to get some initial thoughts written down.

Since these are simulated images with "perfect" WCSs, we can completely turn off tweakreg for this dataset. Default param values of tweakreg on this MIRI data causes it to misalign, because the source detection parameters are not optimized for MIRI imaging. So let's just take that out of the mix.

If I turn off tweakreg and outlier_detection, the output resampled image looks good, except for some outliers.

So keeping tweakreg.skip=True, if we just focus on outlier_detection and set save_intermediate_results=True:

$ strun config/calwebb_image3.cfg starfield_50star4ptdither_76_asnfile.json --steps.tweakreg.skip=True --steps.outlier_detection.save_intermediate_results=True

we can see that the single-resampled images *_outlier_i2d.fits that are being used to create the median have non-science data still in them, i.e. the non-imaging area on the left 1/3 of the detector.

!single-resampled.png|thumbnail!

So the stack of those is getting median combined together and that's our reference image for outlier detection. Here's the final median of the stack of all 8 resampled images.

!median.png|thumbnail!

The white area is all NaNs. This is then blotted back to each input frame for comparison. See below for one of the input images (left) with its blotted median (right), not on the same stretch.

!cal_and_blot.png|thumbnail!

Again, the white area area on the median (right side) is NaNs, where there was no image data in the stack, and the input _cal.fits image (left side) still has its background. The median has had the background subtracted off.

More investigation to come.

stscijgbot-jp commented 3 years ago

Comment by James Davies on JIRA:

One further note. We should check whether the pixmap computed from mapping the _cal input WCS to the resampled output WCS has any NaNs. This pixmap is used by the C drizzle code, and it handles NaNs in the pixmap when drizzling in the forward direction. It does not handle NaNs going in the reverse direction, i.e. during the blotting process. Blot is essentially an interpolation, and it doesn't handle NaNs currently. This is in the C code. If there are NaNs it could cause pixels to be dropped in the wrong locations. That said, the above images seem to show that this process is working well. I.e. no pixels look misplaced in the blotted image.

stscijgbot-jp commented 3 years ago

Comment by James Davies on JIRA:

A further thought. The comparison of the input _cal.fits image and its blotted median is pretty easy to do by eye above. The outliers pop out. But the algorithm for doing this I think needs to be revisited.

It was originally designed for STIS CCD by Phil Hodge and Dick Shaw I believe

https://www.stsci.edu/files/live/sites/www/files/home/hst/instrumentation/stis/documentation/instrument-science-reports/_documents/199822.pdf

and it has been modified somewhat for ACS and WFC3 in drizzlepac.drizCR. The version in jwst.outlier_detection is essentially the same as the drizzlepac one, but with CTE removed.

It would be nice to rethink the best way to do this for NIR and mid-IR detectors. I don't think machine learning is necessary, but something that is adaptive, that can take into account the number of stacked images in the median and account for the amount of noise suppression in the blotted image because of this, or preferably an algorithm that doesn't care about the amount of noise suppression in the median image. The amount of noise suppression can vary across the images because of differences in depth (number of overlapping exposures) if there are large dithers. And with multi-read IR detectors, its even more subtle, because the amount of noise in each pixel is loosely-correlated with the exposure time that pixel has seen, as some of the ramp may have been affected by cosmic rays (jumps). So it's not a well-defined problem. And it's not simple.

Of course we could also use something like lacosmic and toss the whole resampling/blotting/median business out the window, and just work directly on the input images, but this also requires tuning.

Anyway, it would nice to get some discussion going on this, because this is not a solved problem. It would be better to come up with an adaptive algorithm in my view than have an inappropriate algorithm controlled by a proliferation of reference files for every possible detector and dithering combination scenario.

stscijgbot-jp commented 3 years ago

Comment by Anton Koekemoer on JIRA:

thanks for the comments. One check I'll ask for, to start:  – can you compare the background level for the blotted image vs the input image? The way this was generally implemented in HST (most recently for ACS and also WFC3, both UVIS and IR) is that the input image retained its background level for this step, and the blotted median was adjusted to that same background level. From the greyscales above it looks like the background levels are different, so this may lead to rejection of basically all pixels since the algorithm carries out a subtraction (input - blotted) as part of determining the SNR cut.

In principle both images could also be background-subtracted (the background level would then still need to be carried through to the noise calculation to be included in the statistics), but the main point is just that they both need to have been treated the same way, either with or without background, so that the difference (input - blotted) would be centered round a value of 0 in empty regions. If this is not currently the case then it can lead to rejection of basically all pixels (except the very brightest sources) – so could you check please what the background levels are in these two images (input and blotted) and let us know here?

There is then a separate discussion that can take place about whether there is an algorithm that is both better, and more robust. The current implementation of this, inherited from the HST side, could be improved, given considerations about correlated noise as well as the possibility of varying depth across the blotted image. I'd say that's a more extended discussion than what's needed to solve this immediate problem, where it would first be useful to look at the background levels of the two images (input and blotted) as a diagnostic, to help fix this specific bug.

 

 

 

stscijgbot-jp commented 3 years ago

Comment by Howard Bushouse on JIRA:

A couple of questions that pop to my mind:

Is any of this somehow connected with the fact that the non-imaging areas of the data are being included in all of the outlier_detection processes, such that the data in those regions is somehow disrupting things?

Related to the above: does the MIRI imaging mode WCS cover the entire detector or just the normal imaging science area, without the areas on the left for the Lyot and 4QPM coronagraphs? If the WCS in those regions is undefined, yet are included in the processing, could that be leading to the kinds of problems James Davies mentioned with drizzling/blotting?

I thought that some time ago we had gone through a process of making sure that the non-imaging portions of the detector were flagged as NON_SCIENCE, so that they don't get included in processes like this. The NON_SCIENCE flags might be contained in the flat-field reference files, if I remember correctly, and hence get propagated into the science data during calwebb_image2 processing. Perhaps we need to re-examine that and if those regions are flagged, maybe they need to be excluded by outlier_detection.

stscijgbot-jp commented 3 years ago

Comment by Howard Bushouse on JIRA:

Also, before we start going down the road of investigating and developing a whole new algorithm, we first need to simply determine why all of the background pixels are being set to zero by the current outlier_detection step. It could be a simple bug somewhere.

stscijgbot-jp commented 3 years ago

Comment by Anton Koekemoer on JIRA:

also again just re-inserting my previous request before it gets lost:

James Davies  you let us know please the background levels in the last two images you show above?  Ie, specifically, the background level in the original exposure, and also the background level in the blotted median image?

My suspicion is there's a net offset difference between the two images, since their backgrounds seem to be at different values in this comparison where you used the same greyscale stretch. In that case, the difference (image - blotted) will have strongly non-zero values cross the entire image, and this could lead to effectively all the pixels being rejected except for those in very bright sources.

If that's the case, then the fix is to make sure that the original exposure and the blotted image have background levels that agree. This is the type of error that might be introduced if there was a change in how the pipeline does background subtraction (which seemed to be the context for this issue).

So can James Davies  let us know please about this question concerning the background values in the images? That would then chart a way forward.

Thanks..

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

James Davies I am taking a crack at this ticket. The first thing I wanted to check was how the skywatch step is working. Reading the docs the computed background sky is not subtract but the value is stored. I am assuming that outlier detection uses the background sky level and subtracts it from the data before it calculates the median sky value. If save_intermediate =True in outlier detection can I use the *outlier_i2d.fits files to check if the skywatch is working correctly ?

 

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

James Davies  Howard Bushouse Misty Cracraft Anton Koekemoer I ran cal image 3 on the 8 exposures. I skipped tweak reg and for outlier detection I saved the intermediate files. I have a number of plots to show. To me is seems that skywatch is working fine. The cal images and the outlier_i2d.fits (which I believe subtract the background) all seem reasonable. I have included the background calculated values on plot 1 in the text box.  Even the results from outlier detection seem reasonable. In the second plot I have included the 7 cal planes. As I understand outlier detection the DQ plane is updated with values flagged as outliers. The DQ planes for these 8 images seem reasonable (agreed ? ) It is hard to tell in the DQ planes but the black science region has mostly values of 0, the colored regions are all odd, the corongraphic mask region has values of 0, the other masks have values of 512. 

The last plot is the combined image - it is a mess. I plotted it in a different color to show the bands of zeros. Given the 7 planes in plot 2 this combined image does not look at all correct. Any suggestions on where to start looking. So what part of the masks are supposed to be in the combined data. Only the upper one is - but given the DQ planes I would have thought the other masks would be in the combined image, unless DQ=512 is rejected from combined image

!image-2021-01-05-16-04-44-922.png|width=629,height=292!

!image-2021-01-05-16-05-40-072.png|width=628,height=345!

!image-2021-01-05-16-06-39-778.png|width=562,height=313!

stscijgbot-jp commented 3 years ago

Comment by Misty Cracraft on JIRA:

Just answering the question about the Lyot mask region. In all of the discussions I've had with MIRI people, the response was that they wanted to keep the Lyot region as part of the combined image, as it adds to the fov coverage. However, I'm still not convinced that the combination is happening quite as it should in that region. I'll start looking into that more once this current issue is resolved and I can start dis-entangling issues.

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

Howard Bushouse pointed out a error I was making in interpolating the results. I should be looking at the crf files.  Looking at the Dq plane of the crf files all the non-source pixels are flagged with DO_NOT_USE. In the picture below all values that are not black have DO_NOT_USE included in the DQ flag. So the error is in outlier detection. So I will follow up on that 

 

!image-2021-01-07-10-45-16-526.png|width=825,height=631!

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

James Davies Howard Bushouse

I have a question about how the single drizzled products are created. For MIRI imager data we don't want to include the mask on the left (except for the coronographic one) in the single drizzled files. I was looking for a way to exclude pixels that are drizzled based on DQ flag value. I think this is done with GOOD_BITS . It is set to '~DO_NOT_USE+NON_SCIENCE' I think we want to be to ~DO_NOT_USE or ~NON_SCIENCE. So do not resample any pixel with the DO_NOT_USE flagged or NON_SCIENCE flagged.

Does that seem correct to you guys or am I missing something. I think median combine image created from the single images with  NON_SCIENCE include or the region in the between mask and imager region which has a dq value of 262657 (NON_SCIENCE, NO_FLAT, DO_NOT_USE)

So 'good_bits' is an option to outlier_detection, good_bits = string - can it be '~DO_NOT_USE', '~NON_SCIENCE'  I am not sure if it can have multiple values or just a single string value

 

stscijgbot-jp commented 3 years ago

Comment by James Davies on JIRA:

I notice that column 6 of the cal files above shows pretty significant striping in the read noise. And that translates to striping in the error array (column 2) as well.

The originally-reported bug shows striping in these output images.

Is it possible that such read noise is not modeled well by the current algorithm?

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

James Davies  Howard Bushouse Misty Cracraft I made progress! I put the changes in PR 5601

To ignore the lower mask regions in the left of the detector (so they do not get included in the median image that is used for blotting) we need to set an option to outlier_detecgtion: good_bits = (~DO_NOT_USE+NON_SCIENCE). Good_bits is passed to resample.  I think we want to good_bits defined in a MIRI specific outlier_detection parameter file (the new way). I don't know the other imaging modes for other instruments have non_science pixels and if they want them included or not in the combined data. 

I made two changes to code in outlier_detection.py . The background subtracted value needs to be included in the blot image or the blot image and input science image are way off causing problems with detecting outliers (flagging all non-source data). 

Here are some plots that show how its working. We might want to think about the NAN regions in the blotted images and if they could cause problems down later.

!image-2021-01-08-17-38-07-603.png|width=661,height=347!

!image-2021-01-08-17-38-57-579.png|width=370,height=258!

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

Ors Detre Karl Gordon Michael Regan

We do not want to include the 4QPM regions when creating a combined data from calimage3. However, these pixels have a DQ flag of NON_SCIENCE which seems to be pulled in from the flat_field step.  Without these pixels also having DO_NOT_USE they are used in outlier detection and resample. Ors Detre do you see any problem with also setting these pixels with DO_NOT_USE - maybe in the flat field step after the flat field has been applied ? 

 

 

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

Hi Jane,

Meaning that the blot files will have Nan for the 4QM pixels? If your question is whether I use those in my CARs when blotting, then I have no problem with that. Or did I misunderstand you?

 

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

As part of the flat fielding step I would like to change pixels that are just flagged as NON_SCIENCE and flag then as NON_SCIENCE + DO_NOT__USE. This way these pixels are not used in outlier detection and resample. So they not be used in when mosaicking. Do you see any problems doing this ?

 

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

I can see no problem at the moment. What do the 3 different cases mean then? (1. NON_SCIENCE flag alone, 2. DO_NOT_USE alone, 3. DO_NOT_USE + NON_SCIENCE flags together)

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

Ors Detre The flat field reference files contains dq values of NON_SCIENCE for the pixels in the 4QPM.  For the region between the masks and the imager science region the flat field reference files contains values of NON_SCIENCE + NO_FLAT_FIELD.   After flat fielding the dq flags are updated to contain the dq values in the flat. Because the flat field contains NULL values for the middle region after flat fielding these pixels also flagged as DO_NOT_USE. Pixel  in the 4QPM are just flagged as NON_SCIENCE after flat fielding. The JWST pipeline is set up to only ignore pixels that have DO_NOT_USE.  If we could also flag pixels that have NON_SCIENCE as DO_NOT_USE after flat field (like the middle region) then outlier detection and resample will not use these pixels - which is what we want.  

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

MIRI Team Misty Cracraft Karl Gordon Michael Regan  Do you agree that as part of flat fielding imager data the pixels define as NON_SCIENCE corresponding to the 4QPM can also have the DQ flag of DO_NOT_USE set ? That will ensure that they are not used in outlier detection or resample for the imager science data

stscijgbot-jp commented 3 years ago

Comment by Misty Cracraft on JIRA:

I'm fine with adding the DO_NOT_USE flag to the 4QPM fields when combining imager data.

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

Thanks Jane Morrison, I think I understand it a lot better now. Then I can say more confidently (in a deeper voice): yes, I have no problem with the proposed way forward.

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

Karl Gordon Misty Cracraft Howard Bushouse

I realized I do not know how flat field is done if the data is one of the 4QPM ? Is flat fielding done on this type of data. I would not want to set the dq flag to DO_NOT_USE if the flat is being applied the 4QPM. I don't think it is but I want to double check

 

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

The 4QPMs have their own subarray flats and used the same way. I do have full array 4QPM data in my CARs, but they are the exception here.

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

Ors Detre Misty Cracraft

For now I have added a line to the flat_field step that if a pixel has a DQ value of NON_SCIENCE , DO_NOT_USE is also added to it.

It would be better if the flat field reference file did this. Ors is it possible to add this to updated flat fields ? If so should we open a new ticket to keep track of this ?

 

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

No, I think it will have been done when you finish the new ticket. Just discussing with Misty Cracraft what else to do...

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

Misty Cracraft Jane Morrison

So far I used Nan on the flat image to flag the focal plane mask and unusable pixels. Nans can cause a lot of problems. Should I use DO_NOT_USE (or any other combination of flags) and set the flat to 1 instead?

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

The DQ layers after the requested 4QPMs -> DO_NOT_USE modification:

 

Flat: Pixels with no valid flux are set to Nan. (Dead pixels and values less than 30%) The 4+4 reference columns are also set to Nan.

!miri_0_flat-1.jpg|thumbnail!

 

DQ DO_NOT_USE: The requested update is covering 4QPMs, with some extra coverage below and above 4QPMs subarrays

!miri_1_donotuse.jpg|thumbnail!

 

DQ NON_SCIENCE: Pixels not to be used for imaging

!miri_2_nonscience.jpg|thumbnail!

 

DQ NO_FLAT_FIELD: Pixels where the flat is Nan (therefore this DQ is a bit redundant, contains no extra information)

!miri_6_noflatfield.jpg|thumbnail!

 

All other DQs are 0. Let me know if you need any modification or a better way to flag different conditions.

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

Just a quick clarification. Comparing the DQ NON_SCIENCE image to the DQ DO_NOT_USE.  Are all the pixels marked as NON_SCIECNE also marked as DO_NOT_USE. In my understanding of those to images it look like there are regions marked as NON_SCIENCE that are not marked as DO_NOT_USE (for example the region between the chronographic mask and imager).

stscijgbot-jp commented 3 years ago

Comment by Howard Bushouse on JIRA:

Fixed by #5601

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

Just learnt something from #5601 , quote: "Theoretically, however, the approach we have taken in most, if not all, pipeline steps with regard to which DQ values to use to signify that a pixel is to be ignored is in fact just the DO_NOT_USE value. All other flag values are deemed informational only. So in this case, if we really want NON_SCIENCE pixels to be excluded from use in any/all steps, those pixels should also have the DO_NOT_USE flag set for them (in addition to NON_SCIENCE)."

Ok, I will use the "DO_NOT_USE" as main DQ info layer. This issue is solved there, but the best if I will do this way nevertheless. I will include the Nan information from the flat.

 

The DO_NOT_USE layer after this mod:

!miri_1_donotuse_v2.jpg!

 

Anything else from me?

stscijgbot-jp commented 3 years ago

Comment by Jane Morrison on JIRA:

Ors Detre Thanks. We don't need anything else.

 

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

Jane Morrison Sure? Putting back odd even row? ;)

stscijgbot-jp commented 3 years ago

Comment by Ors Detre on JIRA:

...and also readout channel 0-3 column correction.

stscijgbot-jp commented 3 years ago

Comment by Misty Cracraft on JIRA:

I have tested this against version 7.7 of the pipeline (0.18.3) with a new flat delivered by Ors, and the problem is no longer seen. I consider this bug fixed.

stscijgbot-jp commented 3 years ago

Comment by Misty Cracraft on JIRA:

Passing tests.