spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

Errors in TSO-Spec2 and TSO3 pipelines #4338

Closed stscijgbot closed 4 years ago

stscijgbot commented 4 years ago

Issue JP-1174 was created by Bryan Hilbert:

I have several files containing simulated NIRCam Grism TSO observations. As I have pushed these files through the various stages of the pipeline, I've run into several errors and strange results. I'm going to list everything here, rather than in separate tickets for each. It may be that some of these issues are from problems in the simulated data.  I've placed the data files into /user/hilbert/tso_pipeline_run, including a copy of the notebook I used to run the pipeline. The observation uses the 2048 x 256 subarray on the NIRCam A5 detector. I've included files only for the Grism TSO + F444W filter (the first of my three observations). In order to test the file splitting, my exposure has 70 integrations of 5 groups each, split into 2 segment files.

 

calwebb_detector1: 

Lots of repeated warnings:

2019-12-11 12:59:00,919 - stpipe.RampFitStep - WARNING - /Users/hilbert/miniconda3/envs/mirage/lib/python3.6/site-packages/jwst/ramp_fitting/ramp_fit.py:1505: RuntimeWarning: invalid value encountered in greater
  g_pix = these_pix[variance[these_pix] > 0.] # good pixels

The output slope image looks reasonable though.

 

calwebb_spec2:

When creating level2 association tables programmatically (using asn_from_list()), the asn_pool keyword is not created. It has to be added manually.

e.g.

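from jwst.associations.asn_from_list import asn_from_list
from jwst.associations.lib.rules_level2_base import DMSLevel2bBase

asn = asn_from_list(grism_444_rampfitstep_files, rule=DMSLevel2bBase, product_name=prod_name)
with open('level2_grism_f444w_asn.json', 'w') as fh:
    fh.write(asn.dump()[1])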
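Adding the missing asn_pool entry by hand could look roughly like the sketch below. It uses only the standard json module and is not the library's own fix: the association dictionary is a made-up, heavily trimmed stand-in for the serialized text returned by asn.dump()[1], and both the helper name add_asn_pool and the placeholder value "none" are assumptions made for illustration.

```python
import json


def add_asn_pool(asn_text, pool_name="none"):
    """Return association JSON text with an 'asn_pool' entry added if it is missing.

    pool_name is a placeholder chosen for this sketch, not a value
    prescribed by the pipeline.
    """
    asn = json.loads(asn_text)
    asn.setdefault("asn_pool", pool_name)  # leave an existing value untouched
    return json.dumps(asn, indent=4)


# Hypothetical, heavily trimmed association used only to exercise the helper.
serialized = json.dumps({
    "asn_type": "None",
    "products": [
        {
            "name": "grism_tso_f444w",
            "members": [{"expname": "example_rateints.fits", "exptype": "science"}],
        }
    ],
})

patched = json.loads(add_asn_pool(serialized))
print(patched["asn_pool"])
```

In the workflow above, the same helper would be applied to asn.dump()[1] before the text is written to level2_grism_f444w_asn.json.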

 

I am getting a warning about unclosed files after the prefetch steps for the reference files and before any processing starts (I think).

2019-12-11 13:07:56,383 - stpipe.Spec2Pipeline - INFO - Prefetch for WAVECORR reference file is 'N/A'.
2019-12-11 13:07:56,385 - stpipe.Spec2Pipeline - INFO - Prefetch for WAVELENGTHRANGE reference file is '/grp/crds/cache/references/jwst/jwst_nircam_wavelengthrange_0002.asdf'.
2019-12-11 13:07:56,440 - stpipe.Spec2Pipeline - INFO - Prefetch for WFSSBKG reference file is 'N/A'.
2019-12-11 13:07:56,442 - stpipe.Spec2Pipeline - INFO - Starting calwebb_spec2 ...
2019-12-11 13:07:56,519 - stpipe.Spec2Pipeline - INFO - Processing product jw88888001001_01101_00002-seg001_nrca5_1
2019-12-11 13:07:56,521 - stpipe.Spec2Pipeline - INFO - Working on input jw88888001001_01101_00002-seg001_nrca5_1_rampfitstep.fits ...
2019-12-11 13:07:57,068 - stpipe.Spec2Pipeline - WARNING - /Users/hilbert/miniconda3/envs/mirage/lib/python3.6/site-packages/jwst/stpipe/step.py:350: ResourceWarning: unclosed file <_io.FileIO name='jw88888001001_01101_00002-seg001_nrca5_1_rampfitstep.fits' mode='rb' closefd=True>
  gc.collect()

 

In extract1d, I'm getting an error that there is no INT_TIMES extension in the input file, but it is in fact present. For the INT_TIMES extension when dealing with segment files, should the INT_TIMES table in each segment file contain entries only for the integrations in that file? And should the integration_number correspond to the index of the integration in that file, or in the exposure as a whole (e.g. integration_numbers span 1-60 in segment01  file and 61-70 in segment02, or 1-60 in segment01 and 1-10 in segment02)?

2019-12-11 13:19:28,274 - stpipe.Spec2Pipeline.extract_1d - INFO - ... 50 integrations done
2019-12-11 13:20:37,179 - stpipe.Spec2Pipeline.extract_1d - INFO - All 60 integrations done
2019-12-11 13:20:37,181 - stpipe.Spec2Pipeline.extract_1d - WARNING - There is no INT_TIMES table in the input file.
2019-12-11 13:20:37,183 - stpipe.Spec2Pipeline.extract_1d - INFO - Step extract_1d done

Perhaps my table is formatted incorrectly?

 

The calints files output by calwebb_spec2 look bad. The spectrum is centered vertically in the aperture, as expected. The shape of the array looks good as well (2048 x 64), but all pixels in columns 1400-2048 have a value of zero. There is also a large amount of noise in columns 0-150 and 1350-1399.

 

calwebb_tso3:

The extracted 1D spectra in the output FITS table and the ECSV file contain fluxes that are all NaN.

I get a warning that overwrite=True is not set for the creation of the ecsv source catalog.

2019-12-11 15:57:31,483 - stpipe.Tso3Pipeline - WARNING - /Users/hilbert/miniconda3/envs/mirage/lib/python3.6/site-packages/astropy/io/ascii/ui.py:749: AstropyDeprecationWarning: grism_tso_f444w_whtlt.ecsv already exists. Automatically overwriting ASCII files is deprecated. Use the argument 'overwrite=True' in the future.
  output), AstropyDeprecationWarning)

I see a MergeConflictWarning after the white light step finishes:

2019-12-11 15:57:26,991 - stpipe.Tso3Pipeline.white_light - INFO - Step white_light done
2019-12-11 15:57:31,440 - stpipe.Tso3Pipeline - INFO - Saved model in grism_tso_f444w_x1dints.fits
2019-12-11 15:57:31,460 - stpipe.Tso3Pipeline - WARNING - /Users/hilbert/miniconda3/envs/mirage/lib/python3.6/site-packages/astropy/utils/metadata.py:360: MergeConflictWarning: Cannot merge meta key 'number_of_integrations' types <class 'int'> and <class 'int'>, choosing number_of_integrations=10
  MergeConflictWarning)

 

stscijgbot commented 4 years ago

Comment by Philip Hodge: Since you know that there is in fact an INT_TIMES extension in the input file, the message from extract_1d that there is no INT_TIMES table implies that the length of the table is zero.  If there are actually rows in the table, please let me take a look at your input file; I would like to check whether extract_1d is determining the length incorrectly.

The values in the integration_number column run from 1 to the total number of integrations over all segments.  So in your case, they should run from (at least) 1 to 60 in the first segment, and 61 to 70 in the second segment.  It is OK for the INT_TIMES table to contain integration numbers that are outside the range for any individual segment; in particular, you could use identical INT_TIMES tables with integration numbers that run from 1 to 70 for both segments.  extract_1d uses keywords INTSTART and INTEND as the range of integration numbers to read from the table.
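The row selection described here can be sketched as follows. This is a minimal illustration rather than the extract_1d code itself: the function name select_int_times and the list-of-dicts table are stand-ins, and the two arguments play the role of the INTSTART and INTEND keyword values.

```python
def select_int_times(rows, intstart, intend):
    """Return the rows whose integration_number lies in [intstart, intend]."""
    return [row for row in rows if intstart <= row["integration_number"] <= intend]


# Hypothetical INT_TIMES table that covers all 70 integrations of the exposure,
# as would be the case if identical tables were written to both segments.
full_table = [{"integration_number": n} for n in range(1, 71)]

seg1_rows = select_int_times(full_table, 1, 60)    # segment 1: INTSTART=1, INTEND=60
seg2_rows = select_int_times(full_table, 61, 70)   # segment 2: INTSTART=61, INTEND=70

print(len(seg1_rows), len(seg2_rows))
```

With this kind of selection, rows outside a segment's range are simply ignored, which is why an oversized table is harmless.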

stscijgbot commented 4 years ago

Comment by Philip Hodge: I can see that the INT_TIMES in your files do in fact contain rows, and I tried opening a couple of them and checking the length the way extract_1d does, i.e. len(input_model.int_times), and it's not zero. So I don't understand why extract_1d is reporting that there isn't an int_times table. Perhaps extract_1d is being run on one of the files that don't contain INT_TIMES, such as the _crfints or _1_calints file. I'll try running the calwebb_spec2 pipeline with a copy of your files, and I'll add some print statements to see what's going on.

Thanks for pointing out these problems.

stscijgbot commented 4 years ago

Comment by Philip Hodge: The problem is that the extract_2d step does not propagate the INT_TIMES extension to the output:

xxx output of extract_2d:
filename: jw88888001001_01101_00002-seg001_nrca5_1_rampfitstep.fits
date: 2019-12-12T11:11:01.274
model_type: SlitModel

attribute   size          type
data        (2048,64,60)  float32
err         (2048,64,60)  float32
dq          (2048,64,60)  uint32
wavelength  (2048,64)     float32

The input to extract_1d is jw88888001001_01101_00002-seg001_nrca5_1_calints.fits (for the first segment), and that does not contain an INT_TIMES extension.

stscijgbot commented 4 years ago

Comment by Howard Bushouse: Regarding the warning messages from Spec2Pipeline about the unclosed input file, when I run from the command line (i.e. strun calwebb_tso-spec2.cfg level2_grism_f444w_asn.json) I don't get that warning, even with debug-level messaging turned on:

{code:java}
...
2020-01-29 08:40:05,587 - stpipe.Spec2Pipeline - INFO - Prefetch for WAVECORR reference file is 'N/A'.
2020-01-29 08:40:05,587 - stpipe.Spec2Pipeline - INFO - Prefetch for WAVELENGTHRANGE reference file is '/grp/crds/cache/references/jwst/jwst_nircam_wavelengthrange_0002.asdf'.
2020-01-29 08:40:05,589 - stpipe.Spec2Pipeline - INFO - Starting calwebb_spec2 ...
2020-01-29 08:40:05,597 - stpipe.Spec2Pipeline - INFO - Processing product jw88888001001_01101_00002-seg001_nrca5
2020-01-29 08:40:05,597 - stpipe.Spec2Pipeline - INFO - Working on input jw88888001001_01101_00002-seg001_nrca5_rateints.fits ...
2020-01-29 08:40:05,609 - stpipe.Spec2Pipeline - DEBUG - Opening jw88888001001_01101_00002-seg001_nrca5_rateints.fits as <class 'jwst.datamodels.cube.CubeModel'>
2020-01-29 08:40:06,174 - stpipe.Spec2Pipeline.assign_wcs - INFO - Step assign_wcs running with args (<CubeModel(60, 256, 2048) from jw88888001001_01101_00002-seg001_nrca5_rateints.fits>,).
2020-01-29 08:40:06,175 - stpipe.Spec2Pipeline.assign_wcs - INFO - Step assign_wcs parameters are: {'pre_hooks': [], 'post_hooks': [], 'output_file': None, 'output_dir': None, 'output_ext': '.fits', 'output_use_model': False, 'output_use_index': True, 'save_results': False, 'skip': False, 'suffix': None, 'search_output_file': True, 'input_dir': '', 'slit_y_low': -0.55, 'slit_y_high': 0.55}
2020-01-29 08:40:06,748 - stpipe.Spec2Pipeline.assign_wcs - INFO - COMPLETED assign_wcs
...
{code}

So perhaps this is something that only happens when calling the pipeline module from within Python and/or within a notebook?

stscijgbot commented 4 years ago

Comment by Howard Bushouse: Regarding the appearance of the data in the calints images, everything is zeroed out in columns 1400-2048 because the wavelengths of those pixels are beyond the range of the flux calibration data provided in the photom reference file for the GRISMR+F444W combination.

{code:java}
In [15]: phot[1].data[9]
Out[15]: ('F444W', 'GRISMR', 1, 840.4664, 8.404664, 2597, array([2.4933348, 2.4943388, 2.4953427, ...,
{code}

Note that nelem=2597 and phot[1].data['wavelength'][2596]=5.0997186. All the 'relresponse' values above this wavelength are set to zero. And in the science data (calints) the wavelengths of columns 1400 and beyond are >= 5.100561. Hence all of those pixels are zeroed out and their DQ values are set to 1 (do not use).

No immediate ideas as to where the noise is coming from in the other areas of the calints images.
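The zeroing behaviour described in this comment can be sketched in a few lines. This is an illustration under stated assumptions, not the photom step's implementation: the function name, the linear interpolation, and the flat response of 1.5 are hypothetical, while the two end wavelengths (2.4933348 and 5.0997186) and the DQ value of 1 are the ones quoted above.

```python
from bisect import bisect_right

DO_NOT_USE = 1  # DQ value quoted in the thread for the zeroed pixels


def apply_flux_calibration(values, wavelengths, ref_wave, ref_resp):
    """Multiply each value by the response interpolated at its wavelength.

    Pixels whose wavelength falls outside the reference table get a value
    of zero and a DO_NOT_USE flag, mimicking a response of zero out there.
    """
    calibrated, dq = [], []
    for value, wave in zip(values, wavelengths):
        if wave < ref_wave[0] or wave > ref_wave[-1]:
            calibrated.append(0.0)
            dq.append(DO_NOT_USE)
            continue
        # Index of the reference interval that contains this wavelength.
        i = min(bisect_right(ref_wave, wave) - 1, len(ref_wave) - 2)
        frac = (wave - ref_wave[i]) / (ref_wave[i + 1] - ref_wave[i])
        response = ref_resp[i] + frac * (ref_resp[i + 1] - ref_resp[i])
        calibrated.append(value * response)
        dq.append(0)
    return calibrated, dq


ref_wave = [2.4933348, 3.8, 5.0997186]  # end points from the thread; middle point made up
ref_resp = [1.5, 1.5, 1.5]              # hypothetical flat response

# One pixel inside the calibrated range, one at the wavelength reported for column 1400.
cal, dq = apply_flux_calibration([10.0, 10.0], [4.4, 5.100561], ref_wave, ref_resp)
print(cal, dq)
```

The second pixel lies just beyond the last reference wavelength, so it comes out as zero with DQ set to 1, which matches the appearance of columns 1400 and beyond in the calints images.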

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: I was indeed calling the pipeline using the run method within a notebook when the unclosed input file error showed up.

{code:java}
asn_level_2 = 'level2_grism_f444w_asn.json'
result2 = Spec2Pipeline(config_file='pipeline_cfgs/calwebb_tso-spec2.cfg')
result2.save_results = True
result2.run(asn_level_2)
{code}
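For readers unfamiliar with this kind of message, the sketch below shows one way a ResourceWarning can end up being reported at a gc.collect() call, as in the logs above. It is a toy reproduction using only the standard library and CPython's behaviour, not a diagnosis of the pipeline: the Holder class and the reference cycle are assumptions made for illustration.

```python
import gc
import os
import tempfile
import warnings


class Holder:
    """Toy object that keeps an open file handle and refers to itself."""

    def __init__(self, path):
        self.handle = open(path, "rb", buffering=0)  # raw _io.FileIO handle
        self.me = self  # reference cycle: only the garbage collector can free this


fd, path = tempfile.mkstemp()
os.close(fd)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # ResourceWarning is usually hidden by default filters
    Holder(path)   # never closed, and unreachable as soon as the statement ends
    gc.collect()   # the cycle is collected here, finalizing the still-open handle

os.remove(path)

resource_warnings = [w for w in caught if issubclass(w.category, ResourceWarning)]
for w in resource_warnings:
    print(w.message)
```

Whether the warning is displayed at all depends on the active warning filters, which may differ between a notebook session and a command-line run.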

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: I'm confused about the photom results. Comparing the output of the slope fitting step to the calints file, the trace of the source in the rate image extends far beyond where it is cut off in calints. So maybe the calculated wavelengths are incorrect? Or maybe I'm misinterpreting something? Here's an image of the rate file (left) and calints (right) side-by-side. If the wavelengths in the zeroed out portion of the calints file were really > 5.1 microns, then I wouldn't expect the trace to be visible in the rate image, since the throughput of the F444W filter is ~zero there. !rateleft_calintright.jpeg!

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Ugh, sorry about that image. I thought Jira would do a better job of scaling it to fit on the page.

stscijgbot commented 4 years ago

Comment by Howard Bushouse: I noticed that too - the fact that there's spectral data in the rateints image far beyond the extent of the valid photom wavelength range. I guess it could be 1 of 2 things:

1) the wavelengths calculated for the pixels in the science image are incorrect, due to an error somewhere in the WCS transforms or the reference data that are used to construct the transforms

2) the photom reference data were incorrectly truncated in their wavelength coverage

stscijgbot commented 4 years ago

Comment by Howard Bushouse: FYI, JP-1249 has been filed to correct the problem of the INT_TIMES table not getting copied over in the midst of the processing flow, and JP-1250 has been filed for the error message about overwriting the white_light output file.

stscijgbot commented 4 years ago

Comment by Howard Bushouse: As to the noise near the ends of the calibrated portion of the spectrum, this is almost certainly due to the flux calibration data. The "relresponse" values in the photom ref file are on the order of ~1.5 for most of the (useful) wavelength range, but near the ends of the wavelength range the calibration values increase to the order of a few hundred (response is dropping off, so calibration goes up). So the noise already present in the image data is just being amplified by the large flux calibration values that are multiplied into the data. See the attached plot of the flux calibration vector.
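The amplification can be put into numbers with a small worked example. The noise samples and the edge value of 300 are made up for illustration ("a few hundred" in the comment above); only the mid-band value of 1.5 is taken from the thread.

```python
from statistics import pstdev

# Hypothetical noise samples in the uncalibrated image, identical in both regions.
noise = [-0.2, 0.1, 0.0, 0.2, -0.1]


def calibrated_scatter(samples, relresponse):
    """Standard deviation of the samples after multiplying by the calibration value."""
    return pstdev([sample * relresponse for sample in samples])


mid_band = calibrated_scatter(noise, 1.5)     # typical value quoted in the thread
band_edge = calibrated_scatter(noise, 300.0)  # assumed stand-in for "a few hundred"

print(band_edge / mid_band)
```

Because the calibration is a plain multiplication, the scatter scales by the same factor as the calibration value, so identical input noise ends up a couple of hundred times larger at the band edges.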

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Makes sense. So what's the best way to distinguish between your options 1 and 2 above? Make sure that the relresponse curve in the photom ref file looks like the ~inverse of the F444W throughput curve? As throughput goes to zero, relresponse should go to infinity. I also have the input spectrum used to create the science data. I could plot that against a rough signal vs wavelength from the calints file. But I think we already know that the input and output spectra will be shifted in wavelength relative to one another. I'm not sure if that really helps identify where the problem is coming in.

stscijgbot commented 4 years ago

Comment by Howard Bushouse: Regarding the question about partitioning the data in the INT_TIMES table for the different segments, the INT_TIMES table in each segment only needs to contain entries for the data in that segment. So in your example, the INT_TIMES table in seg001 would contain 60 entries, with integration numbers running from 1-60, and the table in seg002 would contain just 10 entries, with integration numbers running from 61-70. The code in steps like tso_photometry and white_light is going to be using the absolute integration number within the entire exposure, so that's why the entries in seg002 need to start at 61 (instead of restarting at 1).

For example, one of our NRC_TSIMAGE regression tests uses 2 segments, each of which contains 4 integrations (just a small number, so the test doesn't take too long). The contents of the two INT_TIMES tables are:

{code:java}
In [4]: seg1['int_times'].data
Out[4]:
FITS_rec([(1, 58016.76934771, 58016.76939342, 58016.76943914, 0., 0., 0.),
          (2, 58016.76943914, 58016.76948486, 58016.76953058, 0., 0., 0.),
          (3, 58016.76953058, 58016.7695763 , 58016.76962202, 0., 0., 0.),
          (4, 58016.76962202, 58016.76966774, 58016.76971346, 0., 0., 0.)],
         dtype=(numpy.record, [('integration_number', '>i4'), ('int_start_MJD_UTC', '>f8'), ('int_mid_MJD_UTC', '>f8'), ('int_end_MJD_UTC', '>f8'), ('int_start_BJD_TDB', '>f8'), ('int_mid_BJD_TDB', '>f8'), ('int_end_BJD_TDB', '>f8')]))

In [5]: seg2['int_times'].data
Out[5]:
FITS_rec([(5, 58016.76971346, 58016.76975917, 58016.76980489, 0., 0., 0.),
          (6, 58016.76980489, 58016.76985061, 58016.76989633, 0., 0., 0.),
          (7, 58016.76989633, 58016.76994205, 58016.76998777, 0., 0., 0.),
          (8, 58016.76998777, 58016.77003349, 58016.77007921, 0., 0., 0.)],
         dtype=(numpy.record, [('integration_number', '>i4'), ('int_start_MJD_UTC', '>f8'), ('int_mid_MJD_UTC', '>f8'), ('int_end_MJD_UTC', '>f8'), ('int_start_BJD_TDB', '>f8'), ('int_mid_BJD_TDB', '>f8'), ('int_end_BJD_TDB', '>f8')]))
{code}

Note how the integration_number values go from 5-8 in the 2nd table, and the first int_start value in the 2nd table picks up where the first table's int_end value left off.
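A quick consistency check along these lines can be sketched as below. The helper name segments_are_continuous and the tolerance are assumptions for illustration; the (integration_number, int_start_MJD_UTC, int_end_MJD_UTC) values are copied from the two tables above.

```python
def segments_are_continuous(segments, tol=1e-8):
    """Check that integration numbers and start/end times run on across segments.

    Each segment is a list of (integration_number, int_start, int_end) tuples.
    """
    expected_number = segments[0][0][0]
    previous_end = None
    for table in segments:
        for number, start, end in table:
            if number != expected_number:
                return False  # numbering restarted or skipped
            if previous_end is not None and abs(start - previous_end) > tol:
                return False  # gap or overlap in time
            expected_number += 1
            previous_end = end
    return True


seg1 = [
    (1, 58016.76934771, 58016.76943914),
    (2, 58016.76943914, 58016.76953058),
    (3, 58016.76953058, 58016.76962202),
    (4, 58016.76962202, 58016.76971346),
]
seg2 = [
    (5, 58016.76971346, 58016.76980489),
    (6, 58016.76980489, 58016.76989633),
    (7, 58016.76989633, 58016.76998777),
    (8, 58016.76998777, 58016.77007921),
]

print(segments_are_continuous([seg1, seg2]))

# A second segment that restarts its numbering at 1 fails the check.
restarted = [(n - 4, start, end) for n, start, end in seg2]
print(segments_are_continuous([seg1, restarted]))
```

The same check applied to simulated segment files would flag a table whose numbering restarts at 1 in the second segment.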

stscijgbot commented 4 years ago

Comment by Howard Bushouse: Here's the extracted calibrated spectrum from one of the integrations in the seg001 x1dints product. The jump around 4.2 microns is due to offsets in the data across amplifier boundaries. I'm wondering if that's due to the fact that I ran the data through all steps of calwebb_tso1 and I shouldn't have (e.g. I subtracted a dark, when perhaps you don't have dark current included in the simulation). Anyway, do any of the features, such as the increasing flux around 4.6 microns, show up at the wavelengths you'd expect based on the input spectrum to the simulation? If everything looks like it's off in wavelength space, then perhaps there is an issue with the wavelength assignments that are being made in the calwebb_spec2 pipeline.

 

!nircam_grism_x1d.png|thumbnail!

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Yikes. Dark current is included in the simulation. I have a feeling that the jump at 4.2 microns may be due to the gain file used, in which the gain is calculated in a checkerboard of 128x128 pixel squares. I recently updated the simulator to use mean gain values in order to avoid those discontinuities. Ignoring that though, the ramp up at 4.6 microns makes no sense to me. Here are two plots: 1) the input spectrum in Jy, straight from the stellar model (no instrumental effects) 2) the input spectrum multiplied by the system throughput, in Jy. There isn't much in the way of features to check for, unfortunately. 

Maybe I should create some new data with some very narrow, deep absorption lines?

 

!input_jy.jpeg|width=325,height=195!!input_times_system_throughput_jy.jpeg|width=311,height=192!

stscijgbot commented 4 years ago

Comment by Howard Bushouse: If I click through the various integrations in both the rateints and calints files displayed in ds9, the offsets between amplifiers jump around a lot. If it were offsets being imposed by anything like the dark or flat or linearity or gain correction, they should be consistent from integration to integration (because the same calibration gets applied to every integration). So that seems to suggest that the offsets are either there already in the uncal data (haven't looked at those) or there's something about the pixel ramps in the different amplifier regions that changes from integration to integration, causing weird behavior in the ramp fitting.

stscijgbot commented 4 years ago

Comment by Howard Bushouse: The fact that your original simulated spectrum covers the wavelength range 3.8 - 5.0 microns, while the calibrated pipeline data have the spectral trace extending well beyond what it claims is 5.0 microns does suggest that the wavelength assignments in the pipeline are not correct. It's assigning the right range, but they're not landing on the right pixels.

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: I cropped those plots to be the same wavelength range as yours. They do extend shortward of 3.8 and longward of 5.0, but after multiplying by the system throughput, the flux levels are very low out there.

!input_jy_3to6.jpeg|width=372,height=232!!input_times_system_throughput_jy_3to6.jpeg|width=342,height=202!

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: As for the amps jumping around, these data are in the 2048x256 tso aperture. So the reference pixels from the top of the detector are not present. In my experiments a few weeks ago, I found that halving the number of reference pixels used in the refpix correction step did lead to a marked increase in the amp-to-amp and group-to-group level differences. So maybe that's what you're seeing as well?

stscijgbot commented 4 years ago

Comment by Howard Bushouse: Ah, the refpix step, of course! (smacks forehead ...). The one step that is both amplifier- and integration-dependent.  Unfortunately, the results of a test I just ran with the refpix step skipped in calwebb_tso1 STILL show the amp regions jumping around from integration to integration in the rateints file.  Hmmmm .... now what.

Do you build "bias" drifts into the simulations?

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: This is a symptom of a shortcut in Mirage. The simulations are based on darks. We don't have much in the way of real subarray darks yet, so for subarrays, Mirage extracts the appropriate subarray from a full frame dark. But full frame darks are taken using 4 amplifiers, while (most) subarray data are taken with a single amp. So in the real world, in the case where these data are taken with 1 amp, there won't be any amp boundaries. But if someone were to use this subarray and read out using all 4 amps, then I think the data should look reasonably close to this. Mirage doesn't do any modification of the bias. It uses whatever is in the dark current data.

 

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: The bias level does definitely jump around from integration to integration in real data.

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Repeated the exercise using the grism+F322W2. This case looks more reasonable. The signal in the calints file is cut off at about where I'd expect given the throughput curve of F322W2. The extracted spectra look better than the F444W case. Surface brightnesses are >0. But I still don't understand the features I'm seeing in there.

!extracted_spectra_before_transit.png|width=415,height=311!!f322w2.jpeg|width=956,height=144!  

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Ok, here's a bit of a more complete look at the spectrum in the data. The first plot shows the signal from the TSO object (Note that this is plotted versus column number, not wavelength) for an F444W and an F322W2 exposure. There's no dark current/bias/noise in these files. This is the signal that Mirage adds to the dark current exposure.

The next plot shows the spectrum extracted from the uncalibrated slope (where I've manually extracted the signal without doing any background subtraction/averaging/etc). Here we see the effects of the amp-to-amp bias differences. There are jumps in the signal at columns 512, 1024, and 1536. It is extremely noisy, but if you look closely, you can see some of the large features in the noiseless plot are present.

 

The third plot shows my manual spectral extraction using the calints files. The results are pretty similar to those from the rateints file, but with less noise. The largest spectral features from the noiseless spectra are still present. One difference is that since the pipeline zeroed out all columns after ~1450 in the F444W filter, those signals are now zero in the plot.

 

The final plot shows the spectral extraction results from the pipeline. Note that these are plotted vs wavelength, so things are a little squished relative to the other figures, which are plotted vs column number. I'm still not entirely sure I trust the wavelengths coming from the pipeline though, given that there is signal in the noiseless input image and the rate image in columns beyond 1450 (even if my noisy manual extraction makes it look like there isn't). See the actual F444W image that I attached earlier, which shows visible signal extending all the way to column 2048. Also note [Figure 10 on the WFSS FOV JDox page|https://jwst-docs.stsci.edu/near-infrared-camera/nircam-predicted-performance/nircam-wfss-field-of-view#NIRCamWFSSFieldofView-F444W]. The reference point for this subarray when using F444W is in column 935 or so (just to the left of center). Given that, this figure implies that the spectrum should extend basically to the right edge of the detector, rather than only across the third amp, which is what the pipeline seems to be calculating.

I think the two main takeaways from this are:

The amp-to-amp bias differences make interpretation difficult. Is this what 4-amp data in the GRISM256 aperture will really look like, given that half of the reference pixels will be missing?

While the pipeline-calculated wavelength values look correct for F322W2, I'm not convinced that they are correct for F444W, although maybe I'm misinterpreting something.

 

!manually_extracted_spectra_before_transit_from_seed_image.png|width=357,height=268!!manually_extracted_spectra_before_transit_from_rate_image.png|width=346,height=259!!manually_extracted_spectra_before_transit.png|width=346,height=259!!extracted_spectra_before_transit.png|width=336,height=252!

stscijgbot commented 4 years ago

Comment by Howard Bushouse: In JP-373 [~sosey] also reported what appear to be discrepancies in simulated NIRCam grism data for the F444W filter (note that many places in the text of the ticket say "F444M", but it's actually "F444W"). So I wonder if this is somehow related. I don't know the origin of the simulated data referred to in JP-373, whether it was personal simulations by [~npirzkal] or "official" output from Mirage.

stscijgbot commented 4 years ago

Comment by Megan Sosey: I believe it was an output image from Mirage that Bryan had given me for testing NIRCam. Our conjecture back then was that the software was not updating the filter information.

Here's the related notebook with the analysis and images that were used:

[https://github.com/sosey/jwst-investigate/blob/master/nircam/NIRCAM-Object-finding-and-Extract-2D-Filter-update.ipynb]

stscijgbot commented 4 years ago

Comment by Howard Bushouse: With the merge of [https://github.com/spacetelescope/jwst/pull/4573] (see also JP-1264), we believe all of the original issues reported here, at least those that can be directly attributed to errors in the pipeline code, have been resolved.

stscijgbot commented 4 years ago

Comment by Howard Bushouse: Assigning back to [~hilbert] for INS testing.

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Repeating my tests with the fixed version of the pipeline, the "unclosed file" statements are still present. I see them when running calwebb_detector1 and spec2, just after the dq_init, at the beginning of spec2, and just after assign_wcs.

{noformat}
2020-02-18 13:29:02,033 - stpipe.Detector1Pipeline.dq_init - INFO - Step dq_init done
2020-02-18 13:29:02,112 - stpipe.Detector1Pipeline - WARNING - /Users/hilbert/miniconda3/envs/jwst_b7.4_TSO_FIXES/lib/python3.7/site-packages/jwst/stpipe/step.py:343: ResourceWarning: unclosed file <_io.FileIO name='grism_TSO/data/raw/jw88888001001_01101_00002-seg002_nrca5_uncal.fits' mode='rb' closefd=True>
  gc.collect()
{noformat}

{noformat}
2020-02-18 14:42:01,341 - stpipe.Spec2Pipeline - INFO - Starting calwebb_spec2 ...
2020-02-18 14:42:01,373 - stpipe.Spec2Pipeline - INFO - Processing product /ifs/jwst/wit/witserv/data7/nrc/SSB_build_7.4_testing/TSO/grism_TSO/data/level2a_with_fixes/jw88888002001_01101_00002-seg001_nrca5
2020-02-18 14:42:01,375 - stpipe.Spec2Pipeline - INFO - Working on input /ifs/jwst/wit/witserv/data7/nrc/SSB_build_7.4_testing/TSO/grism_TSO/data/level2a_with_fixes/jw88888002001_01101_00002-seg001_nrca5_rateints.fits ...
2020-02-18 14:42:02,554 - stpipe.Spec2Pipeline - WARNING - /Users/hilbert/miniconda3/envs/jwst_b7.4_TSO_FIXES/lib/python3.7/site-packages/jwst/stpipe/step.py:343: ResourceWarning: unclosed file <_io.FileIO name='/ifs/jwst/wit/witserv/data7/nrc/SSB_build_7.4_testing/TSO/grism_TSO/data/level2a_with_fixes/jw88888002001_01101_00002-seg001_nrca5_rateints.fits' mode='rb' closefd=True>
  gc.collect()
{noformat}

{noformat}
2020-02-18 15:00:42,813 - stpipe.Spec2Pipeline.assign_wcs - INFO - Step assign_wcs running with args (<CubeModel(10, 256, 2048) from jw88888002001_01101_00002-seg002_nrca5_rateints.fits>,).
2020-02-18 15:00:42,816 - stpipe.Spec2Pipeline.assign_wcs - INFO - Step assign_wcs parameters are: {'pre_hooks': [], 'post_hooks': [], 'output_file': None, 'output_dir': None, 'output_ext': '.fits', 'output_use_model': False, 'output_use_index': True, 'save_results': False, 'skip': False, 'suffix': 'assign_wcs', 'search_output_file': True, 'input_dir': 'grism_TSO/data/asn_files_with_fixes', 'slit_y_low': -0.55, 'slit_y_high': 0.55}
2020-02-18 15:00:43,759 - stpipe.Spec2Pipeline.assign_wcs - INFO - COMPLETED assign_wcs
2020-02-18 15:00:43,773 - stpipe.Spec2Pipeline.assign_wcs - INFO - Step assign_wcs done
2020-02-18 15:00:43,852 - stpipe.Spec2Pipeline - WARNING - /Users/hilbert/miniconda3/envs/jwst_b7.4_TSO_FIXES/lib/python3.7/site-packages/jwst/stpipe/step.py:343: ResourceWarning: unclosed file <_io.FileIO name='/ifs/jwst/wit/witserv/data7/nrc/SSB_build_7.4_testing/TSO/grism_TSO/data/level2a_with_fixes/jw88888002001_01101_00002-seg002_nrca5_rateints.fits' mode='rb' closefd=True>
  gc.collect()
{noformat}

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: When the white light values are calculated, are signals being summed over the entire extraction box? The white light values that I'm seeing tend to jump between 6e7 and -4e7. My guess is that the pixels near the ends of the valid wavelength range are to blame (they all have values that are ~1e7 or ~-1e7), presumably because the flux calibration values at the edges of the bandpass are very large.  !white_light_vs_time_f444w.png!

stscijgbot commented 4 years ago

Comment by Howard Bushouse: The whtlt values are computed from the x1d spectra and just do a dumb/simple summation of flux over all wavelengths, for each integration. So yes, it could be compromised by bad values near the ends of the wavelength range or any bad values in between.
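A summation of this kind, and its sensitivity to the end points, can be sketched as follows. This is not the white_light step's code: the function name and the four-point spectra are invented, with end values of the ~1e7 magnitude reported earlier in the thread.

```python
def white_light(spectra):
    """Sum the flux over all wavelengths, once per integration."""
    return [sum(flux) for flux in spectra]


# Hypothetical 1D spectra for two integrations. The inner points carry the
# real signal; the two end points mimic the ~1e7 and ~-1e7 values seen near
# the edges of the valid wavelength range.
spectra = [
    [1.0e7, 5.0, 6.0, 1.0e7],
    [-1.0e7, 5.0, 6.0, -1.0e7],
]

totals = white_light(spectra)
print(totals)
```

The inner points are identical in both integrations, yet the totals swing from large positive to large negative, purely because of the end points.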

stscijgbot commented 4 years ago

Comment by Howard Bushouse: JP-1342 has been filed for the unclosed file errors.

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Other than the unclosed file errors (now in JP-1342), my testing with an updated version of the pipeline shows that all issues in this ticket have been resolved. 

For the white light calculation, the plot above seems to argue that the NIRCam team should cut down the range of valid wavelengths in the photom reference file. [~bushouse] do you agree? 

stscijgbot commented 4 years ago

Comment by Howard Bushouse: I agree with trimming the range of valid wavelengths.

Since the last remaining issue is documented elsewhere (JP-1342), I think this ticket can probably be closed (if you agree with that).

stscijgbot commented 4 years ago

Comment by Bryan Hilbert: Works for me!

stscijgbot commented 4 years ago

Comment by Kevin Volk: The white light issue here probably will also apply to NIRISS SOSS mode. A better way to do it would be to sum up the raw signal in ADU/second before conversion to flux density units, as was originally envisioned for the pipeline. However, if that is not practical we can limit the wavelength range to values where the conversion from signal to flux density units (not surface brightness in the SOSS case, unlike the other spectral modes) is within some range that is not too much larger than the minimum value. On-sky we will not be able to push the photometric calibration way off into the wings of the response. Such values are only in the photometric calibration reference files currently because all the values are generated from simulations and have effectively infinite S/N in the calculation.
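The second option, limiting the wavelength range by the size of the conversion factor, can be sketched as below. The function name, the max_ratio threshold of 10, and the sample conversion values are all assumptions for illustration; no specific threshold is given in the thread.

```python
def usable_wavelengths(wavelengths, conversion, max_ratio=10.0):
    """Keep wavelengths whose conversion factor is within max_ratio of the minimum."""
    floor = min(conversion)
    return [
        wave
        for wave, factor in zip(wavelengths, conversion)
        if factor <= max_ratio * floor
    ]


# Hypothetical conversion factors: small mid-band, very large in the wings.
wavelengths = [3.8, 4.0, 4.4, 4.8, 5.0]
conversion = [300.0, 2.0, 1.5, 2.5, 400.0]

kept = usable_wavelengths(wavelengths, conversion)
print(kept)
```

A threshold expressed relative to the minimum conversion factor adapts to each filter, rather than hard-coding wavelength limits per mode.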