bourque closed this issue 6 years ago.
How flexible is this? The JWST keywords, header info, filetypes, etc. are still in flux (nowhere near as stable as WFC3's), so we need to be able to evolve the schema in response to these changes.
If we build this right, changes to the schema for the header tables should be as simple as updating a text file and adding/removing columns in the database. Changes to the data structure itself (i.e. new filetypes, new/different FITS extensions) would be a bit trickier because that would mean adding new tables and not just new columns.
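A minimal sketch of the "update a text file, then add columns" workflow. It assumes the keyword list lives in a plain text file with one keyword per line and that the header table is stored in SQLite. The table name `nircam_nrca1_sci_cal` and the helper `sync_columns` are hypothetical. A production database would need its own DDL. Dropping columns is left out because SQLite only gained `ALTER TABLE ... DROP COLUMN` in version 3.35.

```python
import sqlite3


def sync_columns(conn, table, keyword_text):
    """Add a TEXT column for every keyword in keyword_text missing from table.

    keyword_text is the content of a hypothetical text file listing one
    header keyword per line; blank lines and lines starting with '#' are
    skipped. Returns the list of columns that were added.
    """
    wanted = [
        line.strip().upper()
        for line in keyword_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    ]
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = {row[1].upper() for row in conn.execute(f'PRAGMA table_info("{table}")')}
    added = []
    for keyword in wanted:
        if keyword not in existing:
            # Quote identifiers: FITS keywords may contain '-' (e.g. DATE-OBS).
            conn.execute(f'ALTER TABLE "{table}" ADD COLUMN "{keyword}" TEXT')
            existing.add(keyword)
            added.append(keyword)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "nircam_nrca1_sci_cal" (rootname TEXT PRIMARY KEY, "TELESCOP" TEXT)')
keywords = """
# hypothetical keyword list for one detector/extension/filetype table
TELESCOP
INSTRUME
DATE-OBS
"""
print(sync_columns(conn, "nircam_nrca1_sci_cal", keywords))  # ['INSTRUME', 'DATE-OBS']
```

Running it again with the same file adds nothing, so the sync is safe to repeat every time the keyword list is edited.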
This brings up another question: How often should we anticipate changes to the header keywords/filetypes/FITS extensions after launch?
My guess is that header keyword changes after launch won't be too common, but I'm sure they will happen from time to time.
For what it's worth, I have a function that returns all of the header keywords for a requested reference file type. It does this by reading in the appropriate schema definition files that SSB has in the JWST Calibration Pipeline repo. I doubt it would be hard to update it to work on the data filetypes.
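The function mentioned above isn't shown here, so this is a hypothetical stand-in that illustrates the approach. It assumes the schema files have already been parsed (e.g. from YAML) into nested dicts and lists. It also assumes that header-backed properties carry a `fits_keyword` entry and, optionally, a `fits_hdu` entry, which, as far as we know, is how the jwst datamodels schemas mark them. Verify this against the actual files before relying on it.

```python
from collections import defaultdict


def collect_keywords(schema):
    """Walk a parsed schema and group FITS keywords by extension name.

    Assumption: a node that maps to a header keyword is a dict with a
    'fits_keyword' entry and, optionally, a 'fits_hdu' entry; nodes without
    'fits_hdu' are treated as primary-header keywords.
    """
    keywords = defaultdict(list)

    def walk(node):
        if isinstance(node, dict):
            if "fits_keyword" in node:
                hdu = node.get("fits_hdu", "PRIMARY")
                if node["fits_keyword"] not in keywords[hdu]:
                    keywords[hdu].append(node["fits_keyword"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            # Combinators such as allOf hold lists of sub-schemas.
            for item in node:
                walk(item)

    walk(schema)
    return dict(keywords)


# A tiny, made-up fragment in the style of a datamodels schema.
schema = {
    "allOf": [
        {"properties": {"meta": {"properties": {
            "telescope": {"type": "string", "fits_keyword": "TELESCOP"},
            "instrument": {"properties": {
                "name": {"type": "string", "fits_keyword": "INSTRUME"}}},
        }}}},
        {"properties": {"meta": {"properties": {
            "wcsinfo": {"properties": {
                "crpix1": {"type": "number", "fits_keyword": "CRPIX1", "fits_hdu": "SCI"}}},
        }}}},
    ]
}
print(collect_keywords(schema))
# {'PRIMARY': ['TELESCOP', 'INSTRUME'], 'SCI': ['CRPIX1']}
```

Grouping by extension matches the per-extension header tables, so each key of the result could feed one table's keyword list.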
Filetypes that will be ingested into MAST:
_uncal.fits (raw)
_rate.fits, _rateints.fits (countrate images, level-2a)
_cal.fits, _calints.fits (flux calibrated, full WCS-added countrate images, level-2b)
_i2d.fits, _s2d.fits, _s3d.fits (resampled, both for individual exposures and combined)
_x1d.fits (extracted spectra, both for individual exposures and combined)
I'll put together more details on each soon.
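The list above maps naturally onto a lookup table, which is roughly what the database needs to know about each file's type. This is a minimal sketch. It assumes the product type is always the text between the last underscore and `.fits`. The dictionary and function names are hypothetical, the descriptions simply repeat the list above, and the example filename is illustrative.

```python
PRODUCT_TYPES = {
    "uncal": "raw",
    "rate": "countrate image, level-2a",
    "rateints": "countrate images per integration, level-2a",
    "cal": "flux calibrated countrate image, level-2b",
    "calints": "flux calibrated countrate images per integration, level-2b",
    "i2d": "resampled",
    "s2d": "resampled",
    "s3d": "resampled",
    "x1d": "extracted spectra",
}


def product_type(filename):
    """Return (suffix, description) for a JWST product filename.

    Assumes names end in '_<suffix>.fits'; raises ValueError otherwise.
    """
    if not filename.endswith(".fits") or "_" not in filename:
        raise ValueError(f"unexpected filename: {filename}")
    suffix = filename[: -len(".fits")].rsplit("_", 1)[1]
    if suffix not in PRODUCT_TYPES:
        raise ValueError(f"unknown product suffix: {suffix}")
    return suffix, PRODUCT_TYPES[suffix]


print(product_type("jw00327001001_02101_00001_nrca1_rateints.fits"))
# ('rateints', 'countrate images per integration, level-2a')
```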
frame = one readout of the detector
group = made from a single frame or an (onboard) average of multiple frames
integration = multiple groups, with detector resets before and after (equivalent to a single HST file)
exposure = multiple nominally identical integrations packaged into the same file (like packing multiple HST raw ramps into a single file)

_uncal.fits:
No. Name Ver Type Cards Dimensions Format
0 PRIMARY 1 PrimaryHDU 89 ()
1 SCI 1 ImageHDU 25 (2048, 2048, 10, 1) int16 (rescales to uint16)
2 ZEROFRAME 1 ImageHDU 11 (2048, 2048, 1) int16 (rescales to uint16)
3 GROUP 1 BinTableHDU 35 10R x 13C [I, I, I, J, I, 26A, I, I, I, I, 36A, D, D]
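Two pieces of bookkeeping follow from the terminology and the listing above. First, astropy's `info()` reports dimensions in FITS axis order (NAXIS1 first), while the NumPy array read from an extension has them reversed. The (2048, 2048, 10, 1) SCI cube therefore arrives with shape (1, 10, 2048, 2048). Second, the number of detector frames behind one integration depends on how many frames are averaged into each group. The function below assumes the NFRAMES/GROUPGAP convention, in which GROUPGAP frames are skipped between consecutive groups, and it ignores any frames dropped at the start of an integration, so treat it as an illustrative sketch.

```python
def numpy_shape(fits_dims):
    """Convert dimensions as listed by astropy's HDUList.info() (FITS order,
    NAXIS1 first) into the shape of the corresponding NumPy array."""
    return tuple(reversed(fits_dims))


def frames_per_integration(ngroups, nframes, groupgap=0):
    """Detector frames read during one integration.

    Simplified assumption: each group averages `nframes` frames and
    `groupgap` frames are skipped between consecutive groups.
    """
    if ngroups < 1 or nframes < 1 or groupgap < 0:
        raise ValueError("ngroups and nframes must be >= 1, groupgap >= 0")
    return ngroups * nframes + (ngroups - 1) * groupgap


print(numpy_shape((2048, 2048, 10, 1)))  # (1, 10, 2048, 2048)
print(frames_per_integration(10, 1))     # 10
print(frames_per_integration(10, 4, 1))  # 49
```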
SCI extension: contains the detector data. 4 dimensions (detector y, detector x, groups per integration, integrations).
ZEROFRAME extension: contains the 0th frame that goes with each integration. For some readout patterns, each group is the average of N frames, and this averaging is done on board JWST. The 0th frame is saved to this separate extension for cases where the initial read is needed for slope fitting. 3 dimensions (detector y, detector x, integrations).
GROUP extension: a binary table that contains detailed timing information about the exposure. The table contains 13 columns and one row per group (the listing above shows 10 rows, matching the 10 groups in the single integration).
GROUP (13 columns x 1 rows):
Col# Name (Units) Format
1 integration_number I
2 group_number I
3 end_day I
4 end_milliseconds J
5 end_submilliseconds I
6 group_end_time 26A
7 number_of_columns I
8 number_of_rows I
9 number_of_gaps I
10 completion_code_numb I
11 completion_code_text 36A
12 bary_end_time (MJD) D
13 helio_end_time (MJD) D
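If the GROUP table is ever stored in the database, its FITS column formats translate directly into SQL types. In the FITS binary-table convention, I is a 16-bit integer, J is a 32-bit integer, D is a 64-bit float, and nA is a character string of width n. This is a minimal sketch that assumes the SQL types SMALLINT, INTEGER, DOUBLE PRECISION, and VARCHAR(n). The helper names and the table name `group_table` are hypothetical, and the column names are copied exactly as listed above.

```python
import re

# FITS binary-table TFORM codes used in the GROUP extension, mapped to SQL types.
TFORM_TO_SQL = {"I": "SMALLINT", "J": "INTEGER", "D": "DOUBLE PRECISION"}

GROUP_COLUMNS = [
    ("integration_number", "I"), ("group_number", "I"), ("end_day", "I"),
    ("end_milliseconds", "J"), ("end_submilliseconds", "I"),
    ("group_end_time", "26A"), ("number_of_columns", "I"),
    ("number_of_rows", "I"), ("number_of_gaps", "I"),
    ("completion_code_numb", "I"), ("completion_code_text", "36A"),
    ("bary_end_time", "D"), ("helio_end_time", "D"),
]


def sql_type(tform):
    """Translate a single-element FITS TFORM code (e.g. 'J', '26A') to SQL."""
    match = re.fullmatch(r"(\d*)([A-Z])", tform)
    if match is None:
        raise ValueError(f"unsupported TFORM: {tform}")
    repeat, code = match.groups()
    if code == "A":
        return f"VARCHAR({repeat or 1})"
    # Only scalar numeric columns are handled in this sketch.
    if repeat not in ("", "1") or code not in TFORM_TO_SQL:
        raise ValueError(f"unsupported TFORM: {tform}")
    return TFORM_TO_SQL[code]


def create_table_sql(table, columns):
    """Build a CREATE TABLE statement from (name, TFORM) pairs."""
    body = ",\n    ".join(f"{name} {sql_type(tform)}" for name, tform in columns)
    return f"CREATE TABLE {table} (\n    {body}\n)"


print(create_table_sql("group_table", GROUP_COLUMNS))
```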
This is the output of the Level 2A pipeline, which includes basic calibrations (superbias subtraction, linearity correction, slope fitting). For an exposure that contains a single integration, the *_rate.fits file contains the slope image created by line-fitting to the groups of the integration. For an exposure that contains multiple integrations, the *_rate.fits image contains the mean slope image from all integrations. In this case, the pipeline also outputs a *_rateints.fits file, which contains the separate slope images from all of the integrations. Extensions 1-5 of the *_rateints.fits file therefore have one more dimension (integrations) than those shown below.
No. Name Ver Type Cards Dimensions Format
0 PRIMARY 1 PrimaryHDU 159 ()
1 SCI 1 ImageHDU 29 (2048, 2048) float32
2 ERR 1 ImageHDU 10 (2048, 2048) float32
3 DQ 1 ImageHDU 11 (2048, 2048) int32 (rescales to uint32)
4 VAR_POISSON 1 ImageHDU 9 (2048, 2048) float32
5 VAR_RNOISE 1 ImageHDU 9 (2048, 2048) float32
6 ASDF 1 ImageHDU 7 (3889,) uint8
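The ERR, VAR_POISSON and VAR_RNOISE extensions in the listing above are related. As we understand the jwst ramp-fitting documentation, the slope error is the two variance contributions added in quadrature, ERR = sqrt(VAR_POISSON + VAR_RNOISE). Treat that relation as an assumption to check against the pipeline version in use. The sketch below applies it per pixel on plain nested lists, which is enough for a sanity check on ingested files. With real data you would do the same with NumPy arrays. The function name is hypothetical.

```python
import math


def slope_error(var_poisson, var_rnoise):
    """Combine per-pixel variance images (nested lists, same shape) into an
    error image, assuming ERR = sqrt(VAR_POISSON + VAR_RNOISE)."""
    if len(var_poisson) != len(var_rnoise):
        raise ValueError("variance images differ in shape")
    error = []
    for row_p, row_r in zip(var_poisson, var_rnoise):
        if len(row_p) != len(row_r):
            raise ValueError("variance images differ in shape")
        error.append([math.sqrt(p + r) for p, r in zip(row_p, row_r)])
    return error


var_p = [[9.0, 16.0], [0.0, 1.0]]
var_r = [[16.0, 9.0], [4.0, 3.0]]
print(slope_error(var_p, var_r))  # [[5.0, 5.0], [2.0, 2.0]]
```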
SCI extension - slope images. 2-dimensional (detector y, detector x)
ERR extension - errors on the slope values. 2-dimensional (detector y, detector x)
DQ extension - data quality array. 2-dimensional (detector y, detector x)
VAR_POISSON extension - contribution to the variance on the slopes due to Poisson noise. 2-dimensional (detector y, detector x)
VAR_RNOISE extension - contribution to the variance on the slopes due to read noise. 2-dimensional (detector y, detector x)
ASDF extension - contains distortion correction model information

This is the output of the Level 2B pipeline: flux calibration applied, flat field applied, distortion solution added. Similar to the *_rate.fits and *_rateints.fits files above, there are *_cal.fits files (containing the averaged image if there is more than one integration per exposure, or the single image if there is a single integration), and *_calints.fits files (which contain the individual calibrated images if there are multiple integrations per exposure).
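The single- versus multi-integration rule for the rate and cal products determines which level-2 files to expect for a given exposure. That rule is useful when checking the database for completeness. This is a minimal sketch that assumes the integration count is available (for example from the NINTS header keyword). It also assumes the per-integration files exist only when there is more than one integration, as described above. The function names and the rootname are illustrative.

```python
def expected_level2_suffixes(nints):
    """Level-2 imaging products expected for an exposure with `nints` integrations.

    *_rate and *_cal always exist; the per-integration *_rateints and
    *_calints files are expected only when nints > 1.
    """
    if nints < 1:
        raise ValueError("an exposure has at least one integration")
    suffixes = ["rate", "cal"]
    if nints > 1:
        suffixes += ["rateints", "calints"]
    return suffixes


def missing_products(rootname, nints, available):
    """Return expected filenames that are not in the `available` collection."""
    expected = [f"{rootname}_{s}.fits" for s in expected_level2_suffixes(nints)]
    return [name for name in expected if name not in available]


root = "jw00327001001_02101_00001_nrca1"  # illustrative rootname
have = {f"{root}_rate.fits", f"{root}_rateints.fits"}
print(missing_products(root, 3, have))
# ['jw00327001001_02101_00001_nrca1_cal.fits', 'jw00327001001_02101_00001_nrca1_calints.fits']
```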
No. Name Ver Type Cards Dimensions Format
0 PRIMARY 1 PrimaryHDU 250 ()
1 SCI 1 ImageHDU 32 (2048, 2048) float32
2 ERR 1 ImageHDU 10 (2048, 2048) float32
3 DQ 1 ImageHDU 11 (2048, 2048) int32 (rescales to uint32)
4 AREA 1 ImageHDU 9 (2048, 2048) float32
5 VAR_POISSON 1 ImageHDU 9 (2048, 2048) float32
6 VAR_RNOISE 1 ImageHDU 9 (2048, 2048) float32
7 ASDF 1 ImageHDU 7 (13515,) uint8
Extensions are the same as in the case of the *_rate.fits image, plus the AREA extension, which is a 2D image containing the pixel area map.
Thanks @bhilbert4 this is very helpful!
JDox page on filetypes and formats: https://jwst-docs-stage.stsci.edu/pages/viewpage.action?spaceKey=JDAT&title=File+Naming+Conventions+and+Data+Products
No. Name Ver Type Cards Dimensions Format
0 PRIMARY 1 PrimaryHDU 230 ()
1 SCI 1 ImageHDU 46 (2048, 2048) float32
2 ERR 1 ImageHDU 10 (2048, 2048) float32
3 DQ 1 ImageHDU 11 (2048, 2048) int32 (rescales to uint32)
4 AREA 1 ImageHDU 9 (2048, 2048) float32
5 VAR_POISSON 1 ImageHDU 9 (2048, 2048) float32
6 VAR_RNOISE 1 ImageHDU 9 (2048, 2048) float32
7 ASDF 1 ImageHDU 7 (13749,) uint8
Just to confirm, based on Tom Donaldson's Confluence page and what @bhilbert4 has said here: if you have (for example) a _rate.fits and an _uncal.fits file that both correspond to the same original image, is everything in the * part of the filename identical?
My understanding is that the rest of the filename will be consistent between the two.
Yes, that's correct
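Because everything before the product suffix is shared, grouping files by that prefix shows which filetypes exist for each observation. This is a minimal sketch that assumes the suffix is always the text after the last underscore. The function name and example filenames are illustrative.

```python
from collections import defaultdict


def group_by_rootname(filenames):
    """Map each rootname (filename minus '_<suffix>.fits') to its sorted suffixes."""
    groups = defaultdict(set)
    for name in filenames:
        if not name.endswith(".fits"):
            raise ValueError(f"not a FITS filename: {name}")
        rootname, sep, suffix = name[: -len(".fits")].rpartition("_")
        if not sep:
            raise ValueError(f"no product suffix in {name}")
        groups[rootname].add(suffix)
    return {root: sorted(suffixes) for root, suffixes in groups.items()}


files = [
    "jw00327001001_02101_00001_nrca1_uncal.fits",
    "jw00327001001_02101_00001_nrca1_rate.fits",
    "jw00327001001_02101_00001_nrcb1_uncal.fits",
]
print(group_by_rootname(files))
# {'jw00327001001_02101_00001_nrca1': ['rate', 'uncal'], 'jw00327001001_02101_00001_nrcb1': ['uncal']}
```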
Per @SaraOgaz
Jonathon pointed me to this doc page for the pipeline where there’s a whole section about the associations: https://jwst-pipeline.readthedocs.io/en/latest/jwst/associations/index.html
Now that we have decided to use the MAST API, this is no longer needed.
We need to develop a schema for the jwql database. I think a decent starting point is something like the schema I used for ACS Quicklook. In this schema, we have a master table that keeps track of each rootname that is in the database and when it was ingested. The datasets table keeps track of which filetypes exist for a given rootname. Then there is a table for each detector/extension/filetype combination, which is basically a dump of the headers (columns are header keys and values are header values).

To construct this for jwql, we will need to know the following for each instrument: