spacetelescope / jwst

Python library for science observations from the James Webb Space Telescope
https://jwst-pipeline.readthedocs.io/en/latest/

extract1d tables in ASDF extension #1565

Closed hbushouse closed 6 years ago

hbushouse commented 6 years ago

Jeff Valenti reports:

I've been learning about ASDF in the context of our JWST pipeline products. For DMS 7.1, 
x1d and x1dints files contain two copies of each extracted spectrum:

There is one EXTRACT1D extension for each spectrum.
In the ASDF extension, there is a separate tree object and corresponding data block 
for each spectrum.

For cases with lots of spectra (e.g., multi-object spectroscopy, time series), this 
doubling of file size can be significant.

There should NOT be a separate copy of each extract1d data table in the ASDF extension. Perhaps this is a bug in datamodels that is accidentally allowing the table to appear in ASDF?

hbushouse commented 6 years ago

More info from Jeff. The contents of an x1d file created using b7.1rc9 in the DMS environment:

x1d$ x1d_info.py jw95065008001_02101_00001_nrs1_x1d.fits

Filename: jw95065008001_02101_00001_nrs1_x1d.fits
No.  Name       Ver  Type         Cards  Dimensions  Format
0    PRIMARY    1    PrimaryHDU   352    ()
1    EXTRACT1D  1    BinTableHDU  39     432R x 8C   [D, D, D, J, D, D, D, D]
2    EXTRACT1D  2    BinTableHDU  39     426R x 8C   [D, D, D, J, D, D, D, D]
3    ASDF       1    ImageHDU     7      (77120,)    uint8

#ASDF 1.0.0
#ASDF_STANDARD 1.1.0
%YAML 1.1
%TAG ! tag:stsci.edu:asdf/
--- !core/asdf-1.0.0
asdf_library: !core/software-1.0.0 {author: Space Telescope Science Institute, homepage: 'http://github.com/spacetelescope/asdf', name: asdf, version: 1.3.2.dev1004}
meta:
  aperture: {name: NRS1_FULL, pps_name: NRS_FULL_MSA}
  asn: {pool_name: jw95065_20171116T193016_pool, table_name: /ifs/int/jwstb/store/rspencer/tests/SIC-DIL/bstring/JWSTDP-2017_1-171114/2017-11-16-180632/sdp/asn_creation/cal/level2/jw95065-o008_20171116t193016_spec2_002_asn.json}
  cal_step: {assign_wcs: COMPLETE, barshadow: COMPLETE, dark_sub: COMPLETE, dq_init: COMPLETE, extract_1d: COMPLETE, extract_2d: COMPLETE, flat_field: COMPLETE, gain_scale: SKIPPED, group_scale: SKIPPED, jump: COMPLETE, linearity: COMPLETE, msa_flagging: COMPLETE, pathloss: COMPLETE, photom: COMPLETE, ramp_fit: COMPLETE, refpix: COMPLETE, resample: COMPLETE, saturation: COMPLETE, srctype: COMPLETE, superbias: COMPLETE}
  calibration_software_revision: c4224ab6
  calibration_software_version: 0.7.8rc9
  data_processing_software_version: '2017_1'
  date: '2017-11-17T01:54:17.410'
  ephemeris: {reference_frame: EME2000, spatial_x: -1471442.22250475, spatial_y: -654210.727162102, spatial_z: -599236.720360642, time: 57409.9791667, type: Predicted, velocity_x: 0.0123408977852992, velocity_y: -0.0485877185535819, velocity_z: -0.0296238439916802}
  exposure: {comprssd: false, count: 1, data_problem: false, datamode: 51, drop_frames1: 0, drop_frames3: 0, duration: 139.57788, end_time: 57409.98008542824, exposure_time: 128.842, frame_divisor: 4, frame_time: 10.73676, group_time: 42.94704000000001, groupgap: 0, imprint: false, integration_time: 128.84112, mid_time: 57409.97927768588, nframes: 4, ngroups: 3, nints: 1, nresets_at_start: 1, nresets_between_ints: 1, nsamples: 1, pointing_sequence: 1, prime_parallel: PRIME, readpatt: NRS, sample_time: 10.0, sca_num: 491, start_time: 57409.97846994352, type: NRS_MSASPEC, zero_frame: false}
  filename: jw95065008001_02101_00001_nrs1_x1d.fits
  filetype: uncalibrated
  guidestar: {gs_dec: 0.0, gs_id: '', gs_mag: 0.0, gs_order: 0, gs_pcs_mode: COARSE, gs_ra: 0.0, gs_start_time: '1999-01-01 00:00:00', gs_stop_time: '1999-01-01 00:00:00', gs_udec: 0.0, gs_umag: 0.0, gs_ura: 0.0, visit_end_time: '2016-01-22 23:42:43.5420000'}
  hga_move: false
  instrument: {detector: NRS1, filter: CLEAR, focus_position: 0, grating: PRISM, gwa_pxav: 169.2701563999999, gwa_pyav: 16.80007160000001, gwa_tilt: 658.110352, gwa_xp_v: 169.2930292, gwa_xtilt: 0.3378182350000002, gwa_yp_v: 16.7886352, gwa_ytilt: 0.03353544000000001, lamp_state: 'NULL', msa_configuration_id: 1, msa_metadata_file: jw95065008001_01_msa.fits, msa_metadata_id: 2, msa_state: PRIMARYPARK_CONFIG, name: NIRSPEC, pre_image_id: jw95065001_000_pre-image.fits}
  model_type: MultiSpecModel
  observation: {activity_id: '01', bkgdtarg: false, date: '2016-01-22', date_end: '2016-01-22', exposure_number: '1', obs_id: V95065008001P0000000002101, observation_label: PRISM_C0, observation_number: 008, program_number: '95065', sequence_id: '1', template: NIRSpec MultiObject Spectroscopy, time: ' 23:28:59.803', time_end: ' 23:31:19.381', visit_group: '02', visit_id: '95065008001', visit_number: '001'}
  origin: STSCI
  photometry: {conversion_megajanskys: 1.0, conversion_microjanskys: 23.50443}
  prd_software_version: PRDOPSSOC-G-010
  program: {category: ENG, continuation_id: 0, pi_name: N/A, science_category: '', sub_category: '', title: DEEP HUDF MSA DIL - modified version of 95065 - version 4}
  ref_file:
    area: {name: 'crds://jwst_nirspec_area_0011.fits'}
    barshadow: {name: 'crds://jwst_nirspec_barshadow_0001.fits'}
    camera: {name: 'crds://jwst_nirspec_camera_0004.asdf'}
    collimator: {name: 'crds://jwst_nirspec_collimator_0004.asdf'}
    crds: {context_used: jwst_0422.pmap, sw_version: 7.1.7}
    dark: {name: 'crds://jwst_nirspec_dark_0026.fits'}
    dflat: {name: 'crds://jwst_nirspec_dflat_0001.fits'}
    disperser: {name: 'crds://jwst_nirspec_disperser_0034.asdf'}
    distortion: {name: N/A}
    drizpars: {name: N/A}
    extract1d: {name: 'crds://jwst_nirspec_extract1d_0003.json'}
    fflat: {name: 'crds://jwst_nirspec_fflat_0002.fits'}
    filteroffset: {name: N/A}
    fore: {name: 'crds://jwst_nirspec_fore_0028.asdf'}
    fpa: {name: 'crds://jwst_nirspec_fpa_0005.asdf'}
    gain: {name: 'crds://jwst_nirspec_gain_0019.fits'}
    ifufore: {name: N/A}
    ifupost: {name: N/A}
    ifuslicer: {name: N/A}
    linearity: {name: 'crds://jwst_nirspec_linearity_0019.fits'}
    mask: {name: 'crds://jwst_nirspec_mask_0016.fits'}
    msa: {name: 'crds://jwst_nirspec_msa_0005.asdf'}
    ote: {name: 'crds://jwst_nirspec_ote_0004.asdf'}
    pathloss: {name: 'crds://jwst_nirspec_pathloss_0002.fits'}
    photom: {name: 'crds://jwst_nirspec_photom_0012.fits'}
    readnoise: {name: 'crds://jwst_nirspec_readnoise_0000.fits'}
    regions: {name: N/A}
    saturation: {name: 'crds://jwst_nirspec_saturation_0023.fits'}
    sflat: {name: 'crds://jwst_nirspec_sflat_0002.fits'}
    specwcs: {name: N/A}
    superbias: {name: 'crds://jwst_nirspec_superbias_0087.fits'}
    v2v3: {name: N/A}
    wavelengthrange: {name: 'crds://jwst_nirspec_wavelengthrange_0004.asdf'}
  subarray: {fastaxis: 2, name: FULL, slowaxis: 1, xsize: 2048, xstart: 1, ysize: 2048, ystart: 1}
  target: {catalog_name: '', dec: -27.79127312000003, dec_uncertainty: 0.1, proper_motion_dec: 0.0, proper_motion_epoch: '2000-01-01 00:00:00.00', proper_motion_ra: 0.0, proposer_dec: -27.79127222222222, proposer_name: TARGET-OBSERVATION-8, proposer_ra: 53.16199124999999, ra: 53.16199112, ra_uncertainty: 0.1, source_type: UNKNOWN, type: FIXED}
  telescope: JWST
  time: {barycentric_correction: 269.1244423622265, barycentric_expend: 57409.98320021552, barycentric_expmid: 57409.98239251264, barycentric_expstart: 57409.98158480975, heliocentric_correction: 268.2308034505695, heliocentric_expend: 57409.98318987244, heliocentric_expmid: 57409.98238216958, heliocentric_expstart: 57409.98157446671}
  time_sys: UTC
  velocity_aberration: {dec_offset: -1.3638712138515e-07, ra_offset: -1.470816967818e-07}
  visit: {engineering_quality: SUSPECT, internal_target: false, start_time: '2016-01-22 23:22:32.1500000', status: SUCCESSFUL, too_visit: false, total_exposures: 3, tsovisit: false, type: PRIME_TARGETED_FIXED}
spec:

Block 0 header:
  block indexes: [11588, 37562]
  magic token: d3424c4b
  header length: 48 (+ 6 = 54)
  flags: 0, streamed=False
  compression: None
  size allocated, used, data: 25920, 25920, 25920
  data checksum: c630b88c98f56249c3851e64e4b1615f
Block 1 header:
  block indexes: [37562, 63176]
  magic token: d3424c4b
  header length: 48 (+ 6 = 54)
  flags: 0, streamed=False
  compression: None
  size allocated, used, data: 25560, 25560, 25560
  data checksum: 7a75acbe8451eb6357403c8926e3206d
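
As a quick cross-check of the numbers above, the two ASDF block sizes can be compared with the size of the two EXTRACT1D tables. The sketch below uses only values quoted in this issue; the byte widths assumed for the FITS format codes (D = 8-byte float, J = 4-byte integer) are the standard binary-table widths, and the helper name `table_nbytes` is made up for illustration.

```python
# Bytes per value for the FITS binary-table format codes seen in the listing.
FITS_BYTES = {"D": 8, "J": 4}  # D: 64-bit float, J: 32-bit integer


def table_nbytes(nrows, formats):
    """Size in bytes of a binary table with the given row count and column formats."""
    return nrows * sum(FITS_BYTES[code] for code in formats)


formats = ["D", "D", "D", "J", "D", "D", "D", "D"]  # 8 columns, 60 bytes per row

# Row counts from the two EXTRACT1D extensions (432R and 426R).
sizes = [table_nbytes(432, formats), table_nbytes(426, formats)]

# "size ... data" values reported for ASDF Block 0 and Block 1.
block_data_sizes = [25920, 25560]

# Span of each block from its "block indexes", which includes the 54-byte header.
block_spans = [37562 - 11588, 63176 - 37562]

same_size = sizes == block_data_sizes
spans_match = all(span == size + 54 for span, size in zip(block_spans, sizes))
print(sizes, same_size, spans_match)
```

The table sizes and the block data sizes agree byte for byte, which is consistent with each ASDF block being a full copy of one extracted-spectrum table.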

philhodge commented 6 years ago

I saw the column definitions of the extracted tables in the ASDF extension, but it said "array (unloaded) shape: [432]", and I thought "unloaded" meant that the data arrays were not actually loaded into the ASDF extension. I checked with Nadia to make sure, and she said no, "unloaded" meant that the file was opened in memory_map mode, and the data really are there, just not loaded immediately. So you are correct: the tables are in the ASDF extension, and I agree that they shouldn't be, but I don't know how to keep them out. I'll run some tests.

Thanks for pointing this out.
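
The "unloaded" behaviour can be illustrated by analogy with NumPy's own memory mapping. This is not the asdf code path, only a minimal sketch of the same idea: the bytes are already in the file, and the array object just defers reading them. The function name `write_and_map` and the file name `block.bin` are made up for the example.

```python
import os
import tempfile

import numpy as np


def write_and_map(values):
    """Write values to a scratch file, then open them lazily with np.memmap."""
    tmpdir = tempfile.mkdtemp()  # not cleaned up in this sketch
    path = os.path.join(tmpdir, "block.bin")
    np.asarray(values, dtype=np.float64).tofile(path)

    # The data are fully present on disk at this point.
    on_disk_bytes = os.path.getsize(path)

    # Opening the map does not read the array; pages are fetched on access.
    mapped = np.memmap(path, dtype=np.float64, mode="r")
    return on_disk_bytes, mapped


on_disk_bytes, mapped = write_and_map(range(432))
print(on_disk_bytes, mapped.shape, float(mapped[431]))
```

So an array reported as not yet loaded still occupies its full size in the file, which is why the table copies count toward the size of the ASDF extension.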

bernie-simon commented 6 years ago

This is a bug in the asdf code. For now the issue has been resolved by pull #1572 in jwst.

drdavella commented 6 years ago

Is this file available somewhere that I can look at it?

bernie-simon commented 6 years ago

The new function is _snip_tables in fits_support.py

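# Imports these two functions rely on (presumably at the top of fits_support.py).
import numpy as np
from astropy.io import fits
from asdf import fits_embed, treeutil

def _snip_tables(tree):
    # Replace table (structured) arrays with None so that they are not
    # written a second time into the ASDF extension.
    def _snip_node(node, json_id):
        if isinstance(node, np.ndarray):
            dtype = node.dtype
            # Every dtype has a `names` attribute; it is None unless the
            # array is structured, so test the value rather than hasattr.
            if dtype.names is not None:
                node = None
        return node
    return treeutil.walk_and_modify(tree, _snip_node)

def to_fits(tree, schema, extensions=None):
    hdulist = fits.HDUList()
    hdulist.append(fits.PrimaryHDU())

    # Write the FITS HDUs first; the _save_* helpers are defined
    # elsewhere in fits_support.py.
    _save_from_schema(hdulist, tree, schema)
    _save_extra_fits(hdulist, tree)
    _save_history(hdulist, tree)
    # Then drop the tables from the tree that goes into the ASDF extension.
    tree = _snip_tables(tree)

    asdf = fits_embed.AsdfInFits(hdulist, tree, extensions=extensions)
    return asdf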
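
The structured-array test at the heart of _snip_tables can be tried without asdf. The sketch below is a stand-in, not the jwst implementation: it walks plain dicts and lists instead of calling treeutil.walk_and_modify, and the function name `snip_tables` and the sample tree are made up. It relies on the fact that `dtype.names` is None for ordinary arrays and a tuple of field names for structured ones, while `hasattr(dtype, 'names')` is true for every dtype.

```python
import numpy as np


def snip_tables(node):
    """Return a copy of a nested dict/list tree with structured arrays set to None."""
    if isinstance(node, dict):
        return {key: snip_tables(value) for key, value in node.items()}
    if isinstance(node, list):
        return [snip_tables(value) for value in node]
    if isinstance(node, np.ndarray) and node.dtype.names is not None:
        return None  # table-like array: leave it to the FITS extension
    return node


# Hypothetical tree: one plain image array and one extracted-spectrum table.
image = np.zeros((4, 4), dtype=np.float32)
table = np.zeros(3, dtype=[("WAVELENGTH", "f8"), ("FLUX", "f8")])
tree = {"data": image, "spec": [{"spec_table": table}]}

snipped = snip_tables(tree)
print(snipped["spec"][0]["spec_table"], snipped["data"].shape)
```

Plain arrays pass through untouched, so only the tables are removed from the tree.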

hbushouse commented 6 years ago

@drdavella You should be able to find it on central store at /grp/jwst/ins/mary/b7.1rc9_full/jw95065/jw95065008001_02101_00001_nrs1_x1d.fits

drdavella commented 6 years ago

I believe that this will be fixed by https://github.com/spacetelescope/asdf/pull/411. But it will be necessary for someone to confirm it on the JWST side once it's merged.
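
One way to confirm the fix on the JWST side is to look for binary blocks in the ASDF extension of a newly written x1d file. The sketch below scans raw bytes for the block magic token shown in the dump above (d3424c4b) and reads the sizes from each header. The header layout it assumes (a 2-byte header size followed by flags, compression, three 64-bit sizes, and a 16-byte checksum, all big-endian) is my reading of the ASDF standard and should be checked against it; it does add up to the 48-byte header length reported above. The function name `find_asdf_blocks` is made up, and a chance occurrence of the magic bytes in other data would produce a false hit.

```python
import struct

BLOCK_MAGIC = bytes.fromhex("d3424c4b")  # b"\xd3BLK", as in the dump above

# Assumed header fields after the 2-byte header size:
# flags, compression, allocated size, used size, data size, checksum.
HEADER = struct.Struct(">I4sQQQ16s")  # 48 bytes


def find_asdf_blocks(raw):
    """Return offset and sizes for each ASDF binary block found in raw bytes."""
    blocks = []
    pos = raw.find(BLOCK_MAGIC)
    while pos != -1:
        start = pos + len(BLOCK_MAGIC)
        if start + 2 + HEADER.size > len(raw):
            break  # truncated header, stop scanning
        (header_size,) = struct.unpack_from(">H", raw, start)
        _flags, _comp, allocated, used, data_size, _checksum = HEADER.unpack_from(raw, start + 2)
        blocks.append({"offset": pos, "header_size": header_size,
                       "used": used, "data_size": data_size})
        # Resume the search after this block's allocated space.
        pos = raw.find(BLOCK_MAGIC, start + 2 + header_size + allocated)
    return blocks


# Synthetic check: a fake YAML preamble followed by two 8-byte blocks.
def _fake_block(payload):
    header = HEADER.pack(0, b"\0" * 4, len(payload), len(payload), len(payload), b"\0" * 16)
    return BLOCK_MAGIC + struct.pack(">H", HEADER.size) + header + payload


raw = b"#ASDF 1.0.0\n...\n" + _fake_block(b"x" * 8) + _fake_block(b"y" * 8)
found = find_asdf_blocks(raw)
print(len(found), [b["data_size"] for b in found])
```

Run on the bytes of the ASDF extension from the file above, this should report two blocks of 25920 and 25560 bytes; after the fix, the table blocks should no longer appear.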