Closed braingram closed 1 year ago
I'm seeing a few failures in the regression tests: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/776/tests
[stable-deps] test_nircam_tsgrism_stage2[calints] – jwst.regtest.test_nircam_tsgrism1m 14s
[stable-deps] test_nircam_tsgrism_stage2[extract_2d] – jwst.regtest.test_nircam_tsgrism1s
[stable-deps] test_nircam_tsgrism_stage2[flat_field] – jwst.regtest.test_nircam_tsgrism2s
[stable-deps] test_nircam_tsgrism_stage2[o012_crfints] – jwst.regtest.test_nircam_tsgrism<1s
[stable-deps] test_nircam_tsgrism_stage2[srctype] – jwst.regtest.test_nircam_tsgrism1s
[stable-deps] test_nircam_tsgrism_stage2[x1dints] – jwst.regtest.test_nircam_tsgrism<1s
[stable-deps] test_nircam_tsgrism_stage3_x1dints – jwst.regtest.test_nircam_tsgrism
@hbushouse @tapastro are these expected failures?
An example error is:
Headers contain differences:
Headers have different number of cards:
a: 228
b: 234
Extra keyword 'R_DFLAT' in b: 'N/A'
Extra keyword 'R_FFLAT' in b: 'N/A'
Extra keyword 'R_SFLAT' in b: 'N/A'
Inconsistent duplicates of keyword '' :
Occurs 41 time(s) in a, 44 times in (b)
Keyword [24] has different values:
a> Nirspec FORE Model reference file information
? ^^^^^^^^ --------
b> DFlat reference file information
? ^ ++
Keyword [25] has different values:
a> Nirspec FPA Model reference file information
? -------- ^^^^^^^
b> FFlat reference file information
? ^ ++
Keyword [26] has different values:
a> Gain reference file information
? ^ ^^
b> SFlat reference file information
? ^^^ ^
Keyword [27] has different values:
a> IFU fore reference file information
? ^ ^^^ ^
b> Nirspec FORE Model reference file information
? ^^^^^^^^ ^^^^^ ^ +
Keyword [28] has different values:
a> IFU post reference file information
b> Nirspec FPA Model reference file information
Keyword [29] has different values:
a> IFU slicer reference file information
? ^^^^^^ ^^^
b> Gain reference file information
? ^^ ^
Keyword [30] has different values:
a> Linearity reference file information
? ^^^ -----
b> IFU fore reference file information
? ^^^^^^^
Keyword [31] has different values:
a> Mask reference file information
? ^^ ^
b> IFU post reference file information
? ^^^^^^ ^
Keyword [32] has different values:
a> Nirspec MSA Model reference file information
b> IFU slicer reference file information
Keyword [33] has different values:
a> Nirspec OTE Model reference file information
b> Linearity reference file information
Keyword [34] has different values:
a> Photometric reference file information
? ^^^^^^^^^^^
b> Mask reference file information
? ^^^^
Keyword [35] has different values:
a> Read noise reference file information
b> Nirspec MSA Model reference file information
Keyword [36] has different values:
a> Regions reference file information
b> Nirspec OTE Model reference file information
Keyword [37] has different values:
a> Saturation reference file information
? ^^ ^ -- ^^
b> Photometric reference file information
? ^^^ ^^^^ ^
Keyword [38] has different values:
a> Spectral distortion reference file information
? ^^ --- -- ^^^^^^^
b> Read noise reference file information
? ^ +++ ^
Keyword [39] has different values:
a> Superbias reference file information
? ^^^ ^^ ^
b> Regions reference file information
? ^ ^ ^^
Keyword [40] has different values:
a> Wavelength Range reference file information
b> Saturation reference file information
Keyword [41] has different values:
a> Calibration step information
b> Spectral distortion reference file information
The regtest errors are expected only in the sense that they've been happening for a while now, but we can't figure out why. But regardless, they have nothing to do with this PR, so you're good.
I think this should address JP-3277 but it probably makes sense to test this PR on one of the failures mentioned on the ticket. However, I don't know how to do that :) I'm happy to learn (if that makes sense) but would need to know how to find the data and run the failing step.
I can give it a try. I'll install this PR version of stdatamodels and test it out. But if you want to try, feel free to install jwst in your environment, and you may be able to follow the repo Wiki for regression testing locally and use the tso regression test data to verify?
Patch coverage: 94.44
% and project coverage change: +0.04
:tada:
Comparison is base (
065ec6e
) 63.90% compared to head (9d30b03
) 63.95%.:exclamation: Current head 9d30b03 differs from pull request most recent head eb516d5. Consider uploading reports for the commit eb516d5 to get more accurate results
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
I ran the test_niriss_soss_stage3_crfints
regtest locally (this isn't a perfect fit for the runs mentioned in the ticket which appear to use NIRSPEC data but this test does run the 'calwebb_tso3' pipeline).
Looking at the data during test with and without the changes in this PR...
Without this PR, jw01091002001_03101_00001-seg001_nis_short_x1dints.fits, contains duplicated data (30 EXTRACT1D and 1 INT_TIMES extensions that are not referenced from the ASDF extension which contains 31 embedded binary blocks).
With this PR the same files contains the same EXTRACT1D and INT_TIMES extensions however these are linked in the ASDF extension (which contains no binary blocks).
@tapastro Have you had a chance to do independent testing of this fix?
I ran a single spec2 association which used to produce a 440MB x1dints file, where the asdf extension was ~223 MB. This code produced a 250MB file with a 32MB asdf extension. Seems like it did a good job! I don't understand how to check if any of the arrays are linked or reproduced in the extension, though.
Thanks for checking! That definitely looks promising.
There's no 'official' way to check the links. What I've been doing is similar to what is in the test added with this PR: https://github.com/spacetelescope/stdatamodels/blob/fbd97cc7ab5a675d36e42db25218f8bffec8f494/tests/test_fits.py#L667-L677
In brief, when stdatamodels links an item in the ASDF tree to a fits extension it changes the data 'source' stored in the ASDF tree to a string like fits:EXTRACT1D,1
. This differs from a non-linked array/table which will have a 'source' that is a number corresponding to the index of the ASDF binary block (in this case embedded in the ASDF extension).
So a normal array will appear something like this in the ASDF yaml.
lookup_table: !core/ndarray-1.0.0
source: 0
datatype: float32
byteorder: big
shape: [2048, 2048]
Whereas a linked array will appear something like:
dq: !core/ndarray-1.0.0
source: fits:DQ,1
datatype: uint32
byteorder: big
shape: [10, 256, 2048]
(note that the above 2 examples were both taken from the same '*_nis_short_calints.fits' file so a file can have both linked and non-linked arrays/tables, in this case the non-linked arrays are related to the wcs).
Let me know if more details would be helpful.
Ah, in that case it's not quite going right - I see 6618 unlinked arrays, and doing a findall on "source:" shows there are 9928 total arrays in the file. I can move this to somewhere public for you to take a look, if you want.
DataModel converts recarrays to FITS_rec on assignment these were not correctly linked to FITS extensions on save resulting in duplication of FITS_rec data on save.
This adds a test for the duplication and some book keeping to allow linking the converted FITS_recs and extensions on save.
Astropy creates a 'view' of the FITS_rec when assigned to a new FITS extension which makes a simple 'is' check complicated (so the strategy used for ndarray and NDArrayType could not be used). Instead, the links between FITS_rec and extensions are tracked (via the FitsContext) which allows creation of the links without having re-find the hdu created from the FITS_rec. (The existing strategy is left in place for ndarray to maintain compatibility with existing ASDF-in-FITS behavior which does not require the same schema structure as the DataModels).
Resolves JP-3277
Regression tests running: https://plwishmaster.stsci.edu:8081/blue/organizations/jenkins/RT%2FJWST-Developers-Pull-Requests/detail/JWST-Developers-Pull-Requests/776/pipeline/202/
Docs failures are unrelated (see: https://github.com/spacetelescope/stdatamodels/pull/180)
Checklist
CHANGES.rst
(either inBug Fixes
orChanges to API
)