spacetelescope / stdatamodels

https://stdatamodels.readthedocs.io

remove `skip_fits_update` to reduce data duplication and risk of data modification loss #271

Open braingram opened 7 months ago

braingram commented 7 months ago

skip_fits_update determines if, when a file is opened, the fits headers are read to reconstruct the ASDF tree (including extra_fits).

This setting can be False, True, or None.

When False, the ASDF tree (read from the ASDF extension) will be updated (on file read) as follows:

1) fits hdus linked to the tree via the schema will have their data read from the hdu (this always occurs)
2) fits keywords linked to the tree via the schema will be read and the tree updated to reflect the data in the keywords
3) fits hdus not linked to the tree will have their data assigned to the extra_fits portion of the ASDF tree
4) fits keywords not linked to the tree will have their values recorded in extra_fits
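The four steps above can be sketched as a toy model. This is only an illustration under stated assumptions, not the stdatamodels implementation: the function `update_tree_from_fits`, the flat dict "tree", and the `schema_hdus` / `schema_keywords` mappings are all hypothetical stand-ins for the real schema machinery.

```python
def update_tree_from_fits(tree, hdus, schema_hdus, schema_keywords,
                          skip_fits_update=False):
    """Toy model of the read-time update (hypothetical, not stdatamodels code).

    tree            : dict read from the ASDF extension
    hdus            : {hdu name: {"header": dict, "data": object or None}}
    schema_hdus     : {hdu name: tree key} for hdus linked via the schema
    schema_keywords : {(hdu name, keyword): tree key} for linked keywords
    """
    tree = dict(tree)

    # Step 1 (always occurs): linked hdus have their data read from the hdu.
    for name, key in schema_hdus.items():
        if name in hdus:
            tree[key] = hdus[name]["data"]

    if skip_fits_update:
        # Steps 2-4 are skipped; the tree keeps whatever the ASDF extension held.
        return tree

    extra_fits = {}
    for name, hdu in hdus.items():
        entry = {}
        for keyword, value in hdu["header"].items():
            key = schema_keywords.get((name, keyword))
            if key is not None:
                # Step 2: linked keywords overwrite the tree value.
                tree[key] = value
            else:
                # Step 4: unlinked keywords are recorded in extra_fits.
                entry.setdefault("header", {})[keyword] = value
        if name not in schema_hdus and hdu["data"] is not None:
            # Step 3: data of unlinked hdus is assigned to extra_fits.
            entry["data"] = hdu["data"]
        if entry:
            extra_fits[name] = entry
    tree["extra_fits"] = extra_fits
    return tree


hdus = {
    "PRIMARY": {"header": {"TELESCOP": "JWST", "MYKEY": 1}, "data": None},
    "SCI": {"header": {}, "data": [1, 2, 3]},
    "EXTRA": {"header": {}, "data": [9, 9]},
}
schema_hdus = {"SCI": "data"}
schema_keywords = {("PRIMARY", "TELESCOP"): "telescope"}
asdf_tree = {"telescope": "stale", "extra_fits": {}}

full = update_tree_from_fits(asdf_tree, hdus, schema_hdus, schema_keywords)
skipped = update_tree_from_fits(asdf_tree, hdus, schema_hdus, schema_keywords,
                                skip_fits_update=True)
```

In this sketch `full` picks up the keyword value and the unlinked hdu, while `skipped` only receives the linked hdu data and otherwise trusts the ASDF extension.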

When skip_fits_update is True, stdatamodels will (attempt to) skip 2, 3, and 4 above, only performing 1 (linking hdu data defined in the schema). However, 2, 3, and 4 will still occur if:

When None, the value will be read from the SKIP_FITS_UPDATE environment variable (defaulting to False if the variable is not set).
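A minimal sketch of that three-state resolution follows. The helper `resolve_skip_fits_update` is hypothetical, and the set of strings treated as true is an assumption; the actual parsing in stdatamodels may differ.

```python
import os

# Assumption: which strings count as "true" is a guess for illustration only.
_TRUE_STRINGS = {"1", "true", "t", "yes", "y"}


def resolve_skip_fits_update(skip_fits_update=None, environ=None):
    """Resolve True/False/None to a bool (hypothetical helper)."""
    if skip_fits_update is not None:
        # An explicit True or False always wins over the environment.
        return bool(skip_fits_update)
    environ = os.environ if environ is None else environ
    raw = environ.get("SKIP_FITS_UPDATE")
    if raw is None:
        # Variable not set: fall back to the default of False.
        return False
    return raw.strip().lower() in _TRUE_STRINGS
```

Passing a plain dict as `environ` keeps the sketch testable without touching the real process environment, for example `resolve_skip_fits_update(None, {"SKIP_FITS_UPDATE": "true"})`.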

Neither skip_fits_update nor the SKIP_FITS_UPDATE environment variable is used in jwst.

Furthermore, the above behavior has some issues.

First, the use of extra_fits requires that any data not linked to the schema be duplicated on write, in both the fits headers/hdus and the ASDF extension. This is required so that on read, if skip_fits_update is True, the values in extra_fits will still be readable. This is not a big issue for keywords, but the inclusion of an extra hdu (with table or image data) will result in saving that data twice.
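To put rough numbers on the duplication, here is a back-of-the-envelope sketch. The function `bytes_written`, the hdu names, and the sizes are made up for illustration, and it assumes that schema-linked data is stored only once while unlinked hdu data is copied into extra_fits in the ASDF extension.

```python
def bytes_written(hdu_sizes, schema_linked):
    """Estimate data bytes written to a file (hypothetical model).

    hdu_sizes     : {hdu name: size of its data in bytes}
    schema_linked : set of hdu names linked to the tree via the schema
    """
    # Every hdu's data is written once as a regular fits hdu.
    fits_bytes = sum(hdu_sizes.values())
    # Assumption: only unlinked hdus are copied again via extra_fits.
    asdf_copy_bytes = sum(
        size for name, size in hdu_sizes.items() if name not in schema_linked
    )
    return fits_bytes, asdf_copy_bytes


sizes = {"SCI": 8_000_000, "EXTRA_TABLE": 2_000_000}
fits_bytes, asdf_copy_bytes = bytes_written(sizes, schema_linked={"SCI"})
total = fits_bytes + asdf_copy_bytes
```

Under these assumptions the 2 MB unlinked table is written twice, so the file carries 12 MB of data instead of 10 MB.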

Secondly, the computed hash uses only the header values (not the hdu data). This means that an hdu not linked to the schema, if modified outside of stdatamodels, will have its changes ignored when skip_fits_update is True (if the hash of the headers still matches, which seems unlikely but possible).
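The blind spot can be illustrated with a header-only hash. This is a sketch under assumptions: `header_hash` is a hypothetical helper using SHA-256 over JSON-serialized headers, which is not necessarily how stdatamodels computes its hash, and the hdu contents are invented.

```python
import hashlib
import json


def header_hash(hdus):
    """Hash only the header values, ignoring hdu data (hypothetical scheme)."""
    headers = {name: hdu["header"] for name, hdu in hdus.items()}
    payload = json.dumps(headers, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


original = {
    "EXTRA": {"header": {"EXTNAME": "EXTRA", "NAXIS1": 3}, "data": [1, 2, 3]},
}
stored_hash = header_hash(original)

# Data edited outside stdatamodels with the same shape: header is unchanged.
same_shape_edit = {
    "EXTRA": {"header": {"EXTNAME": "EXTRA", "NAXIS1": 3}, "data": [7, 8, 9]},
}
# Data edited with a different shape: the header changes along with it.
new_shape_edit = {
    "EXTRA": {"header": {"EXTNAME": "EXTRA", "NAXIS1": 4}, "data": [7, 8, 9, 10]},
}

undetected = header_hash(same_shape_edit) == stored_hash
detected = header_hash(new_shape_edit) != stored_hash
```

Here `undetected` is true because the edit left every header value intact, so a reader that trusts the stored hash would keep serving the stale copy from extra_fits.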