zarbanoo / dorado

Digitized Observatory Resources for Automated Data Operations (DORADO) is a Python package that aims to extend Astropy (and affiliated packages) and to provide a simple, common framework for data storage, reduction, and analysis, tailored for life at the Allan I. Carswell Observatory at York University, Toronto, Ontario, Canada.

Output file size inflation due to AUX HDUs and Bit-bloat #6

Open Mucephie opened 1 year ago

Mucephie commented 1 year ago

The auxiliary HDUs (MASK, UNCERT) are unneeded and unwanted at the current time. They are also affected by datatype bit-precision inflation ("bit bloat"), a broader issue that has been present since the DRACO days.

Example

Filename: ./dortest/data/bias/59594_Bias.fits
No.    Name      Ver    Type      Cards   Dimensions   Format
  0  PRIMARY       1 PrimaryHDU      34   (3072, 2047)   int16 (rescales to uint16)   
  1  MASK          1 ImageHDU         8   (3072, 2047)   uint8   
  2  UNCERT        1 ImageHDU         9   (3072, 2047)   float64   
None
['SIMPLE', 'BITPIX', 'NAXIS', 'NAXIS1', 'NAXIS2', 'EXTEND', 'COMMENT', 'COMMENT', 'FOCALLEN', 'SBUUID', 'EXPTIME', 'SWCREATE', 'COLORCCD', 'DISPINCR', 'PICTTYPE', 'IMAGETYP', 'XORGSUBF', 'YORGSUBF', 'XBINNING', 'YBINNING', 'EXPSTATE', 'CCD-TEMP', 'INSTRUME', 'XPIXSZ', 'YPIXSZ', 'PEDESTAL', 'FILTER', 'DATE-OBS', 'LOCALTIM', 'STACKED', 'NUMSUBS', 'BUNIT', 'BSCALE', 'BZERO']
['XTENSION', 'BITPIX', 'NAXIS', 'NAXIS1', 'NAXIS2', 'PCOUNT', 'GCOUNT', 'EXTNAME']
['XTENSION', 'BITPIX', 'NAXIS', 'NAXIS1', 'NAXIS2', 'PCOUNT', 'GCOUNT', 'UTYPE', 'EXTNAME']
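
For a rough sense of scale, the raw data size of each HDU can be estimated from the dimensions and formats listed above. This is a back-of-the-envelope sketch only: it ignores header cards and the 2880-byte block padding that FITS adds, so real file sizes will be slightly larger.

```python
# Raw data size per HDU for the 3072 x 2047 frame listed above.
# Header cards and FITS 2880-byte block padding are ignored.
WIDTH, HEIGHT = 3072, 2047

# Bytes per pixel for each format: int16 = 2, uint8 = 1, float64 = 8.
BYTES_PER_PIXEL = {"PRIMARY": 2, "MASK": 1, "UNCERT": 8}

pixels = WIDTH * HEIGHT
sizes = {name: pixels * nbytes for name, nbytes in BYTES_PER_PIXEL.items()}
total = sum(sizes.values())

for name, size in sizes.items():
    print(f"{name:8s} {size / 1e6:6.1f} MB")

ratio = total / sizes["PRIMARY"]
print(f"total    {total / 1e6:6.1f} MB ({ratio:.1f}x the primary image alone)")
```

Under these assumptions the float64 UNCERT extension alone is four times the size of the primary image, and the whole file is about 5.5 times larger than the image data it was meant to hold.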

The bit-bloat issue most likely arises from the interaction between astropy and numpy: the astropy data takes on numpy's default array float precision, np.float64.
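
A minimal numpy-only sketch of how this promotion can happen, assuming the frames are combined with ordinary numpy reductions (the synthetic frames and variable names here are illustrative and not taken from DORADO):

```python
import numpy as np

# Three small synthetic 16-bit "bias frames", for illustration only.
rng = np.random.default_rng(0)
frames = [rng.integers(900, 1100, size=(4, 6), dtype=np.uint16) for _ in range(3)]

stack = np.stack(frames)
print("stacked input:", stack.dtype)

# Reductions on integer input are computed and returned in float64 by default.
master = np.mean(stack, axis=0)
spread = np.std(stack, axis=0)
print("mean:", master.dtype, " std:", spread.dtype)

# Requesting a narrower dtype up front keeps the result at 4 bytes per pixel.
master32 = np.mean(stack, axis=0, dtype=np.float32)
print("mean with dtype=np.float32:", master32.dtype)
```

If this is indeed the mechanism, any frame that passes through a mean, median, or division step would leave at 8 bytes per pixel unless it is cast back explicitly.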

The aux HDUs definitely require more investigation to nail down.

Mucephie commented 1 year ago

One solution I propose is a function that "de-bloats" an image before saving. This also requires modifying the BITPIX keyword in the header.
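
A minimal sketch of what such a function might look like, using numpy only. The names `debloat` and `bitpix_for` are hypothetical, and the casting policy (uint16 when lossless, otherwise float32) is an assumption rather than an agreed design.

```python
import numpy as np

# BITPIX values from the FITS standard, keyed by numpy kind + item size.
# Unsigned 16-bit data is stored as int16 plus BZERO = 32768, hence "u2" -> 16.
FITS_BITPIX = {"u1": 8, "i2": 16, "u2": 16, "i4": 32, "i8": 64, "f4": -32, "f8": -64}


def bitpix_for(dtype):
    """Return the BITPIX value matching a numpy dtype (hypothetical helper)."""
    dtype = np.dtype(dtype)
    return FITS_BITPIX[f"{dtype.kind}{dtype.itemsize}"]


def debloat(data):
    """Shrink a float64 image to the smallest dtype in this sketch's policy.

    Whole-number data within the uint16 range is cast to uint16 (lossless).
    Any other float64 data is cast to float32, which is lossy (about 7 digits).
    Arrays that are not float64 are returned unchanged.
    """
    data = np.asarray(data)
    if data.dtype != np.float64:
        return data
    whole = np.all(np.isfinite(data)) and np.all(data == np.rint(data))
    if whole and data.size and data.min() >= 0 and data.max() <= np.iinfo(np.uint16).max:
        return data.astype(np.uint16)
    return data.astype(np.float32)


counts = debloat(np.array([[1000.0, 1001.0], [65535.0, 0.0]]))
averaged = debloat(np.array([[1000.5, 1001.25]]))
print(counts.dtype, bitpix_for(counts.dtype))
print(averaged.dtype, bitpix_for(averaged.dtype))
```

As far as I know, astropy.io.fits derives BITPIX from the array dtype when an HDU is written, so the manual header update may turn out to be unnecessary once the data itself is cast. That is worth verifying against the astropy version in use before relying on it.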