krakrjak / fits-parse

Parse FITS files for Astronomy data analysis.
BSD 2-Clause "Simplified" License

Support for extensions? Good place to start? #1

Closed seanhess closed 1 year ago

seanhess commented 1 year ago

Hi there! I just started at the National Solar Observatory, and I'm hoping to use Haskell to produce our Level 2 data.

I'm going to need to leverage header extensions. Have you thought about them at all yet? Would you mind letting me know any plans / thoughts / warnings I might need if I were to try to create a PR supporting them?

Below is an example header I'll need to parse.

XTENSION= 'IMAGE   '                                                            
BITPIX  =                  -64                                                  
NAXIS   =                    3                                                  
NAXIS1  =                 2538 / [pix]                                          
NAXIS2  =                  998 / [pix]                                          
NAXIS3  =                    1 / [pix]                                          
PCOUNT  =                    0                                                  
GCOUNT  =                    1                                                  
BUNIT   = 'ct      '                                                            
DATE    = '2023-04-22T04:13:06.007'                                             
DATE-BEG= '2022-06-03T18:16:37.532'                                             
DATE-END= '2022-06-03T18:16:37.798'                                             
TELAPSE =   0.2660001162439585 / [s]                                            
DATE-AVG= '2022-06-03T18:16:37.665000'                                          
ORIGIN  = 'National Solar Observatory'                                          
TELESCOP= 'Daniel K. Inouye Solar Telescope'                                    
OBSRVTRY= 'Haleakala High Altitude Observatory Site'                            
NETWORK = 'NSF-DKIST'                                                           
INSTRUME= 'VISP    '                                                            
OBJECT  = 'unknown '                                                            
CHECKSUM= 'Lr7KMo5KLo5KLo5K'   / HDU checksum updated 2023-04-22T04:13:06       
DATASUM = '365813440'          / data unit checksum updated 2023-04-22T04:13:06 

COMMENT ------------------------------ Telescope -------------------------------
COMMENT  Keys describing the pointing and operation of the telescope. Including 
COMMENT     the FITS WCS keys describing the world coordinates of the array.    
COMMENT ------------------------------------------------------------------------
WCSAXES =                    3                                                  
WCSAXESA=                    3                                                  
WCSNAME = 'Helioprojective-cartesian'                                           
WCSNAMEA= 'Equatorial equinox J2000'                                            
CRPIX1  =   -39.61539960864943 / [pix]                                          
CRPIX2  =                499.0 / [pix]                                          
CRPIX3  =    26.91207176560697 / [pix]                                          
CRPIX1A =    -39.6154348701337 / [pix]                                          
CRPIX2A =                499.0 / [pix]                                          
CRPIX3A =    26.90673366905378 / [pix]                                          
CRVAL1  =              854.231                                                  
CRVAL2  =   -378.0013311660425                                                  
CRVAL3  =   -310.0098168091384                                                  
CRVAL1A =              854.231                                                  
CRVAL2A =    22.25706030412464                                                  
CRVAL3A =    71.53496706827751                                                  
CDELT1  = 0.000999811469978602                                                  
CDELT2  =   0.2134568481952311                                                  
CDELT3  =   0.2134568481952311                                                  
CDELT1A = 0.000999811469978602                                                  
CDELT2A = 5.92935689431197E-05                                                  
CDELT3A = 5.92935689431197E-05                                                  
CUNIT1  = 'nm      '                                                            
CUNIT2  = 'arcsec  '                                                            
CUNIT3  = 'arcsec  '                                                            
CUNIT1A = 'nm      '                                                            
CUNIT2A = 'deg     '                                                            
CUNIT3A = 'deg     '                                                            
CTYPE1  = 'AWAV    '                                                            
CTYPE2  = 'HPLT-TAN'                                                            
CTYPE3  = 'HPLN-TAN'                                                            
CTYPE1A = 'AWAV    '                                                            
CTYPE2A = 'DEC--TAN'                                                            
CTYPE3A = 'RA---TAN'                                                            
PC1_1   =    7.608516681232339                                                  
PC1_2   =                  0.0                                                  
PC1_3   = -0.00037045429729136                                                  
PC2_1   =  0.03360323025167414                                                  
PC2_2   =                  0.0                                                  
PC2_3   =  0.09119668667614665                                                  
PC3_1   =                  0.0                                                  
PC3_2   =    1000.188565571669                                                  
PC3_3   =                  0.0                                                  
PC1_1A  =   -7.377528035147238                                                  
PC1_2A  =                  0.0                                                  
PC1_3A  =  -0.0224468417690155                                                  
PC2_1A  =   -1.863887572989868                                                  
PC2_2A  =                  0.0                                                  
PC2_3A  =  0.08833638700658505                                                  
PC3_1A  =                  0.0                                                  
PC3_2A  =    1000.188565571669                                                  
PC3_3A  =                  0.0                                                  
LONPOLE =                180.0 / [deg]                                          
LONPOLEA=                180.0 / [deg]                                          
TAZIMUTH=    75.45170184868539 / [deg] RawTelescopeAzimuthAngle                 
ELEV_ANG=    32.48282676499853 / [deg] RawTelescopeElevationAngle               
TELTRACK= 'Standard Differential Rotation Tracking' / TelescopeTrackingMode     
TTBLANGL=    198.9964785870236 / [deg] TelescopeCoudeTableAngle                 
TTBLTRCK= 'Fixed coude table angle' / TelescopeCoudeTableTrackingMode           
DATEREF = '2022-06-03T18:16:37.532'                                             
OBSGEO-X=   -5466045.256954942 / [m]                                            
OBSGEO-Y=   -2404388.737412784 / [m]                                            
OBSGEO-Z=    2242133.887690042 / [m]                                            
SPECSYS = 'TOPOCENT'                                                            
VELOSYS =                  0.0                                                  
OBS_VR  =   -110.4626588771872 / [m s-1]                                        
WCSVALID=                    T / WCSValidityIndicator                           

COMMENT ------------------------------ Datacenter ------------------------------
COMMENT      Keys generated by the DKIST data center to describe processing     
COMMENT                 performed, archiving or extra metadata.                 
COMMENT ------------------------------------------------------------------------
DSETID  = 'BQKZZ   '                                                            
POINT_ID= 'BQKZZ   '                                                            
FRAMEVOL=    2.314876556396484 / [Mbyte]                                        
PROCTYPE= 'L1      '                                                            
RRUNID  =                  578                                                  
RECIPEID=                    1                                                  
RINSTID =                  350                                                  
EXTNAME = 'observation'                                                         
SOLARNET=                    1                                                  
OBS_HDU =                    1                                                  
FILENAME= 'VISP_2022_06_03T18_16_37_532_00854231_I_BQKZZ_L1.fits'               
CADENCE =   0.3282829148595638 / [s]                                            
CADMIN  =  0.02399992942810059 / [s]                                            
CADMAX  =    3.071000099182129 / [s]                                            
CADVAR  =    0.834011691783927 / [s]                                            
LEVEL   =                    1                                                  
HEADVERS= '3.5.0   '                                                            
HEAD_URL= 'https://docs.dkist.nso.edu/projects/data-products/en/v3.5.0'         
INFO_URL= 'https://docs.dkist.nso.edu/'                                          
CALVERS = '2.0.1   '                                                            
CAL_URL = 'https://docs.dkist.nso.edu/projects/visp/en/v2.0.1/l0_to_l1_visp.ht&'
CONTINUE  'ml'                                                                  
IDSPARID=                  409                                                  
IDSOBSID=                  444                                                  
IDSCALID=                  434                                                  
WKFLNAME= 'l0_to_l1_visp'                                                       
WKFLVERS= '2.0.1   '                                                            

COMMENT ------------------------------- Dataset --------------------------------
COMMENT     Keys describing the dataset that this FITS file forms a part of.    
COMMENT ------------------------------------------------------------------------
DNAXIS  =                    4                                                  
DNAXIS1 =                 2538 / [pix]                                          
DNAXIS2 =                  998 / [pix]                                          
DNAXIS3 =                  490 / [pix]                                          
DNAXIS4 =                    4 / [pix]                                          
DTYPE1  = 'SPECTRAL'                                                            
DTYPE2  = 'SPATIAL '                                                            
DTYPE3  = 'SPATIAL '                                                            
DTYPE4  = 'STOKES  '                                                            
DPNAME1 = 'dispersion axis'                                                     
DPNAME2 = 'spatial along slit'                                                  
DPNAME3 = 'raster scan step number'                                             
DPNAME4 = 'polarization state'                                                  
DWNAME1 = 'wavelength'                                                          
DWNAME2 = 'helioprojective latitude'                                            
DWNAME3 = 'helioprojective longitude'                                           
DWNAME4 = 'polarization state'                                                  
DUNIT1  = 'nm      '                                                            
DUNIT2  = 'arcsec  '                                                            
DUNIT3  = 'arcsec  '                                                            
DUNIT4  = ''                                                                    
DAAXES  =                    2                                                  
DEAXES  =                    2                                                  
DINDEX3 =                  164 / [pix]                                          
DINDEX4 =                    1 / [pix]                                          
LINEWAV =              854.231                                                  
WAVEBAND= 'Ca II (854.21 nm)'                                                   
WAVEUNIT=                   -9                                                  
WAVEREF = 'Air     '                                                            
WAVEMIN =    854.2706079309165 / [nm]                                           
WAVEMAX =    856.8081294417221 / [nm]                                           

COMMENT ------------------------------ Statistics ------------------------------
COMMENT   Statistical information about the data array contained in this FITS   
COMMENT                                  file.                                  
COMMENT ------------------------------------------------------------------------
DATAMIN =  0.05059385305596335                                                  
DATAMAX =    1.071910415472651                                                  
DATAMEAN=   0.7939938201637414                                                  
DATAMEDN=   0.8445701863059327                                                  
DATARMS =   0.8103401836234486                                                  
DATAKURT=    2.335142626183942                                                  
DATASKEW=   -1.492268209382627                                                  

COMMENT ------------------------------- DKIST ID -------------------------------
COMMENT  Unique identifiers for this FITS file and the observation that created 
COMMENT                                the data.                                
COMMENT ------------------------------------------------------------------------
FILE_ID = '6c2c416f13114ea3bc6c430994c1ffca' / FileID                           
DKISTVER= 'Data Model (SPEC-0122) Revision E' / DKISTFITSHeaderVersion          
OBSPR_ID= 'eid_1_118_opAvoqBr_R002.82591.14499687' / ObservingProgramExecutionID
EXPER_ID= 'eid_1_118'          / ExperimentID                                   
PROP_ID = 'pid_1_118'          / ProposalID                                     
DSP_ID  = 'eid_1_118_opAvoqBr_R002_ipM6wwxZ_dspCtVjmC' / DataSetParametersID    
IP_ID   = 'id.85572.341432'    / InstrumentProgramExecutionID                   
HLSVERS = 'Alakai_5-1'         / DKISTSoftwareVersion                           
NPROPOS =                    1                                                  
PROPID01= 'pid_1_118'                                                           
NEXPERS =                    1                                                  
EXPRID01= 'eid_1_118'                                                           

COMMENT --------------------------- DKIST Operations ---------------------------
COMMENT    Information about this configuration or operations of the facility   
COMMENT                        when generating this data.                       
COMMENT ------------------------------------------------------------------------
OCS_CTRL= 'Auto    '           / OCSControl                                     
FIDO_CFG= 'OUT_C-M1_C-BS555_C-BS950_C-W1_C-W3' / FIDOConfiguration              
DSHEALTH= 'GOOD    '           / DataSourceHealthStatus                         
DSPSREPS=                    1 / DSPSNumberOfRepeats                            
DSPSNUM =                    1 / DSPSRepeatNumber                               
LIGHTLVL=    427.4569600386333 / [adu] LightLevel                               

COMMENT -------------------------------- Camera --------------------------------
COMMENT        Keys describing modes and operation of the camera(s) used.       
COMMENT ------------------------------------------------------------------------
CAM_ID  = '15:VSC-04533'       / CameraUniqueID                                 
CAMERA  = 'AndorZyla.03'       / CameraName                                     
BITDEPTH=                   16 / SensBitsPerPixel                               
XPOSURE =    48.00811267605634 / [ms] FPAExposureTime                           
TEXPOSUR=    4.000676056338028 / [ms] CamExposureTime                           
CAM_FPS =    41.35716748837339 / [Hz] CamFrameRate                              
CHIPDIM1=                 2560 / [pix] ChipDimensionX                           
CHIPDIM2=                 2160 / [pix] ChipDimensionY                           
HWBIN1  =                    1 / [pix] HardwareBinningX                         
HWBIN2  =                    1 / [pix] HardwareBinningY                         
SWBIN1  =                    1 / [pix] SoftwareBinningX                         
SWBIN2  =                    1 / [pix] SoftwareBinningY                         
NSUMEXP =                   12 / NumRawFramesinFPA                              
SWNROI  =                    1 / NumOfSWROI                                     
SWROI1OX=                    0 / [pix] SWROI1OriginX                            
SWROI1OY=                    0 / [pix] SWROI1OriginY                            
SWROI1SX=                 2560 / [pix] SWROI1SizeX                              
SWROI1SY=                 2000 / [pix] SWROI1SizeY                              
HWNROI  =                    2 / NumOfHWROI                                     
HWROI1OX=                    0 / [pix] HWROI1OriginX                            
HWROI1OY=                    0 / [pix] HWROI1OriginY                            
HWROI1SX=                 2560 / [pix] HWROI1SizeX                              
HWROI1SY=                 1000 / [pix] HWROI1SizeY                              
HWROI2OX=                    0 / [pix] HWROI2OriginX                            
HWROI2OY=                 1160 / [pix] HWROI2OriginY                            
HWROI2SX=                 2560 / [pix] HWROI2SizeX                              
HWROI2SY=                 1000 / [pix] HWROI2SizeY                              
NBIN1   =                    1                                                  
NBIN2   =                    1                                                  
NBIN3   =                    1                                                  
NBIN    =                    1                                                  
FPABITPX=                   20 / FPABitsPerPixel                                

COMMENT ---------------- Polarization Analysis and Calibration -----------------
COMMENT  Keys describing the configuration of the Gregorian Optical System (GOS)
COMMENT                                                                         
COMMENT ------------------------------------------------------------------------
GOS_STAT= 'open    '           / Upper GOS shutter                              
LVL3STAT= 'clear   '           / Level 3 (Lamp)                                 
LAMPSTAT= 'none    '           / Lamp status                                    
LVL2STAT= 'clear   '           / Level 2 (Polarizer)                            
POLANGLE= 'none    '           / [deg] Polarizer Angle                          
LVL1STAT= 'clear   '           / Level 1 (Retarder)                             
RETANGLE= 'none    '           / [deg] Retarder angle                           
LVL0STAT= 'FieldStop (2.8arcmin)' / Level 0 (Apeture)                           
APERTURE= '2.8arcmin'          / [arcmin, arcsec, mm] Aperture Property         
LGOSSTAT= 'open    '           / Lower GOS shutter                              
GOS_TEMP=    17.64770507812501 / [C] Upper GOS optics temperature               

COMMENT --------------------------- Adaptive Optics ----------------------------
COMMENT          Keys describing aspects of the adaptive optics system.         
COMMENT ------------------------------------------------------------------------
ATMOS_R0=   0.1492700424803054 / [m] HOAOFriedParameter                         
AO_LOCK =                    T / HOAOLockStatus                                 
AO_LOCKX=                  0.0 / [arcsec] HOAOLockOffPointingX                  
AO_LOCKY=                  0.0 / [arcsec] HOAOLockOffPointingY                  
WFSLOCKX=                  0.0 / [arcsec] LOWFSLockOffPointingX                 
WFSLOCKY=                  0.0 / [arcsec] LOWFSLockOffPointingY                 
LIMBRPOS=                  0.0 / [arcsec] LimbSensorRadialSetPos                
LIMBRATE=               1000.0 / [Hz] LimbSensorRate                            

COMMENT --------------------------- Weather Station ----------------------------
COMMENT    Keys describing information reported by the weather station at the   
COMMENT                    facility during this observation.                    
COMMENT ------------------------------------------------------------------------
WSSOURCE= 'dkist   '           / WeathSource                                    
WIND_SPD=    4.958944660448225 / [m s-1] WeathWindSpeed                         
WIND_DIR=    219.0563882913457 / [deg] WeathWindDirection                       
WS_TEMP =    11.95233890933754 / [C] WeathOutsideTemperature                    
WS_HUMID=    14.39925363345676 / [10**-2] WeathRelativeHumidity                 
WS_DEWPT=   -16.21179492029672 / [C] WeathDewPoint                              
WS_PRESS=    710.0214144384906 / [hPa] WeathBarometricPressure                  
SKYBRIGT=                 -1.0 / WeathSkyBrightness                             

COMMENT --------------------------- VISP Instrument ----------------------------
COMMENT          Keys specific to the operation of the VISP instrument.         
COMMENT ------------------------------------------------------------------------
VSPARMID=                    3 / ArmID                                          
VSPARMPS=             -20.0786 / [deg] ArmPosition                              
VSPARMFC=               54.127 / [mm] ArmFocus                                  
VSPFILT = 'FF01-857_30-25'     / FilterID                                       
VSPFWVLN=              856.229 / [nm] FilterWavelength                          
VSPPOLMD= 'observe_polarimetric' / PolarimeterMode                              
VSPMODID= '0004    '           / ModulatorID                                    
VSPMOD  = 'continuous'         / ModulationType                                 
VSPEXPRT=        41.3571674884 / [Hz] ExposureRate                              
VSPGRTID= 'Newport_316.0_63.40__M20160612_SN3' / GratingID                      
VSPGRTCN=                316.0 / [mm-1] GratingConstant                         
VSPGRTBA=                 63.4 / [deg] GratingBlazeAngle                        
VSPGRTAN=   -65.36539999999999 / [deg] GratingAngle                             
VSPWID  =               0.2142 / [arcsec] SlitWidth                             
VSPSLTSS=       0.132675941182 / [mm] SlitSteppingSize                          
VSPNSTP =                  490 / NumberofSpatialSteps                           
VSPSTP  =                  163 / CurrentSpatialStep                             
VSPTPOS =              39.5625 / [mm] SlitTranslationPosition                   
VSPMIRPS=              17.6642 / [mm] FoldMirrorPosition                        
VSPSPOS =                12.15 / [mm] SlitSelectorPosition                      
VSPNMAPS=                    1 / TotalMapScans                                  
VSPMAP  =                    1 / CurrentMapScan

Thanks for your work on this package!

seanhess commented 1 year ago

Hey @krakrjak are you around? Would you be willing to accept a PR for fits-parse with support for header extensions?

krakrjak commented 1 year ago

Hi @seanhess I am here and I'm very enthusiastic about you reaching out. I have thought a lot about these header extensions, but I stopped with this package basically at MVP where it supports the standard, but not any extensions (yet). I certainly welcome ideas, PRs, designs, and any concepts you might have for supporting these things. I have no objection to supporting a much larger variety of FITS formats and extensions.

I'm all ears! Also, I haven't updated this guy in years! I'm happy to see someone is poking at it. I guess I should dust it off and make a fresh release.

So tell me @seanhess got any opening ideas of where to start?

seanhess commented 1 year ago

Glad to hear from you! I’ll focus on extension header parsing first, since our FITS files don’t load. I should have something early next week.

My fork has a small sample of one of our files if you want to poke around or beat me to anything.

Our image frames have 2 HDUs. The first doesn’t have much in it besides basic metadata, and no data. The second has a bunch of headers and the pixel data.
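To make the multi-HDU layout concrete: a reader needs the byte size of each data unit to find where the next header starts. Below is a minimal sketch of that arithmetic, using the FITS standard's size formula for conforming extensions and its 2880-byte block size. The function names are hypothetical and are not part of fits-parse.

```haskell
-- Hypothetical helpers for illustration; not the fits-parse API.

-- | FITS headers and data units are padded to multiples of 2880 bytes.
blockSize :: Int
blockSize = 2880

-- | Unpadded size in bytes of a data unit.
--   bitpix: the BITPIX value (negative for floating point)
--   axes:   NAXIS1 .. NAXISn (empty when NAXIS = 0, i.e. no data)
--   pcount, gcount: PCOUNT and GCOUNT (0 and 1 for a plain image)
dataBytes :: Int -> [Int] -> Int -> Int -> Int
dataBytes bitpix axes pcount gcount
  | null axes = 0
  | otherwise = (abs bitpix `div` 8) * gcount * (pcount + product axes)

-- | Round a byte count up to the next multiple of the block size.
paddedBytes :: Int -> Int
paddedBytes n = ((n + blockSize - 1) `div` blockSize) * blockSize
```

For the extension header above (BITPIX = -64, NAXIS1 = 2538, NAXIS2 = 998, NAXIS3 = 1, PCOUNT = 0, GCOUNT = 1), `dataBytes (-64) [2538, 998, 1] 0 1` gives 20263392 bytes, and `paddedBytes` rounds that up to 20263680. A first HDU with NAXIS = 0 has no data, so the second header would follow the first one directly.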

https://github.com/seanhess/fits-parse/tree/master/fits_files

Looking forward to collaborating! I’ll let you know as soon as I have thought about design changes, and I’d love to hear your thoughts on supporting custom headers if you want.

seanhess commented 1 year ago

Ok @krakrjak first refactoring question. What if we try to get parsed types into a more Haskell-friendly format? For example, we currently have StringValue = { type = Null | Empty | Data, value = Maybe Text }

Would you be opposed to making it simply Maybe Text? It can represent all of those states: (Nothing, Just "", Just data).

I haven't gone through NumberType carefully, but it might be better represented with a single ADT:

data NumberValue
  = NumInt Int
  | NumReal Double
  | NumImaginary SomeImaginaryNumberType

Any thoughts?

seanhess commented 1 year ago

The basic theory behind the above is to "Make invalid states unrepresentable". StringValue as defined can be in an inconsistent state: { type = Data, value = Nothing }, or { type = Null, value = Just "asdf" }. This means that any downstream code has to handle possible failure cases at every use site.
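A small sketch of the difference, assuming the record shape described above. The field names `stringType` and `stringValue` are made up here (`type` is a reserved word in Haskell), and treating `Empty` with `Just ""` as valid is also an assumption. The point is that a conversion out of the record has to return `Either`, because some combinations make no sense, while `Maybe Text` has no such cases.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- Hypothetical reconstruction of the record form; names are illustrative.
data StringType = Null | Empty | Data
  deriving (Show, Eq)

data StringValue = StringValue
  { stringType  :: StringType
  , stringValue :: Maybe Text
  } deriving (Show, Eq)

-- | Convert to the proposed representation:
--   Nothing = null, Just "" = empty, Just t = data.
--   Inconsistent tag/value pairs have to be rejected explicitly.
toMaybeText :: StringValue -> Either String (Maybe Text)
toMaybeText sv = case sv of
  StringValue Null Nothing                  -> Right Nothing
  StringValue Empty v | maybe True T.null v -> Right (Just T.empty)
  StringValue Data (Just t)                 -> Right (Just t)
  _ -> Left ("inconsistent StringValue: " ++ show sv)
```

With `Maybe Text` as the stored type, the `Left` branch disappears and callers only pattern match on `Nothing` and `Just`.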

seanhess commented 1 year ago

@krakrjak I've made some major changes in my fork to support extensions. Will you take a look and let me know what you think about the direction?

  1. Added a fairly complete test suite
  2. Refactored Header to use a Map of Keys => Values, allowing us to parse any header without knowing them in advance
  3. Parsing specific headers into higher-level objects, like BitPix and NAxes
  4. Changes to basic types as described above
  5. Parser consumes ByteStrings directly, without needing to convert to Text first or check for ASCII (should help performance a lot!)
  6. Parser for a full HDU, avoiding the need to manually split the ByteString into header / data
  7. Support multiple HDUs
  8. Support inline Comments
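As a rough illustration of items 2, 5 and 8 (not the code in the fork), here is a sketch that reads 80-byte header cards straight from a ByteString, separates inline comments, and collects key/value cards into a Map. All type and function names are hypothetical. Values are kept as raw text, quotes included, and CONTINUE long strings are not joined.

```haskell
import qualified Data.ByteString.Char8 as BS
import qualified Data.Map.Strict as Map

-- Hypothetical types for illustration; fits-parse's real ones may differ.
data Card
  = KeyValue BS.ByteString BS.ByteString (Maybe BS.ByteString) -- keyword, raw value, inline comment
  | Commentary BS.ByteString BS.ByteString -- COMMENT, HISTORY, CONTINUE or blank keyword, with text
  | End
  deriving (Show, Eq)

-- | Strip leading and trailing spaces.
trim :: BS.ByteString -> BS.ByteString
trim = fst . BS.spanEnd (== ' ') . BS.dropWhile (== ' ')

-- | Length of a quoted string starting at index 0, closing quote included.
--   A doubled quote ('') inside the string is an escaped quote.
quotedLen :: BS.ByteString -> Int
quotedLen s = go 1
  where
    n = BS.length s
    go i
      | i >= n = n -- unterminated string: take the rest
      | BS.index s i /= '\'' = go (i + 1)
      | i + 1 < n && BS.index s (i + 1) == '\'' = go (i + 2)
      | otherwise = i + 1

-- | Split the text after "= " into a raw value and an optional comment.
--   Quoted values may contain '/', so they are measured before splitting.
splitValue :: BS.ByteString -> (BS.ByteString, Maybe BS.ByteString)
splitValue raw = (trim val, comment)
  where
    body = BS.dropWhile (== ' ') raw
    (val, rest) = case BS.uncons body of
      Just ('\'', _) -> BS.splitAt (quotedLen body) body
      _              -> BS.break (== '/') body
    comment = case BS.uncons (BS.dropWhile (== ' ') rest) of
      Just ('/', c) -> Just (trim c)
      _             -> Nothing

-- | Classify one 80-byte card: keyword in bytes 1-8, "= " in bytes 9-10.
parseCard :: BS.ByteString -> Card
parseCard line
  | key == BS.pack "END" = End
  | BS.take 2 rest == BS.pack "= " =
      let (v, c) = splitValue (BS.drop 2 rest) in KeyValue key v c
  | otherwise = Commentary key (trim rest)
  where
    (k, rest) = BS.splitAt 8 line
    key = trim k

-- | Read cards until END or the end of input.
parseCards :: BS.ByteString -> [Card]
parseCards bs
  | BS.null bs = []
  | otherwise = case parseCard (BS.take 80 bs) of
      End -> []
      c   -> c : parseCards (BS.drop 80 bs)

-- | Keep only key/value cards; a later duplicate keyword wins.
headerMap :: [Card] -> Map.Map BS.ByteString BS.ByteString
headerMap cards = Map.fromList [(k, v) | KeyValue k v _ <- cards]
```

One trade-off of the Map: it drops card order and commentary cards, so a parser that needs to round-trip a header would probably keep the `[Card]` list as well.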

What would you like to change?

EDIT: I'm not 100% sure the header parsing will be ergonomic. The NSO file still isn't parsing because we need support for COMMENT keywords first. Afterwards, I'll try to write some code against it and see how it feels. It would be nice to have a single parsing pass into custom Haskell types, but we'll see.
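One possible shape for the "custom Haskell types" step, sketched as a second pass over an already-parsed keyword map rather than the single pass mentioned above. `Header`, `ImageDims`, `required` and `imageDims` are hypothetical names, and String keys and values are used only to keep the example short.

```haskell
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

-- Hypothetical stand-in for a parsed header: keyword -> raw value text.
type Header = Map.Map String String

-- | The few keywords needed to size an image data unit.
data ImageDims = ImageDims
  { bitpix :: Int
  , axes   :: [Int]
  } deriving (Show, Eq)

-- | Look up a keyword and read its value, reporting which step failed.
required :: Read a => String -> Header -> Either String a
required key h = do
  raw <- maybe (Left ("missing keyword " ++ key)) Right (Map.lookup key h)
  maybe (Left ("cannot parse " ++ key ++ ": " ++ raw)) Right (readMaybe raw)

-- | Decode BITPIX, NAXIS and NAXIS1..NAXISn into a record.
imageDims :: Header -> Either String ImageDims
imageDims h = do
  bp <- required "BITPIX" h
  n  <- required "NAXIS" h
  as <- mapM (\i -> required ("NAXIS" ++ show (i :: Int)) h) [1 .. n]
  pure (ImageDims bp as)
```

Custom keywords such as the DKIST ones could be decoded the same way into their own records, which keeps the core parser unaware of any particular observatory's header set.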

krakrjak commented 1 year ago

My sincere apologies that I have not given any feedback yet on your changes. I am traveling for work this week and I have not found the time to sit down with this code.

At first blush, I think you are headed in the right direction. Supporting multiple HDUs is the main trick. Beyond that we might have to be more specific in the parser rather than more general, but I don't really have an intuition here yet. I may not get a solid chance to look at the code until the weekend.

seanhess commented 1 year ago

Hey @krakrjak, good timing! I just got my changes to a point where I felt comfortable creating a PR. I'll wait until you've had a chance to review before doing anything else.

I'm looking forward to hearing your comments and ideas.

krakrjak commented 1 year ago

Looks really good. Thank you so much for your contribution and for waiting on me. I know it's not easy; I've had a strange two weeks with travel and COVID. I'm about to finish the review. Likely no more changes are needed, but I'm double checking a few things to make sure.

seanhess commented 1 year ago

Hooray! Need help making a new release on Hackage?

On Sun, Aug 27, 2023 at 6:48 PM Zac Slade @.***> wrote:

Closed #1 https://github.com/krakrjak/fits-parse/issues/1 as completed via #2 https://github.com/krakrjak/fits-parse/pull/2.
