Hector integration - GitHub Issues

hzovaro commented 9 months ago

Get spaxelsleuth working with data from the Hector Galaxy Survey from Gabby's spectral fitting pipeline.

BIG TO DOs

[x] Finalise format for output FITS files
[ ] Calculate emission line ratios, etc. for individual components, but only for bundles in Spector
[x] Add metallicity calculations back in
[x] #65
[ ] Re-run on new files from Gabby

little to dos

[x] Place filename of DataFrame (+ time modified) in FITS header (perhaps move load_hector_df into the export FITS function?)
[x] Incorporate input catalog w/ stellar masses, etc. once it is available
[x] Add # of dithers to FITS header
[x] Write tests
[ ] Write example Notebook
[ ] Double-check the way that spaxels are masked in hector.process_galaxies() - some dodgy D4000 measurements in outskirts...
[ ] Make catalogue of galaxies w/ 2,3-comp. spaxels, likely AGN
[x] Put input args to make_hector_df() in output FITS header
[x] Add sub-dividers to FITS headers to make them easier to read
[x] Filter fits.writeto warning
[x] Check that export FITS is working

Maybes

[ ] Combine flags in single extensions?

Getting stuff working on misfit:

[x] Check for duplicate data cubes: once bash script has finished, check for duplicates in file listings using python
[x] Update data cube paths in the config file & in hector.py, check they work
[x] Try running end-to-end on a small subset of galaxies first to check paths, etc. are OK (triple-check file existence first!)
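
A minimal sketch of the duplicate check on a file listing, assuming "duplicate" means the same file name appearing under more than one directory. The function name `find_duplicate_cubes` and the example paths are hypothetical, not taken from hector.py:

```python
from collections import defaultdict
from pathlib import PurePosixPath


def find_duplicate_cubes(paths):
    """Group paths by file name and return the names that occur more than once."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).name].append(p)
    return {name: locs for name, locs in by_name.items() if len(locs) > 1}


# Hypothetical listing, e.g. the saved output of `find . -name "*.fits"`
listing = [
    "run_a/galaxy_0001_blue.fits",
    "run_a/galaxy_0002_blue.fits",
    "run_b/galaxy_0001_blue.fits",
]

duplicates = find_duplicate_cubes(listing)
for name, locs in sorted(duplicates.items()):
    print(name, "->", locs)
```

Matching on file name only is a design choice; if cubes can share a name but differ in content, comparing file sizes or checksums as well would be safer.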

Things to suggest/raise with Gabby

[x] How will binned data products be stored? Will need to update _process_gals to work with these data
[x] Has the LSF been subtracted from the reported velocity dispersions?
[x] What are the units of the emission line fluxes, etc.?
[x] Consistency in file formats between "rec" and 1/2/3-component fits
[x] NaN values with finite errors for certain data products, e.g. gas velocity dispersion
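
For reference on the LSF question: if the reported dispersions still include the instrumental broadening, and both the line profile and the LSF are approximately Gaussian, the usual correction is a subtraction in quadrature. The symbol names below are labels chosen for illustration, not names from the pipeline:

```latex
\sigma_{\rm gas} = \sqrt{\sigma_{\rm obs}^{2} - \sigma_{\rm LSF}^{2}}
```

Here $\sigma_{\rm obs}$ is the measured dispersion and $\sigma_{\rm LSF}$ is the instrumental dispersion at the line's wavelength. The expression is undefined where $\sigma_{\rm obs} < \sigma_{\rm LSF}$.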

Questions for Hector team members

[ ] Would people prefer having all data products in a single FITS file, or for them to be split into sub-groups (e.g. emission line-derived products, stellar kinematics, flags, etc. in different files)?

hzovaro commented 9 months ago

Stuff to do/keep in mind:

[ ] We will need to deal with changing spectral resolutions depending on which spectrograph was used. Can probably scrape this from the header in the future.
[ ] The cubes change shape, so we can't hard-code the cube sizes
[ ] Think about suggestions for how to organise the files

sarahsweet commented 9 months ago

24 Busy week feedback:

Provide feedback on the FITS format

a. Is it intuitive and easy to use?

- Yes, even though there are many extensions. I think this is helpful; it just means that the documentation needs to be super clear.
- I suggest putting the extensions in the same order as on the column descriptions wiki page. Or perhaps even having a separate wiki page just for Hector, since there are many more column descriptions than extensions in the Hector FITS files.
- Maybe avoid spaces and non-alphanumeric characters in the extension names? Not sure.

b. Are there any missing header keywords that would be useful?

It might also be helpful to have a short column description in each extension's header, as a keyword, e.g. a comment perhaps?

c. Are there any typos/mistakes/etc. in the FITS headers?

I would recommend renaming HALPHA continuum to R-band continuum. I know it is not over the whole band, but I think it makes more sense.
The 'data quality and S/N flags' are booleans on the column descriptions but numeric in the FITS files. Remove 'flag' from the descriptions of 'missing flag'.
HALPHA A/N description incomplete on the column descriptions wiki.
HALPHA EW description 'leven' -> 'level'
rename metallicity diagnostics to start with 'Z' e.g. change 'log(O/H) + 12 (N2Ha_K19/O3O2_K19)' to 'Z_K19'
did you ask Lisa about the issue you raise on the implementation wiki page, where it is ambiguous which of 3726 and/or 3729 are used?
multiple slices exist for Z_K19 but the column descriptions suggest it is only done on the total
BPT (numeric) does not say which numerics correspond to which categories. Also give definition or refer to the implementation page. What does BPT error mean?
give definition for SFR?
the N2Ha-based quantities are not in the column descriptions
say in wiki if v_* is LOS or corrected for inclination
chi2 missing from column description wiki
some column descriptions have S/N and some have SNR; choose one
median continuum S/N (ppxf) not clear in column description wiki
why both std. dev. and error in HALPHA and B-band continuum?
what is median spectral value?

d. Should data be split up into multiple FITS files grouping e.g. emission line measurements, stellar kinematics, etc.? Or all-in-one?

There are really a lot of extensions, but I think it helps to have them all in the same place.
Grouping within the files could be useful, if it can be implemented. It would need to not make the files more difficult to read / access, though, or it would defeat the purpose.
I also wonder if there are some that are less widely used, which could be omitted, but such a choice is probably science-dependent, so difficult to make!

e. Should the data cubes also be stored in the same FITS file?

I think that having the data cubes separate is probably more streamlined; they will be much less frequently used than the extracted maps and derived products. Perhaps one (/pair of?) FITS files for cubes, one for maps + map-derived quantities, and one for ancillary + input catalogue data?

Inspect the data products

a. Do some basic science (e.g. recreating the mass-metallicity relation) - do the values look reasonable?

- I plotted the continuum from some random frames and found spatial offsets between the red and blue. This might be just a feature of CVD but would be worth checking the input data.
- I made some plots e.g. MZR and found sensible values. There were a few with missing or spurious data:
  - extensions 3-7 v_gas etc. always missing data
  - ext 8 missing components flag is always (or nearly always?) 0 in the nocuts sample
  - 9-12 are always missing data
  - the line flux errors are sometimes zero; is this sensible? Should it be NaN instead?
  - 41, 46, 49, 50 S/N on derived lines are often missing, perhaps hinting at a problem with error propagation? E.g. 42, 47, ... which they are derived from usually have defined S/N.
  - 60-68 flags always 0 in nocuts
  - 69-79, 81-82 always 0 in nocuts and cuts
  - 83-96 missing flux flags always 0
  - 136, 137 missing v*, sigma* flags always 0
  - 138, 139 D4000 error > D4000
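
Two of the checks above (zero line flux errors, and D4000 error > D4000) can be sketched with NumPy as follows. The arrays hold made-up values for illustration only, and treating a non-positive error as undefined is an assumption rather than what spaxelsleuth currently does:

```python
import numpy as np

# Made-up values for illustration only
flux = np.array([10.0, 5.0, 2.0, np.nan])
flux_err = np.array([1.0, 0.0, 0.5, 0.2])

# Assumption: a non-positive error is not a real measurement, so store NaN
clean_err = np.where(flux_err > 0, flux_err, np.nan)

# S/N is NaN wherever the flux or its error is undefined
snr = flux / clean_err

# Flag spaxels where the D4000 error exceeds the D4000 value itself
d4000 = np.array([1.5, 1.2, 0.9])
d4000_err = np.array([0.1, 2.0, 0.05])
suspect = d4000_err > d4000

print(snr)
print(suspect)
```

Replacing the zeros before dividing also avoids infinite S/N values, which would otherwise pass any S/N cut.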

b. Are there any additional data products people would like for their science? E.g. extra metallicity measurements, stellar indices, etc.

We already spoke about implementing and outputting electron temperatures in a variety of species, electron densities in [OII] and [SII], and electron temperature metallicities. Thank you! :D
I would like a range of some of the popular SEL metallicity diagnostics implemented as well if possible, please, e.g. Z_R23_KK04, Z_N2O2_KD02, Z_N2S2, Z_N2Ha, Z_N2S2Ha, Z_O3N2 - e.g. see definitions in Poetrodjojo+21.

Other questions

a. Should measurements come pre-masked if they fail S/N and/or DQ criteria? E.g. should line fluxes be NaN’d if the S/N < 3? Or should this be up to the user?

- I strongly suggest this should be up to the user, since they may want to choose a different S/N limit, or use S/N cut in a different line or continuum or binning scheme. They can create their own masks based on flux and noise.
- Having both cuts and nocuts available is probably a good idea, since the S/N_Ha >= 3 cut will be useful for many as you expect!
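
As a rough illustration of a user-side mask built from flux and noise, here is a minimal sketch with NumPy. The function name `mask_low_snr` and the example numbers are hypothetical; the default threshold of 3 is the S/N limit mentioned in the question:

```python
import numpy as np


def mask_low_snr(values, flux, flux_err, snr_min=3.0):
    """Return a copy of `values` with NaN wherever flux / flux_err < snr_min."""
    values = np.asarray(values, dtype=float)
    flux = np.asarray(flux, dtype=float)
    flux_err = np.asarray(flux_err, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        snr = flux / flux_err
        good = snr >= snr_min  # a NaN S/N compares False, so it is masked
    return np.where(good, values, np.nan)


# Made-up H-alpha fluxes and errors for three spaxels
ha_flux = [12.0, 2.0, 6.0]
ha_err = [2.0, 1.0, 3.0]

masked = mask_low_snr(ha_flux, ha_flux, ha_err)                # default S/N >= 3
looser = mask_low_snr(ha_flux, ha_flux, ha_err, snr_min=2.0)   # user-chosen cut
print(masked)
print(looser)
```

Because `values` is passed separately from `flux`, the same S/N cut in one line can be applied to any other map, such as a velocity or metallicity map.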

b. Should data products from all pipelines be stored in a single FITS file?

This is probably a discussion to have as a team. My thought is that it could be convenient but might be unwieldy. I think it depends on final file size and number of extensions, and how clear the instructions are as to what the different extensions mean, or which file contains which extensions and how to join them together. This matters in particular for similar properties derived from multiple pipelines: with separate files someone might find the wrong one, but with everything together someone might find both and not know which to use.

hzovaro commented 5 months ago

Issue moved to Hector-Galaxy-Survey fork.

hzovaro / spaxelsleuth

Hector integration #36