nasa-fornax / fornax-demo-notebooks

Demo notebooks for the Fornax project
https://nasa-fornax.github.io/fornax-demo-notebooks/
BSD 3-Clause "New" or "Revised" License
8 stars 19 forks source link

Spectroscopy Notebook Updates #281

Closed afaisst closed 3 months ago

afaisst commented 3 months ago

The following changes were made:

bsipocz commented 3 months ago

As a general question: do we want to delete the fits files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the fits files. @troyraen @bsipocz what do you recommend?

I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case. astropy/astroquery has this idea of a cache space, but with all honesty, it's not super reliable, and I would not count on it, especially as none of the VO backended modules use it atm.

afaisst commented 3 months ago

As a general question: do we want to delete the fits files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the fits files. @troyraen @bsipocz what do you recommend?

I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case. astropy/astroquery has this idea of a cache space, but with all honesty, it's not super reliable, and I would not count on it, especially as none of the VO backended modules use it atm.

I agree with that. Somehow limit the disk space and give the user a warning when space is tight? We can also write a clean-up function that clears all the temporary files at some point.

afaisst commented 3 months ago

ok, I made the following updates according to the comments above:

@jkrick : Ready to review again... then I will merge.

jkrick commented 3 months ago

As a general question: do we want to delete the fits files from the various archives after we have downloaded them and read them into our df_spec? I think the answer is yes, but can also see the reasoning in keeping them so you don't have to re-download every time you make a slight change to your sample. Especially on Fornax, we are going to run into space issues if we don't delete the fits files. @troyraen @bsipocz what do you recommend?

I would think the normal workflow is to hoard data one is actively working on, so I would not delete (but I'm a dinosaur of an astronomer). So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case. astropy/astroquery has this idea of a cache space, but with all honesty, it's not super reliable, and I would not count on it, especially as none of the VO backended modules use it atm.

I agree with that. Somehow limit the disk space and give the user a warning when space is tight? We can also write a clean-up function that clears all the temporary files at some point.

I am working on Herschel module and am at 10G and not even done downloading tar files for a single target Arp220 (herschel likes to give you lots of files....too many files.... but I can't control that. I think we should delete tar files.

troyraen commented 3 months ago

Since this is a Fornax notebook I think we have to make it usable on the Fonax Console, which means respecting the 10G user disk space.

We can also write a clean-up function that clears all the temporary files at some point.

If the full notebook needs less than 5G(?) disk space, then my vote is to write this function and make it an optional thing, so it's available for the to user run or not as they wish. (Choosing 5G to leave space for other things.)

I am working on Herschel module and am at 10G and not even done downloading tar files for a single target Arp220 (herschel likes to give you lots of files....too many files.... but I can't control that. I think we should delete tar files.

Do you know how much disk space would be needed for all the Herschel stuff you want to download?

So I would instead propagate this issue upstream to say we/the users/ will need access to some temporary space for all of these. Temporary in the sense of a scratch space, so nobackups, or maybe even no survival of a large restart, but to be around for a few weeks while someone is actively working on a use case.

I agree that's a good thing to push for. I don't think we should count on that to come in time for this notebook to rely on it.

jkrick commented 3 months ago
  1. opened an issue (#285) for what to do with the downloaded data to let that conversation continue but be able to close this PR when the rest is ready.
  2. I am getting a UnitConversionError on the JWST module:

    
    UnitConversionError                       Traceback (most recent call last)
    File <timed exec>:2

File /stage/irsa-staff-jkrick/fornax-demo-notebooks/spectroscopy/code_src/mast_functions.py:55, in JWST_get_spec(sample_table, search_radius_arcsec, datadir, verbose) 53 ## Group 54 print("Grouping Spectra... ") ---> 55 df_jwst_group = JWST_group_spectra(df_jwst_all, verbose=verbose, quickplot=False) 56 print("done") 58 return(df_jwst_group)

File /stage/irsa-staff-jkrick/fornax-demo-notebooks/spectroscopy/code_src/mast_functions.py:238, in JWST_group_spectra(df, verbose, quickplot) 235 if verbose: print("Units of fluxes for each spectrum: {}".format( ",".join([str(tt) for tt in fluxes_units]) ) ) 237 ## Unit conversion to erg/s/cm2/A (note fluxes are nominally in Jy. So have to do the step with dividing by lam^2) --> 238 fluxes_stack_cgs = (fluxes_stack * fluxes_units[0]).to(u.erg / u.second / (u.centimeter*2) / u.hertz) (const.c.to(u.angstrom/u.second)) / (wave_grid.to(u.angstrom)2) 239 fluxes_stack_cgs = fluxes_stack_cgs.to(u.erg / u.second / (u.centimeter2) / u.angstrom) 241 ## Add to data frame

File /opt/conda/lib/python3.11/site-packages/astropy/units/quantity.py:938, in Quantity.to(self, unit, equivalencies, copy) 934 unit = Unit(unit) 935 if copy: 936 # Avoid using to_value to ensure that we make a copy. We also 937 # don't want to slow down this method (esp. the scalar case). --> 938 value = self._to_value(unit, equivalencies) 939 else: 940 # to_value only copies if necessary 941 value = self.to_value(unit, equivalencies)

File /opt/conda/lib/python3.11/site-packages/astropy/units/quantity.py:891, in Quantity._to_value(self, unit, equivalencies) 888 equivalencies = self._equivalencies 889 if not self.dtype.names or isinstance(self.unit, StructuredUnit): 890 # Standard path, let unit to do work. --> 891 return self.unit.to( 892 unit, self.view(np.ndarray), equivalencies=equivalencies 893 ) 895 else: 896 # The .to() method of a simple unit cannot convert a structured 897 # dtype, so we work around it, by recursing. 898 # TODO: deprecate this? 899 # Convert simple to Structured on initialization? 900 result = np.empty_like(self.view(np.ndarray))

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1195, in UnitBase.to(self, other, value, equivalencies) 1193 return UNITY 1194 else: -> 1195 return self._get_converter(Unit(other), equivalencies)(value)

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1124, in UnitBase._get_converter(self, other, equivalencies) 1121 else: 1122 return lambda v: b(converter(v)) -> 1124 raise exc

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1107, in UnitBase._get_converter(self, other, equivalencies) 1105 # if that doesn't work, maybe we can do it with equivalencies? 1106 try: -> 1107 return self._apply_equivalencies( 1108 self, other, self._normalize_equivalencies(equivalencies) 1109 ) 1110 except UnitsError as exc: 1111 # Last hope: maybe other knows how to do it? 1112 # We assume the equivalencies have the unit itself as first item. 1113 # TODO: maybe better for other to have a _back_converter method? 1114 if hasattr(other, "equivalencies"):

File /opt/conda/lib/python3.11/site-packages/astropy/units/core.py:1085, in UnitBase._apply_equivalencies(self, unit, other, equivalencies) 1082 unit_str = get_err_str(unit) 1083 other_str = get_err_str(other) -> 1085 raise UnitConversionError(f"{unit_str} and {other_str} are not convertible")

UnitConversionError: 'MJy / sr' and 'erg / (Hz s cm2)' (spectral flux density) are not convertible

bsipocz commented 3 months ago

UnitConversionError: 'MJy / sr' and 'erg / (Hz s cm2)' (spectral flux density) are not convertible

That looks to be a case for using an equivalency? https://docs.astropy.org/en/stable/units/equivalencies.html#spectral-flux-and-luminosity-density-units

afaisst commented 3 months ago

should all be resolved now. Issued ticket to astropy/specutils to fix the bug regarding reading in wrong units: https://github.com/astropy/specutils/issues/1143

afaisst commented 3 months ago

Ran through it successfully.