pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/

The `ahi_hsd` reader returns Brightness Temperatures as `float64` #2880

Open simonrp84 opened 1 month ago

simonrp84 commented 1 month ago

Describe the bug

Brightness temperature data should be returned with the float32 dtype, but for the ahi_hsd reader BTs are returned as float64.

I suspect this is a numpy=2.0 issue with some coefficient used to convert radiance -> BT being a float64.
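A minimal, standalone sketch of the suspected promotion behaviour (not satpy code; arr and coeff are placeholder names): under NumPy 2.0's NEP 50 rules a float64 scalar promotes a float32 array to float64, whereas NumPy 1.x's value-based casting kept the array dtype.

import numpy as np

arr = np.ones((2, 2), dtype=np.float32)  # stands in for the radiance array
coeff = np.float64(1.23456789)           # stands in for a calibration coefficient

print((coeff * arr).dtype)               # numpy>=2.0: float64; numpy 1.x: float32
print((np.float32(coeff) * arr).dtype)   # float32 either way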

To Reproduce

from satpy import Scene; from glob import glob
scn = Scene(glob('*'), reader='ahi_hsd')
scn.load(['B13'])
print(scn['B13'].dtype)

Expected behavior

The print statement should show dtype('float32')

Actual results

The print statement shows dtype('float64')

Environment Info:

djhoese commented 1 month ago

Looks like this is specific to BTs:

In [2]: scn = Scene(reader="ahi_hsd", filenames=glob("/data/satellite/ahi/hsd/1830/HS_H08_20181112_1830_B*FLDK*"))

In [3]: scn.load(["B01", "B14"])

In [4]: scn["B01"].dtype
Out[4]: dtype('float32')

In [5]: scn["B14"].dtype
Out[5]: dtype('float64')
djhoese commented 1 month ago

Just about every scalar used in the BT calibration is a 64-bit float. Is it "the right thing to do" to cast them to 32-bit floats?
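Two hedged options, illustrated with made-up coefficient values rather than the real AHI calibration constants: either downcast the coefficients so the whole computation runs in float32, or keep the float64 intermediates and cast only the final result back to the input dtype (the approach the patch below takes with .astype(data.dtype)).

import numpy as np

rad = np.linspace(80.0, 120.0, 4, dtype=np.float32)  # placeholder radiance-like array
c0, c1, c2 = np.float64(0.1), np.float64(0.9), np.float64(1e-4)  # made-up coefficients

# Option 1: compute in float64, cast only the result back to the input dtype.
bt_a = (c0 + c1 * rad + c2 * rad ** 2).clip(0).astype(rad.dtype)

# Option 2: downcast the coefficients so the intermediates stay float32.
c0f, c1f, c2f = (np.float32(c) for c in (c0, c1, c2))
bt_b = (c0f + c1f * rad + c2f * rad ** 2).clip(0)

print(bt_a.dtype, bt_b.dtype)  # both float32; the values may differ in the last bits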

djhoese commented 2 weeks ago

We talked on Slack after I noticed that counts are not returned as integers and have no _FillValue defined (because it is NaN). I think we agreed this should be changed, but that means I really don't have the time to tackle this anymore. Here are the changes I started to make so far:
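For reference, a hypothetical sketch (using xarray, with made-up values rather than the actual AHI fill count) of the direction discussed: keep counts in their native unsigned integer type and record the fill value as an attribute, masking with NaN only once the data is calibrated to a float product.

import numpy as np
import xarray as xr

raw_counts = np.array([[0, 1023, 4095], [2047, 4095, 100]], dtype=np.uint16)
fill_value = 4095  # placeholder error/space count, not the real AHI value

counts = xr.DataArray(
    raw_counts,
    dims=("y", "x"),
    attrs={"units": "1", "_FillValue": fill_value},
)

# Masking would then happen only when calibrating to a float dtype:
radiance = counts.where(counts != counts.attrs["_FillValue"]).astype(np.float32)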

Subject: [PATCH] Initial AHI dtype work
---
Index: satpy/readers/ahi_hsd.py
===================================================================
diff --git a/satpy/readers/ahi_hsd.py b/satpy/readers/ahi_hsd.py
--- a/satpy/readers/ahi_hsd.py  (revision 10c63025ffb9538973e61f15fb19436eda28938e)
+++ b/satpy/readers/ahi_hsd.py  (date 1724428627166)
@@ -743,7 +743,7 @@
         c1_ = self._header["calibration"]["c1_rad2tb_conversion"][0]
         c2_ = self._header["calibration"]["c2_rad2tb_conversion"][0]

-        return (c0_ + c1_ * Te_ + c2_ * Te_ ** 2).clip(0)
+        return (c0_ + c1_ * Te_ + c2_ * Te_ ** 2).clip(0).astype(data.dtype)

 class _NominalTimeCalculator:
Index: satpy/tests/reader_tests/test_ahi_hsd.py
===================================================================
diff --git a/satpy/tests/reader_tests/test_ahi_hsd.py b/satpy/tests/reader_tests/test_ahi_hsd.py
--- a/satpy/tests/reader_tests/test_ahi_hsd.py  (revision 10c63025ffb9538973e61f15fb19436eda28938e)
+++ b/satpy/tests/reader_tests/test_ahi_hsd.py  (date 1724433974162)
@@ -361,12 +361,19 @@
                     fh.read_band(mock.MagicMock(), mock.MagicMock())
                 mask_space.assert_not_called()

-    def test_read_band_from_actual_file(self, hsd_file_jp01):
+    @pytest.mark.parametrize(
+        ("calib", "exp_units"),
+        [
+            ("counts", "1"),
+            ("reflectance", "%")
+        ]
+    )
+    def test_read_vis_band_from_actual_file(self, hsd_file_jp01, calib, exp_units):
         """Test read_bands on real data."""
         filename_info = {"segment": 1, "total_segments": 1}
         filetype_info = {"file_type": "blahB01"}
         fh = AHIHSDFileHandler(hsd_file_jp01, filename_info, filetype_info)
-        key = {"name": "B01", "calibration": "counts", "resolution": 1000}
+        key = {"name": "B01", "calibration": calib, "resolution": 1000}
         import dask
         with dask.config.set({"array.chunk-size": "32MiB"}):
             with warnings.catch_warnings():
@@ -380,10 +387,34 @@
                         "wavelength": 2,
                         "resolution": 1000,
                     })
+            assert data.attrs["units"] == exp_units
             assert data.chunks == ((1100,) * 10, (1100,) * 10)
             assert data.dtype == data.compute().dtype
             assert data.dtype == np.float32

+    # def test_read_ir_band_from_actual_file(self, hsd_file_jp01):
+    #     """Test read_bands on real data."""
+    #     filename_info = {"segment": 1, "total_segments": 1}
+    #     filetype_info = {"file_type": "blahB14"}
+    #     fh = AHIHSDFileHandler(hsd_file_jp01, filename_info, filetype_info)
+    #     key = {"name": "B14", "calibration": "counts", "resolution": 1000}
+    #     import dask
+    #     with dask.config.set({"array.chunk-size": "32MiB"}):
+    #         with warnings.catch_warnings():
+    #             # The header isn't valid
+    #             warnings.filterwarnings("ignore", category=UserWarning, message=r"Actual .* header size")
+    #             data = fh.read_band(
+    #                 key,
+    #                 {
+    #                     "units": "%",
+    #                     "standard_name": "toa_bidirectional_reflectance",
+    #                     "wavelength": 2,
+    #                     "resolution": 1000,
+    #                 })
+    #         assert data.chunks == ((1100,) * 10, (1100,) * 10)
+    #         assert data.dtype == data.compute().dtype
+    #         assert data.dtype == np.float32
+
     @mock.patch("satpy.readers.ahi_hsd.AHIHSDFileHandler._read_data")
     @mock.patch("satpy.readers.ahi_hsd.AHIHSDFileHandler._mask_invalid")
     @mock.patch("satpy.readers.ahi_hsd.AHIHSDFileHandler.calibrate")
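Once a dtype fix along the lines of the patch above is in place, the reproducer from the issue can double as a quick check at the Scene level; the glob pattern below is illustrative only.

import numpy as np
from glob import glob
from satpy import Scene

scn = Scene(glob("HS_H08_*_FLDK*"), reader="ahi_hsd")  # adjust the glob to local files
scn.load(["B01", "B14"])
assert scn["B01"].dtype == np.float32  # visible band, already float32 today
assert scn["B14"].dtype == np.float32  # IR band, float64 before the fix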