Raised RuntimeError when trying to make HIMAWARI-8 true color image

BigShuiTai commented 2 years ago

Describe the bug I was trying to make a HIMAWARI-8 true color image. However, SatPy module raised a RuntimeError to me at _slice_data function.

To Reproduce

# Your code here
def get_area_def_format(proj, georange, res):
    min_lon, max_lon, min_lat, max_lat = georange
    left = min_lon
    right = max_lon
    up = min_lat
    down = max_lat
    res = res * 1000
    lat_0 = (up + down) / 2
    lon_0 = (right + left) / 2

    from pyproj import Proj
    p = Proj(proj=proj, lat_0=lat_0, lon_0=lon_0, ellps="WGS84")

    left_ex1, up_ex1 = p(left, up)
    right_ex1, up_ex2 = p(right, up)
    left_ex2, down_ex1 = p(left, down)
    right_ex2, down_ex2 = p(right, down)
    left_ex3, dummy = p(left, lat_0)
    right_ex3, dummy = p(right, lat_0)

    area_extent = (min(left_ex1, left_ex2, left_ex3),
                   min(up_ex1, up_ex2),
                   max(right_ex1, right_ex2, right_ex3),
                   max(down_ex1, down_ex2))

    xsize = int(round((area_extent[2] - area_extent[0]) / res))
    ysize = int(round((area_extent[3] - area_extent[1]) / res))

    return lat_0, lon_0, xsize, ysize, area_extent

channel = "true_color"

scn = Scene(glob.glob("./HS_H08*.DAT.bz2"), reader='ahi_hsd')
scn.load([channel])

georange = (32.5, 47.5, 112.5, 127.5)
latmin, latmax, lonmin, lonmax = georange

from pyresample import get_area_def
area_id = 'TEST01'
lat_0, lon_0, x_size, y_size, area_extent = get_area_def_format("eqc", (lonmin, lonmax, latmin, latmax), 1)
projection = f'+proj=eqc +lat_0={lat_0} +lon_0={lon_0} +ellps=WGS84'
description = 'TEST01'
proj_id = f'eqc_{lat_0}_{lon_0}'
areadef = get_area_def(area_id, description, proj_id, projection, x_size, y_size, area_extent)
new_scn = scn.resample(areadef)
new_scn.save_dataset(channel, base_dir="./true_color/", filename=f"{args.system}_true_color.png")

Expected behavior Succeed in compositing true color image and save it in certain folder.

Actual results

# For these results below I have used these codes in ```_slice_data```:
# print(slice_x, slice_y)
# print(('x', source_area.width), ('y', source_area.height), dataset.sizes.items())
slice(3082, 4582, None) slice(1109, 2253, None)
('x', 1500) ('y', 1144) ItemsView(Frozen({'y': 1144, 'x': 1500}))
slice(3082, 4582, None) slice(1109, 2253, None)
('x', 1500) ('y', 1144) ItemsView(Frozen({'y': 1144, 'x': 1500}))
slice(3082, 4582, None) slice(1109, 2253, None)
('x', 1500) ('y', 1144) ItemsView(Frozen({'y': 1144, 'x': 1500}))
slice(6165, 9163, None) slice(2218, 4506, None)
('x', 2998) ('y', 2288) ItemsView(Frozen({'y': 2288, 'x': 2998}))
slice(6165, 9163, None) slice(2218, 4506, None)
('x', 2998) ('y', 2288) ItemsView(Frozen({'y': 2288, 'x': 2998}))
slice(6165, 9163, None) slice(2218, 4506, None)
('x', 2998) ('y', 2288) ItemsView(Frozen({'y': 2288, 'x': 2998}))
slice(3082, 4582, None) slice(1109, 2253, None)
('x', 1500) ('y', 1144) ItemsView(Frozen({'y': 1144, 'x': 1500}))
slice(3082, 4582, None) slice(1109, 2253, None)
('x', 1500) ('y', 1144) ItemsView(Frozen({'y': 0, 'x': 1500}))
Traceback (most recent call last):
  File "/www/wwwroot/Satima/satpy_ahi_true_color.py", line 271, in <module>
    Satpy()
  File "/www/wwwroot/Satima/satpy_ahi_true_color.py", line 59, in __init__
    self.ask_root()
  File "/www/wwwroot/Satima/satpy_ahi_true_color.py", line 150, in ask_root
    datas = self.extract(roots, center_lon, center_lat, self.georange, args.channel, names)
  File "/www/wwwroot/Satima/satpy_ahi_true_color.py", line 190, in extract
    new_scn = scn.resample(areadef)
  File "/root/miniconda3/lib/python3.9/site-packages/satpy/scene.py", line 908, in resample
    self._resampled_scene(new_scn, destination, resampler=resampler,
  File "/root/miniconda3/lib/python3.9/site-packages/satpy/scene.py", line 819, in _resampled_scene
    dataset, source_area = self._reduce_data(dataset, source_area, destination_area,
  File "/root/miniconda3/lib/python3.9/site-packages/satpy/scene.py", line 869, in _reduce_data
    dataset = self._slice_data(source_area, (slice_x, slice_y), dataset)
  File "/root/miniconda3/lib/python3.9/site-packages/satpy/scene.py", line 783, in _slice_data
    raise RuntimeError
RuntimeError

Environment Info:

OS: Ubuntu 20.04.4 LTS x86_64
Satpy Version: 0.35.0
PyResample Version: 1.23.0
PyProj Version: 3.3.0

Additional context Additionally, I found that it sometimes succeed if I ran the code again.

djhoese commented 2 years ago

Thanks for filing this issue. It is unfortunate how unhelpful that error is. Thanks for that print out it is really helpful. So it seems some input has a size of 0 for the y dimension. Could you add to your print statement dataset.attrs["_satpy_id"] and let us know what the new output is?

This will tell us which dataset has 0 rows and maybe give us an idea what is going on.

BigShuiTai commented 2 years ago

Thanks for filing this issue. It is unfortunate how unhelpful that error is. Thanks for that print out it is really helpful. So it seems some input has a size of 0 for the y dimension. Could you add to your print statement dataset.attrs["_satpy_id"] and let us know what the new output is?

This will tell us which dataset has 0 rows and maybe give us an idea what is going on.

I have tried your suggestion for minutes ago. When the error occurred, it printed this:

# print(slice_x, slice_y)
# print(('x', source_area.width), ('y', source_area.height), dataset.sizes.items())
# print(dataset.attrs["_satpy_id"])
slice(3082, 4582, None) slice(1109, 2253, None)
('x', 1500) ('y', 1144) ItemsView(Frozen({'y': 0, 'x': 1500}))
DataID(name='green_nocorr', resolution=1000)

But when succeed sometimes, I also got this:

# print(slice_x, slice_y)
# print(('x', source_area.width), ('y', source_area.height), dataset.sizes.items())
# print(dataset.attrs["_satpy_id"])
slice(3082, 4582, None) slice(1109, 2253, None)
('x', 1500) ('y', 1144) ItemsView(Frozen({'y': 1144, 'x': 1500}))
DataID(name='green_nocorr', resolution=1000)

Where the error occurred? I'm really eager to know.

djhoese commented 2 years ago

Hm. So with the same input data sometimes it succeeds and sometimes it fails?

In your original script you had that you were loading true_color, but this new print out shows the use of green_nocorr. That version of the green composite shouldn't be used for true_color. Did you make some modifications to the composites?

BigShuiTai commented 2 years ago

Hm. So with the same input data sometimes it succeeds and sometimes it fails?

In your original script you had that you were loading true_color, but this new print out shows the use of green_nocorr. That version of the green composite shouldn't be used for true_color. Did you make some modifications to the composites?

Ah I have forgot to tell you that I used true_color_hybrid composite method which was written by @WP19:

# true_color_hybrid
  green_nocorr:
    compositor: !!python/name:satpy.composites.ahi.GreenCorrector
    prerequisites:
      - name: B02
      - name: B04
    standard_name: toa_reflectance
    fractions: [0.93, 0.07]
  true_color:
    compositor: !!python/name:satpy.composites.GenericCompositor
    prerequisites:
    - name: B03
      modifiers: [sunz_corrected, no_aerosol_rayleigh_corrected]
    - name: green
    - name: B01
      modifiers: [sunz_corrected, no_aerosol_rayleigh_corrected]
    high_resolution_band: red
    standard_name: true_color
  true_color_nocorr:
    compositor: !!python/name:satpy.composites.GenericCompositor
    prerequisites:
    - name: B03
    - name: green_nocorr
    - name: B01
    high_resolution_band: red
    standard_name: true_color
  true_color_hybrid:
    compositor: !!python/name:satpy.composites.DayNightCompositor
    standard_name: true_color_hybrid
    prerequisites:
      - true_color
      - true_color_nocorr

WP19VFF commented 2 years ago

true color without correction and true color(rayleigh corrected) are blended from solor zenith angle of 75-85 degree to make the terminator line more natural

djhoese commented 2 years ago

Interesting idea. I'm surprised that is needed versus just tweaking the flags for the sunz_corrected modifier keyword arguments. Well, unless rayleigh is the problem. I've found it difficult to get rayleigh SZA tweaking to work well.

But @BigShuiTai when it passed and when it failed it was the same data case? Could you try loading only B01, B02, B03, and B04 and then printing out print(scn['B01'].shape) for each band? And if it is only randomly failing in the original case, maybe try running this a couple times and see if the y dimension is every 0. We need to find out if ti is the reader, the compositor(s), or the resampling/reducing that is causing the y dimension to go to 0.

BigShuiTai commented 2 years ago

Hm. So with the same input data sometimes it succeeds and sometimes it fails? In your original script you had that you were loading true_color, but this new print out shows the use of green_nocorr. That version of the green composite shouldn't be used for true_color. Did you make some modifications to the composites?

Ah I have forgot to tell you that I used true_color_hybrid composite method which was written by @WP19:
# true_color_hybrid
  green_nocorr:
    compositor: !!python/name:satpy.composites.ahi.GreenCorrector
    prerequisites:
      - name: B02
      - name: B04
    standard_name: toa_reflectance
    fractions: [0.93, 0.07]
  true_color:
    compositor: !!python/name:satpy.composites.GenericCompositor
    prerequisites:
    - name: B03
      modifiers: [sunz_corrected, no_aerosol_rayleigh_corrected]
    - name: green
    - name: B01
      modifiers: [sunz_corrected, no_aerosol_rayleigh_corrected]
    high_resolution_band: red
    standard_name: true_color
  true_color_nocorr:
    compositor: !!python/name:satpy.composites.GenericCompositor
    prerequisites:
    - name: B03
    - name: green_nocorr
    - name: B01
    high_resolution_band: red
    standard_name: true_color
  true_color_hybrid:
    compositor: !!python/name:satpy.composites.DayNightCompositor
    standard_name: true_color_hybrid
    prerequisites:
      - true_color
      - true_color_nocorr

Interestingly, when I used this composite to ABI (like GOES-16) and AMI (GEO-KOMPSAT-2A), it didn't report the error like before.

BigShuiTai commented 2 years ago

Interesting idea. I'm surprised that is needed versus just tweaking the flags for the sunz_corrected modifier keyword arguments. Well, unless rayleigh is the problem. I've found it difficult to get rayleigh SZA tweaking to work well.

But @BigShuiTai when it passed and when it failed it was the same data case? Could you try loading only B01, B02, B03, and B04 and then printing out print(scn['B01'].shape) for each band? And if it is only randomly failing in the original case, maybe try running this a couple times and see if the y dimension is every 0. We need to find out if ti is the reader, the compositor(s), or the resampling/reducing that is causing the y dimension to go to 0.

I printed each bands' shape, in spite both failed and succeed, it just only printed the same result:

#  for band in ['B01', 'B02', 'B03', 'B04']:
#      scn.load([band])
#      print(scn[band].shape)
(11000, 11000) # BAND01
(11000, 11000) # BAND02
(22000, 22000) # BAND03
(11000, 11000) # BAND04

djhoese commented 2 years ago

ok, now if instead of bands you load true_color and true_color_nocorr and you do:

scn.load(["true_color", "true_color_nocorr"])
for data_arr in scn.values():
    print(data_arr.attrs["_satpy_id"], data_arr.shape)

What do you get?

BigShuiTai commented 2 years ago

ok, now if instead of bands you load true_color and true_color_nocorr and you do:
scn.load(["true_color", "true_color_nocorr"])
for data_arr in scn.values():
    print(data_arr.attrs["_satpy_id"], data_arr.shape)
What do you get?

# when succeed
DataID(name='B03', wavelength=WavelengthRange(min=0.62, central=0.64, max=0.66, unit='µm'), resolution=500, calibration=<calibration.reflectance>, modifiers=()) (22000, 22000)
DataID(name='B01', wavelength=WavelengthRange(min=0.45, central=0.47, max=0.49, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=()) (11000, 11000)
DataID(name='B01', wavelength=WavelengthRange(min=0.45, central=0.47, max=0.49, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (11000, 11000)
DataID(name='B02', wavelength=WavelengthRange(min=0.49, central=0.51, max=0.53, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (11000, 11000)
DataID(name='B03', wavelength=WavelengthRange(min=0.62, central=0.64, max=0.66, unit='µm'), resolution=500, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (22000, 22000)
DataID(name='B03', wavelength=WavelengthRange(min=0.62, central=0.64, max=0.66, unit='µm'), resolution=500, calibration=<calibration.reflectance>, modifiers=('sunz_corrected', 'no_aerosol_rayleigh_corrected')) (22000, 22000)
DataID(name='B04', wavelength=WavelengthRange(min=0.83, central=0.85, max=0.87, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (11000, 11000)
DataID(name='green_nocorr', resolution=1000) (11000, 11000)

# when failed
DataID(name='B03', wavelength=WavelengthRange(min=0.62, central=0.64, max=0.66, unit='µm'), resolution=500, calibration=<calibration.reflectance>, modifiers=()) (22000, 22000)
DataID(name='B01', wavelength=WavelengthRange(min=0.45, central=0.47, max=0.49, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=()) (11000, 11000)
DataID(name='B03', wavelength=WavelengthRange(min=0.62, central=0.64, max=0.66, unit='µm'), resolution=500, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (22000, 22000)
DataID(name='B03', wavelength=WavelengthRange(min=0.62, central=0.64, max=0.66, unit='µm'), resolution=500, calibration=<calibration.reflectance>, modifiers=('sunz_corrected', 'no_aerosol_rayleigh_corrected')) (22000, 22000)
DataID(name='B02', wavelength=WavelengthRange(min=0.49, central=0.51, max=0.53, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (11000, 11000)
DataID(name='B04', wavelength=WavelengthRange(min=0.83, central=0.85, max=0.87, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (11000, 11000)
DataID(name='B01', wavelength=WavelengthRange(min=0.45, central=0.47, max=0.49, unit='µm'), resolution=1000, calibration=<calibration.reflectance>, modifiers=('sunz_corrected',)) (11000, 11000)
DataID(name='green_nocorr', resolution=1000) (2028, 11000)

Yeah I found something different in the shape of green_nocorr. How I should do next?

djhoese commented 2 years ago

ok, now if you only load green_nocorr and print out the satpy id and shape in the same way, does it show the difference? This is testing if green_nocorr by itself is failing or if a combination of the other composites/modifiers is doing something.

Very strange...

BigShuiTai commented 2 years ago

ok, now if you only load green_nocorr and print out the satpy id and shape in the same way, does it show the difference? This is testing if green_nocorr by itself is failing or if a combination of the other composites/modifiers is doing something.

Very strange...

Very strange for me too. I tried some times again, but this time it gave me another different result:

# when failed
(1433, 11000)
# when succeed
(11000, 11000)

Its shape changes really random and I was very confused to it.

djhoese commented 2 years ago

Whoa so the failing shape isn't even the same between failures? :confused:

djhoese commented 2 years ago

Can you paste the code you ran to produce the above print out? The whole code. If you can get it down to only the imports, the scene creation, the scene.load, and the print that would be ideal.

BigShuiTai commented 2 years ago

Can you paste the code you ran to produce the above print out? The whole code. If you can get it down to only the imports, the scene creation, the scene.load, and the print that would be ideal.

That's the python code I used. Hoping that it would be a help to you.

import glob, sys
import numpy as np
from satpy.scene import Scene

def get_area_def_format(proj, georange, res):
    min_lon, max_lon, min_lat, max_lat = georange
    left = min_lon
    right = max_lon
    up = min_lat
    down = max_lat
    res = res * 1000
    lat_0 = (up + down) / 2
    lon_0 = (right + left) / 2

    from pyproj import Proj
    p = Proj(proj=proj, lat_0=lat_0, lon_0=lon_0, ellps="WGS84")

    left_ex1, up_ex1 = p(left, up)
    right_ex1, up_ex2 = p(right, up)
    left_ex2, down_ex1 = p(left, down)
    right_ex2, down_ex2 = p(right, down)
    left_ex3, dummy = p(left, lat_0)
    right_ex3, dummy = p(right, lat_0)

    area_extent = (min(left_ex1, left_ex2, left_ex3),
                   min(up_ex1, up_ex2),
                   max(right_ex1, right_ex2, right_ex3),
                   max(down_ex1, down_ex2))

    xsize = int(round((area_extent[2] - area_extent[0]) / res))
    ysize = int(round((area_extent[3] - area_extent[1]) / res))

    return lat_0, lon_0, xsize, ysize, area_extent

channel = "true_color_hybrid"

scn = Scene(glob.glob("../satellite_data/HS_H08*.DAT.bz2"), reader='ahi_hsd')
scn.load([channel])

'''
for band in ['B01', 'B02', 'B03', 'B04']:
    scn.load([band])
    print(scn[band].shape)

scn.load(["true_color", "true_color_nocorr"])
for data_arr in scn.values():
    print(data_arr.attrs["_satpy_id"], data_arr.shape)
'''
scn.load(["green_nocorr"])
print(scn["green_nocorr"].shape)

georange = (20.5, 45.5, 105.5, 145.5)
latmin, latmax, lonmin, lonmax = georange

from pyresample import get_area_def
area_id = 'TEST01'
lat_0, lon_0, x_size, y_size, area_extent = get_area_def_format("eqc", (lonmin, lonmax, latmin, latmax), 1)
projection = f'+proj=eqc +lat_0={lat_0} +lon_0={lon_0} +ellps=WGS84'
description = 'TEST01'
proj_id = f'eqc_{lat_0}_{lon_0}'
areadef = get_area_def(area_id, description, proj_id, projection, x_size, y_size, area_extent)
new_scn = scn.resample(areadef)
new_scn.save_dataset(channel, base_dir="./", filename="true_color.png")

djhoese commented 2 years ago

I don't think I'll need the YAML files I can take what you've pasted above. Can you paste the python code here? The tarball that I think you tried to attach doesn't seem to work. It gives me a 404.

BigShuiTai commented 2 years ago

I don't think I'll need the YAML files I can take what you've pasted above. Can you paste the python code here? The tarball that I think you tried to attach doesn't seem to work. It gives me a 404.

I have edited my reply and attached my code on it.

djhoese commented 2 years ago

Thanks. Can you try commenting out the .load of the channel so you are only loading the green? You'll need to comment the lower block of code that does the actual processing too but we only care about the green shape.

BigShuiTai commented 2 years ago

Thanks. Can you try commenting out the .load of the channel so you are only loading the green? You'll need to comment the lower block of code that does the actual processing too but we only care about the green shape.

Yes, I commented the scn.load([channel]), only loading the green_nocorr channel. However, its shape changed every times.

(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(4371, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(78, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(3345, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(958, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(11000, 11000) # succeed
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(8, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(4, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(1547, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(3345, 11000)
(base) root@admin:/all/test_file# python satpy_ahi_true_color_test.py
(11000, 11000) # succeed

mherbertson commented 2 years ago

I can replicate this behaviour with AHI source files in .bz2, suspecting the unzip function in satpy. Would also explain why ABI/AMI and uncompressed AHI data sources are working correctly (for me). Can you confirm your script performs correctly if you provide the data to satpy decompressed and in .DAT format?

scn = Scene(glob.glob("../satellite_data/HS_H08*.DAT"), reader='ahi_hsd')

simonrp84 commented 2 years ago

Is this using the release version or the latest code from github? Unsure if there's any changes between the two but I can't replicate the problem locally (using latest code from git).

mherbertson commented 2 years ago

I'm running satpy 0.36.0 (pypi)

FYI it appears randomly, my first cycle of the above code:

>>> print(scn["green_nocorr"].shape)
(11000, 11000)

second run:

>>> print(scn["green_nocorr"].shape)
(78, 11000)

the files all appear to be the right size in the /tmp folder so on first glance it seems they're getting decompressed correctly

BigShuiTai commented 2 years ago

I can replicate this behaviour with AHI source files in .bz2, suspecting the unzip function in satpy. Would also explain why ABI/AMI and uncompressed AHI data sources are working correctly (for me). Can you confirm your script performs correctly if you provide the data to satpy decompressed and in .DAT format?

scn = Scene(glob.glob("../satellite_data/HS_H08*.DAT"), reader='ahi_hsd')

Oh I finally succeed in using the decompressed AHI source files! It seemed that the unzip function in satpy is the culprit

mherbertson commented 2 years ago

In good news it appears the unzipping process isn't causing this. Manually copying over the /tmp files after satpy decompresses them work successfully and match an externally extracted source file hash. I'll keep digging.

mherbertson commented 2 years ago

AHI segments have different observation times. With the change of start_time to scheduled in PR #2031, we lost the ability to correctly order the segments as filenames will be random after decompressing .bz2. https://github.com/pytroll/satpy/blob/bafe54e1d3c598f1d70b616e8d27bb2b25c62190/satpy/readers/yaml_reader.py#L626 I've implemented a fix that adds an optional 2 digit prefix (segment number) to the unzip function which will provide the correct order. #2128

BigShuiTai commented 2 years ago

@mherbertson In ahi_hsd reader, it use bz2file = bz2.BZ2File(filename) to decompress the AHI native data. I think bz2.open() can solve this problem, as far as I’m concerned.

pytroll / satpy

Raised RuntimeError when trying to make HIMAWARI-8 true color image #2127