pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.08k stars 298 forks source link

Add the functionality to select single repeat cycles/timesteps for LI L2 accumulated archive (ARC) data #2878

Open ameraner opened 3 months ago

ameraner commented 3 months ago

Feature Request

Is your feature request related to a problem? Please describe. Satpy and the li_l2_nc reader currently does not offer a simple way to extract single repeat cycles/timesteps from LI L2 accumulated products (AF/AFA/AFR). These products contain 20 timesteps covering 10 minutes of sensing time.

When loading the main variables in the accumulated_flash_area, flash_accumulation, flash_radiance with the default reader kwarg with_area_definition=True, the reader will currently return a 2-d array with the summation of the entire sensing period of the file (particularly after https://github.com/pytroll/satpy/pull/2867). This way, the archived products behave the same as the normal dissemination products.

However, some users need to be able to access each timestep individually, or might want to extract a specific range of timesteps.

Describe the solution you'd like Ideally, an easy interface to select single timesteps returned as a 2-d array, and ranges of timesteps/all timesteps organised in a 3-d xarray with a time coordinate. It could look like

scn.load(["accumulated_flash_area"])  # returns summed-up array as already currently
scn.load(["accumulated_flash_area"], start_time=[datetime(2024, 8, 8, 12, 0, 0)]) # returns 2d array for given timestamp
scn.load(["accumulated_flash_area"], start_time='all') # returns 3d array with time coordinate and all timesteps
scn.load(["accumulated_flash_area"], start_time=datetime(2024, 8, 8, 12, 0, 0), end_time=datetime(2024, 8, 8, 12, 1, 0)) # returns 3d array with time coordinate with all timesteps within the given range

Or similar. Whether the above is implemented as a load kwarg to be handled by the FileYamlReader, or via the DataID/DataQuery mechanism, it's to be defined.

This has been discussed a bit on the slack on 08/08/2024: https://pytroll.slack.com/archives/C0LNH7LMB/p1723118806005139

Describe any changes to existing user workflow No, this is a new feature.

Additional context There is currently a workaround/hack to access single timesteps and get them into the gridded format.

First, this is the nominal, recommended code to read the accumulated data as 2-d arrays. It will give you the summed-up 2-d array for the entire timespan of the file.

from satpy import Scene
from glob import glob
import os
import numpy as np

path_to_testdata = '/tcenas/fbf/mtg/RollingBuffer/OPE/l2pf/LI-L2/'

# initialise LI L2 acc product Scene to produce 2-d array output (recommended, does not require resampling)
scn_2d_adef = Scene(filenames=glob(os.path.join(path_to_testdata, '*AFA*--ARC-*T_0030_0001.nc')), reader='li_l2_nc')
scn_2d_adef.load(['accumulated_flash_area'])

afa_directfrom2d = scn_2d_adef['accumulated_flash_area']

Now the workaround if you need the single datasets, while we wait for a better way to be used with the mode above:

# in alternative, you can initialise it the Scene produce 1-d outputs by setting with_area_definition False
scn_1d = Scene(filenames=glob(os.path.join(path_to_testdata, '*AFA*--ARC-*T_0030_0001.nc')), reader='li_l2_nc',
               reader_kwargs={'with_area_definition': False})
scn_1d.load(['accumulated_flash_area'])
# The array is now 1-d, and we need to resample it to the correct grid (!) and using the bucket_sum resampler (!) as below
scn_1d_r = scn_1d.resample('mtg_fci_fdss_2km', resampler='bucket_sum', empty_bucket_value=np.nan)

# now, this
afa_resampledfrom1d = scn_1d_r['accumulated_flash_area']
# is a 2-d array (practically) the same as afa_directfrom2d  as computed in the previous example
# note: possible remaining spurious differences are related to resampling close to the disk edge

# WORKAROUND/HACK FOR ISSUE
# Now, we can use the fact that we can reconstruct the 2-d array from the scn_1d Scene to access single time steps
# to do that, we need to load the accumulation_offsets variable, that contains the indices for each time step in the 1-d array
scn_1d.load(['accumulation_offsets'])
# let's extract the third time step as an example
timestep = 3
start_index = int(scn_1d['accumulation_offsets'][timestep-1])
end_index = int(scn_1d['accumulation_offsets'][timestep])

# let's create a new dataset, but in principle we could also overwrite accumulated_flash_area directly if we wanted to, and composites would then work
ds_name_new = 'accumulated_flash_area_' + str(timestep)
# and here it comes: we slice the 1-d dataset using the extracted start and end indices
scn_1d[ds_name_new] = scn_1d['accumulated_flash_area'][start_index:end_index]
# we need to do the same for the coordinates
scn_1d[ds_name_new].attrs['area'] = scn_1d['accumulated_flash_area'].attrs['area'].copy()
scn_1d[ds_name_new].attrs['area'].lats = scn_1d[ds_name_new].attrs['area'].lats[start_index:end_index]
scn_1d[ds_name_new].attrs['area'].lons = scn_1d[ds_name_new].attrs['area'].lons[start_index:end_index]
#  then let's resample this to the correct grid (!) and using the bucket_sum resampler (!) as below
scn_1d_r = scn_1d.resample('mtg_fci_fdss_2km',  resampler='bucket_sum', empty_bucket_value=np.nan)

# and here's our extracted 2-d array for one timestep
afa_resampled1d_fortimestep = scn_1d_r[ds_name_new]

note that you need an updated pyresample (>=1.29.0) for the bucket_sum to work as intended here (due to https://github.com/pytroll/pyresample/pull/602)