pytroll / satpy

Python package for earth-observing satellite data processing
http://satpy.readthedocs.org/en/latest/
GNU General Public License v3.0
1.07k stars 295 forks source link

FCI group_files groups wrongly #2424

Open gerritholl opened 1 year ago

gerritholl commented 1 year ago

Describe the bug

Grouping files "naively" using group_files has incorrect results for FCI.

To Reproduce

Example grouping three timesteps:

from glob import glob
import satpy.readers
from satpy.utils import debug_on; debug_on()

groups = satpy.readers.group_files(
        glob("/media/nas/x21308/MTG_test_data/2022_05_MTG_Testdata/RC010[012]/*BODY*.nc"),
        reader="fci_l1c_nc")
print(len(groups))

Expected behavior

I expect exactly 3 groups.

Actual results

There are 99 groups.

Environment Info:

Additional context

This bug makes MultiScene hard to use with FCI. For example, the following will have incorrect results as well:

import hdf5plugin
from glob import glob
from satpy.multiscene import MultiScene, timeseries
from satpy.utils import debug_on; debug_on()

ms = MultiScene.from_files(
        glob("/media/nas/x21308/MTG_test_data/2022_05_MTG_Testdata/RC010[012]/*BODY*.nc"),
        reader="fci_l1c_nc")
ms.load(["ir_105"])
sc = ms.blend(blend_function=timeseries)
gerritholl commented 1 year ago

The result is correct when using

groups = satpy.readers.group_files(
        glob("/media/nas/x21308/MTG_test_data/2022_05_MTG_Testdata/RC010[012]/*BODY*.nc"),
                reader="fci_l1c_nc", group_keys=["repeat_cycle_in_day"])

(at least if passing at most one day of data), but this (or something be default for FCI and is not obvious to a (novice) user.

Maybe we need a way to specify different defaults for group_files for different readers, perhaps in the reader YAML configuration?

djhoese commented 1 year ago

You can see how group_keys is customized for ABI here:

https://github.com/pytroll/satpy/blob/9c960e5b14917bd856b68e3d772505871329700c/satpy/etc/readers/abi_l1b.yaml#L21

Those aren't currently customized for FCI so I think it defaults to start_time:

https://github.com/pytroll/satpy/blob/main/satpy/etc/readers/fci_l1c_nc.yaml

I would have thought this would have been caught by @strandgren and @ameraner's SIFT work since I think SIFT uses group files.

ameraner commented 1 year ago

SIFT uses the kwarg group_keys as well for the group_files call, and the key to be used there is configured in the SIFT reader configs (currently repeat_cycle_in_day, as Gerrit uses), so it works.

djhoese commented 1 year ago

Was it set this way in SIFT as a quick workaround or is there a reason group_keys shouldn't be set this way in the Satpy reader?

ameraner commented 1 year ago

In this specific FCI case, indeed there is no reason why it shouldn't be set in the Satpy reader yaml, in contrary, it would be clearly useful. It would be even nicer to be able to group by day as well, but that would require to somehow split one of the datetime fields in the filename... but the RC number is a good start.

Regarding why the group keys are defined externally in SIFT: good question. Maybe there are use cases where the Satpy ones are too relaxed.. or maybe it was a simple way to customise the group keys in case they were not available in the Satpy config at the time. In any case, we could think of using the default Satpy config, unless SIFT overwrites them in its own configs.

djhoese commented 1 year ago

Regarding why the group keys are defined externally in SIFT: good question. Maybe there are use cases where the Satpy ones are too relaxed.. or maybe it was a simple way to customise the group keys in case they were not available in the Satpy config at the time. In any case, we could think of using the default Satpy config, unless SIFT overwrites them in its own configs.

Yes. I wasn't trying to suggest that anything was being done incorrectly or in some not preferred way. I understand SIFT development needing to move quickly and not making the updates to upstream Satpy. I also understand that SIFT may have grouping requirements/defaults that don't necessarily make sense for Satpy most common use cases. This FCI case is at least one case where Satpy definitely needs an update to at least work, whether it matches SIFT doesn't really matter.