Activate LI L2 accumulated products gridding by default

ameraner commented 2 months ago

This PR changes the default behaviour of the LI L2 reader, such that the accumulated products are delivered automatically as 2-d arrays.

Background and current state: The LI L2 products consist of two types of products:

point products (LE, LEF, LFL, LGR): "classic" lightning products consisting of values with a lat-lon
accumulated (and gridded) products (AF, AFR, AFA): these products are the result of temporal accumulation (e.g. 30 seconds), and have been regridded to the FCI 2km grid by the ground processor, in order to facilitate the synergistic usage together with FCI.

Now, both products are actually stored in the NetCDF files as 1-d arrays. In the case of the accumulated products, each entry in the 1-d array corresponds to a specific pixel in the FCI grid (i.e. the corresponding column and row number are given in the file).

The mapping of these points onto the FCI grid, to obtain a 2-d array with an AreaDefinition, is already implemented in the reader, but needs to be activated by passing the reader kwarg with_area_definition=True. The default value of this is currently set to False. This is because, during the first implementation of the reader, it was decided to follow the rationale of "the reader shall return data as it's stored in the file and not modify it".

Why do we want to change this (set with_area_definition=True as default)? The reasons are multiple:

the accumulated products are designed and computed to be used in the grid. It is more intuitive for a user if the reader returns an accumulated product already in the 2-d grid, ready to be used, rather than a 1-d array that needs remapping in a specific way.
activating the automatic regridding by default avoids the risk of users resampling the 1-d points to a 2-d grid with methods that can introduce artifacts (e.g. nearest or bilinear that introduce "pixel bleeding" effects)
the usage of reader_kwargs through the Scene object is a rather advanced operation, and requires the user to fully find and understand the reader documentation, which is not always trivial. It also introduces an extra step that is otherwise not required in the standard Satpy API usage.
It simplifies the usage and combination with FCI, e.g. by allowing the generation of multi-sensor composites out-of-the box with the same handling of the FCI data.
Advanced users that would like to keep the data as 1-d arrays for further customised processing, can of course still do so via the Scene reader_kwargs interface.

For these reasons, we think that this is a good exception to the "reader doesn't modify data" guideline, since it applies an operation to the data that is anyway expected to be performed for the product to be usable, improving the user-friendliness.

Note that LI L2 accumulated products have not been widely distributed so far, and the most attention for these will start with the LI pre-operational dissemination (hopefully in a couple of months time), so we think it's a good time to introduce this now.

[x] Tests added
[x] Fully documented

codecov[bot] commented 2 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 95.94%. Comparing base (4faccbb) to head (98993a4). Report is 112 commits behind head on main.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #2789 +/- ## ========================================== + Coverage 95.93% 95.94% +0.01% ========================================== Files 377 379 +2 Lines 53541 53693 +152 ========================================== + Hits 51362 51515 +153 + Misses 2179 2178 -1 ``` | [Flag](https://app.codecov.io/gh/pytroll/satpy/pull/2789/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | Coverage Δ | | |---|---|---| | [behaviourtests](https://app.codecov.io/gh/pytroll/satpy/pull/2789/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | `4.09% <0.00%> (-0.01%)` | :arrow_down: | | [unittests](https://app.codecov.io/gh/pytroll/satpy/pull/2789/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll) | `96.04% <100.00%> (+0.01%)` | :arrow_up: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=pytroll#carryforward-flags-in-the-pull-request-comment) to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 8802581731

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 0.0%

Totals
Change from base Build 8787439081:	0.0%
Covered Lines:	0
Relevant Lines:	0

💛 - Coveralls

sjoro commented 2 months ago

i agree with @ameraner and have discussed this issue with him earlier, so i'm good with this change.

gerritholl commented 2 months ago

I fully support that this should be true by default. We are already modifying the data in most readers, because by default, we use calibration=radiance or calibration=brightness_temperature, not calibration=counts, which would be the data as they are in the file.

But I'm not sure if this should be a reader kwarg, or a DataID key (like for calibration or polarisation). Probably I'm rather late with this thought because the gridding feature already exists, just isn't activated by default.

pnuu commented 2 months ago

activating the automatic regridding by default avoids the risk of users resampling the 1-d points to a 2-d grid with methods that can introduce artifacts (e.g. nearest or bilinear that introduce "pixel bleeding" effects)

How big of an issue do you think the pixel bleeding will be when resampling the 2D array to a local area definition? Or should that be ideally done from the non-gridded data (LFL)?

ameraner commented 2 months ago

How big of an issue do you think the pixel bleeding will be when resampling the 2D array to a local area definition? Or should that be ideally done from the non-gridded data (LFL)?

That is indeed a good point, and it's a bit tricky... My understanding:

The accumulated products show the extent of the measured lightning events in the FCI grid. Therefore, ideally, one would stay in the native geostationary projection (and possibly use the native resampler to up/downscale the grid if needed) to avoid resampling artefacts. If the projection needs to be changed, any of the existing resampling algorithms would end up changing the area covered by the lightnings eventually (since we map pixels between grids using only the pixel center position, and not the pixel footprint area - correct me if I'm wrong here). The bucket resampler would do the best job, in that at least it avoids the pixel bleeding effect, which could artificially increase the area covered by the lightning, possibly causing a wrong interpretation of the scene.

For the point products: the groups and flashes products represent spatial and temporal clusters of events; however, we only get one lat-lon coordinate for each group or flash (i.e. the average location of the group/flash) hence we anyway lose the spatial extent information. For this reason, I think that also in this case the bucket resampler does the best job, since it maps one group/flash to exactly one pixel in the target grid, so-to-say maintaining the "lat-lon marker" functionality.

I hope this is understandable/makes some sense. Otherwise I'm happy to discuss this further with examples and in front of a flipchart, e.g. at the user days :) What is sure is that there's still a lot to be experimented with using this data!

pnuu commented 2 months ago

Thanks @ameraner for the insight! I just today got my MTG/LI processing chain completely functional (needed the new Satpy version for S3 support), and I'm using resampler='bucket_count' to aggregate the LFL data to a 4 km grid. I'm putting the resulting images to WMS so they can be used as an overlay with any data.

I'll include my LI setup in my PUD presentation in June.

sjoro commented 2 months ago

@pnuu it can be really significant, we see this when visualising GII BUFR data, which is essentially "point data" and needs to be put in an MSG grid before viewing the result. i did the first experiments with "nearest neighbor" resampling and it all depends on the search radius... too big and we get round blobs of data bleeding to neighboring pixels, too small and we get gaps in the data. bucket resampling does the best job, or simply converting the given lonlats first to rows and cols and use those to index the data points to correct location. we know from encoding the GII binary (gridded) product to BUFR that each GII data point has only one location where it should go.

pytroll / satpy