Closed: ameraner closed this 4 weeks ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 95.94%. Comparing base (4faccbb) to head (98993a4). Report is 112 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
Totals | |
---|---|
Change from base Build 8787439081: | 0.0% |
Covered Lines: | 0 |
Relevant Lines: | 0 |
I agree with @ameraner and have discussed this issue with him earlier, so I'm good with this change.
I fully support that this should be true by default. We are already modifying the data in most readers, because by default we use `calibration=radiance` or `calibration=brightness_temperature`, not `calibration=counts`, which would be the data as they are in the file.
But I'm not sure if this should be a reader kwarg, or a DataID key (like for calibration or polarisation). Probably I'm rather late with this thought because the gridding feature already exists, just isn't activated by default.
Activating the automatic regridding by default avoids the risk of users resampling the 1-d points to a 2-d grid with methods that can introduce artifacts (e.g. nearest or bilinear, which cause "pixel bleeding" effects).
How big of an issue do you think the pixel bleeding will be when resampling the 2D array to a local area definition? Or should that be ideally done from the non-gridded data (LFL)?
That is indeed a good point, and it's a bit tricky... My understanding:
The accumulated products show the extent of the measured lightning events in the FCI grid. Therefore, ideally, one would stay in the native geostationary projection (and possibly use the `native` resampler to up/downscale the grid if needed) to avoid resampling artefacts. If the projection needs to be changed, any of the existing resampling algorithms would eventually end up changing the area covered by the lightning (since we map pixels between grids using only the pixel centre position, and not the pixel footprint area - correct me if I'm wrong here). The bucket resampler would do the best job, in that it at least avoids the pixel bleeding effect, which could artificially increase the area covered by the lightning, possibly causing a wrong interpretation of the scene.
For the point products: the groups and flashes products represent spatial and temporal clusters of events; however, we only get one lat-lon coordinate for each group or flash (i.e. the average location of the group/flash), so we lose the spatial extent information anyway. For this reason, I think that the bucket resampler also does the best job in this case, since it maps one group/flash to exactly one pixel in the target grid, maintaining, so to say, the "lat-lon marker" functionality.
I hope this is understandable/makes some sense. Otherwise I'm happy to discuss this further with examples and in front of a flipchart, e.g. at the user days :) What is sure is that there's still a lot to be experimented with using this data!
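To make the "one group/flash, one pixel" property concrete, here is a minimal NumPy sketch of bucket-count resampling on a toy grid (the grid geometry and point locations are made up; in Satpy the real work is done by pyresample's bucket resampler):

```python
import numpy as np

# Hypothetical target grid: 10x10 pixels covering lon 0..10, lat 0..10
nrows, ncols = 10, 10
lon0, lat1 = 0.0, 10.0  # upper-left corner of the grid
res = 1.0               # pixel size in degrees

# Synthetic flash/group centroid locations (1-d point data, as in the LI files)
lons = np.array([0.2, 0.3, 5.5, 9.9])
lats = np.array([9.8, 9.7, 4.2, 0.1])

# Each point maps to exactly one (row, col) bucket: no search radius, no bleeding
cols = ((lons - lon0) / res).astype(int)
rows = ((lat1 - lats) / res).astype(int)

counts = np.zeros((nrows, ncols), dtype=int)
np.add.at(counts, (rows, cols), 1)  # accumulate, like resampler='bucket_count'

print(counts.sum())  # 4 -> every point lands in exactly one pixel
print(counts[0, 0])  # 2 -> the two nearby flashes share one bucket
```

Nearest-neighbour resampling with a search radius would instead smear each point over every target pixel within the radius, which is exactly the bleeding effect discussed above.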
Thanks @ameraner for the insight! I just today got my MTG/LI processing chain completely functional (needed the new Satpy version for S3 support), and I'm using `resampler='bucket_count'` to aggregate the LFL data to a 4 km grid. I'm putting the resulting images to WMS so they can be used as an overlay with any data.
I'll include my LI setup in my PUD presentation in June.
@pnuu it can be really significant; we see this when visualising GII BUFR data, which is essentially "point data" and needs to be put in an MSG grid before viewing the result. I did the first experiments with "nearest neighbor" resampling, and it all depends on the search radius: too big, and we get round blobs of data bleeding into neighboring pixels; too small, and we get gaps in the data. Bucket resampling does the best job, or simply converting the given lonlats first to rows and cols and using those to index the data points into the correct locations. We know from encoding the GII binary (gridded) product to BUFR that each GII data point has only one location where it should go.
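The "convert the lonlats to rows and cols and index" approach can be sketched in a few lines of NumPy (the values and the 0.5-degree grid are toy placeholders, not the actual GII/MSG geometry):

```python
import numpy as np

# Toy "point product": one value and one lon/lat per data point, as in GII BUFR
values = np.array([273.0, 280.5, 265.2])
lons = np.array([10.1, 12.6, 14.9])
lats = np.array([45.9, 44.2, 40.3])

# Hypothetical grid geometry: 0.5-degree pixels, upper-left corner at (10E, 50N)
lon0, lat1, res = 10.0, 50.0, 0.5
nrows, ncols = 20, 10

# Each point has exactly one location where it should go: no search radius needed
cols = np.floor((lons - lon0) / res).astype(int)
rows = np.floor((lat1 - lats) / res).astype(int)

grid = np.full((nrows, ncols), np.nan)
grid[rows, cols] = values  # direct indexing, no bleeding and no gaps from tuning

print(np.count_nonzero(~np.isnan(grid)))  # 3 -> one pixel per data point
```

This is equivalent to what a bucket resampler does internally, minus the handling of multiple points falling into the same pixel.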
This PR changes the default behaviour of the LI L2 reader, such that the accumulated products are delivered automatically as 2-d arrays.
Background and current state: The LI L2 products consist of two types of products:

- the accumulated products, where the lightning detections are accumulated over a time period onto the FCI grid
- the point-like products (e.g. flashes and groups), where each entry comes with its own lat-lon coordinate

Now, both product types are actually stored in the NetCDF files as 1-d arrays. In the case of the accumulated products, each entry in the 1-d array corresponds to a specific pixel in the FCI grid (i.e. the corresponding column and row number are given in the file).
The mapping of these points onto the FCI grid, to obtain a 2-d array with an `AreaDefinition`, is already implemented in the reader, but needs to be activated by passing the reader kwarg `with_area_definition=True`. The default value of this is currently set to `False`. This is because, during the first implementation of the reader, it was decided to follow the rationale of "the reader shall return data as it's stored in the file and not modify it".

Why do we want to change this (set `with_area_definition=True` as default)? The reasons are multiple:

- Passing `reader_kwargs` through the `Scene` object is a rather advanced operation, and requires the user to find and fully understand the reader documentation, which is not always trivial. It also introduces an extra step that is otherwise not required in the standard Satpy API usage.
- Tools built on top of Satpy do not always expose the `Scene` `reader_kwargs` interface.

For these reasons, we think that this is a good exception to the "reader doesn't modify data" guideline, since it applies an operation to the data that is anyway expected to be performed for the product to be usable, improving the user-friendliness.
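Conceptually, the regridding that `with_area_definition=True` activates is a scatter of the 1-d file entries into a 2-d grid using the row/column indices stored alongside them. A toy NumPy sketch (the variable names and the 5x5 grid are illustrative, not the reader's actual implementation):

```python
import numpy as np

# 1-d accumulated values as stored in the NetCDF file, plus the column (x) and
# row (y) each entry belongs to in the FCI grid (toy 5x5 grid here; the real
# FCI grid and fill-value handling differ).
flash_accumulation = np.array([0.25, 1.0, 0.75])
x = np.array([1, 3, 4])  # column index per entry
y = np.array([0, 2, 4])  # row index per entry

grid = np.full((5, 5), np.nan)   # 2-d array on the target grid
grid[y, x] = flash_accumulation  # scatter the 1-d entries to their pixels

print(np.nansum(grid))  # 2.0 -> all three entries placed, nothing duplicated
```

With the new default, users get this 2-d array (with its `AreaDefinition`) directly from `Scene.load`, instead of having to know about the kwarg first.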
Note that the LI L2 accumulated products have not been widely distributed so far, and most of the attention on them will start with the LI pre-operational dissemination (hopefully in a couple of months' time), so we think it's a good time to introduce this change now.