metoppv / improver

IMPROVER is a library of algorithms for meteorological post-processing.
http://improver.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
103 stars 85 forks source link

Correct use of NeighbourhoodProcessing #1893

Closed ecasellas closed 1 year ago

ecasellas commented 1 year ago

Hello,

First of all, thank you for sharing this Python package, it's extremely useful! At the Meteorological Service of Catalonia we built a probability blending scheme based on Improver tools (neighbourhood, blending and calibration). Now, we're trying to simplify the code and I came across the lead_times parameter (I don't remember if it was available in previous versions) in the NeighbourhoodProcessing class. I have a doubt regarding how to properly use this class.

square_nbhood = NeighbourhoodProcessing(neighbourhood_method='square',
                                        radii=nbhood_sizes,
                                        lead_times=list(cube.coord('forecast_period').points.data))

where

nbhood_sizes = [1000, 2000, 3000, 4000, 5000, 6000]
lead_times = [1, 2, 3, 4, 5, 6]

When I call square_nbhood.process(cube) where cube is <iris 'Cube' of probability_of_precipitation_amount_above_threshold / (1) (time: 6; projection_y_coordinate: 688; projection_x_coordinate: 688)>, I get the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/nowcasting/.conda/envs/lup/lib/python3.10/site-packages/improver/nbhood/nbhood.py", line 376, in process
    check_radius_against_distance(cube, self.radius)
  File "/home/nowcasting/.conda/envs/lup/lib/python3.10/site-packages/improver/nbhood/nbhood.py", line 72, in check_radius_against_distance
    if radius > max_allowed:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

If I call square_nbhood.process(cube) using:

for cube_slice in cube.slices([cube.coord(axis="y"), cube.coord(axis="x")]):
    square_nbhood.process(cube_slice)

Everything seems fine. So my doubt is, the correct way to implement NeighbourhoodProcessing with multiple lead_times is the second one? Because, if I check the source code for BaseNeighbourhoodProcessing (lines 389-395 of improver.improver.nbhood.nbhood.py) it already does the slicing I do before calling square_nbhood.process(cube), so I don't see why I have to previously slice the cube.

result_slices = CubeList()
for cube_slice in cube.slices([cube.coord(axis="y"), cube.coord(axis="x")]):
    cube_slice.data = self._calculate_neighbourhood(
        cube_slice.data, mask_cube_data
    )
    result_slices.append(cube_slice)
neighbourhood_averaged_cube = result_slices.merge_cube()

I hope I've explained myself well, please let me know if I didn't. Thank you very much!

bayliffe commented 1 year ago

Hi Enric,

Apologies for the slow response, but we've had a long weekend here in the UK for Easter. You've hit upon an issue that we don't encounter because of the way we use IMPROVER, but it remains something that it would be nice to correct.

The issue

Our processing using IMPROVER always uses files that contain a single validity time, i.e. there is not a leading time coordinate on the cubes. Most plugins will handle the case of multiple times in a single file, but it seems that neighbourhood processing is not amongst them, with this line being the reason, along with a lack of later handling.

The _find_radii function is calculating the radius to use for the cube's lead-time. For example, we might set up the plugin like this:

square_nbhood = NeighbourhoodProcessing(
    neighbourhood_method='square',
    radii=[10000, 20000],  # Units of metres
    lead_times=[0, 10]  # Units of hours
)

and then provide a cube with a lead-time of 5 hours (T+5). This function will calculate that the radius to use for neighbourhood processing should be 15000m, assuming a linear change in radius with lead-time.

In the case where a cube is provided with multiple times, as in your example, this function will return multiple radii that correspond to each of the relevant lead-times. For example:

cube lead-times: 0, 5, 10
returned radii: 10000, 15000, 20000

The plugin currently then assigns this to self.radius, our use of the singular "radius" here showing our assumption that this will be a single value. When a check is made that the radius is not too large it assumes a single value and it falls over with the error you are seeing.

The immediate solution

The immediate workaround for your case is to slice over the time coordinate, calling the neighbourhooding plugin for each slice, and then recombine the data into a single cube. This is essentially what you've shown above, slicing over the x-y coordinates outside the plugin. Note that the slicing that already exists within the plugin (which you highlight) is to handle multiple threshold values assuming there is a single time; the x-y slices are effectively slicing over a threshold coordinate.

A variation on your own solution is this:

square_nbhood = NeighbourhoodProcessing(
    neighbourhood_method='square',
    radii=[10000, 20000],  # Units of metres
    lead_times=[0, 10]  # Units of hours
)

result_slices = CubeList()
for cube_slice in cube.slices_over("time"):
    result_slices.append(
        square_nbhood.process(cube_slice)
    )
neighbourhood_averaged_cube = result_slices.merge_cube()

A better solution

This issue ought to be resolved to allow the neighbourhood plugin to handle multiple times. I've created the following issue for this to be done, though it is not high-priority work for us. If you felt it was something you could do, we always welcome contributions, but otherwise it will be done at some unknown future date.

https://github.com/metoppv/improver/issues/1894

I hope that's helpful and describes why you are seeing this issue. Let me know if you have any further questions, and if not, please close this issue.

Regards, Ben

ecasellas commented 1 year ago

Hi Ben,

No worries at all! We also had a long weekend here in Catalonia.

Thank you very much for all your explanations and your rapid response, it was very helpful. It's clear now for me how to use the current NeighbourhoodProcessing class and I'll adopt the immediate solution you propose. However, I'll also try to provide a solution for multi-time cubes, I don't know if I'll manage to do it, we'll see.

Thank you again,

Regards, Enric