pySTEPS / pysteps

Python framework for short-term ensemble prediction systems.
https://pysteps.github.io/
BSD 3-Clause "New" or "Revised" License

Probability forecasts #207

Closed: dnerini closed this pull request 3 years ago

dnerini commented 3 years ago

Resolve #196

RTD: https://pysteps.readthedocs.io/en/probability-forecasts/
Example: https://pysteps.readthedocs.io/en/probability-forecasts/auto_examples/probability_forecast.html#sphx-glr-auto-examples-probability-forecast-py

codecov[bot] commented 3 years ago

Codecov Report

Merging #207 (80bf364) into master (76324d8) will increase coverage by 0.96%. The diff coverage is 95.29%.

Note: the current head 80bf364 differs from the pull request's most recent head b21d9a0. Consider uploading reports for the commit b21d9a0 to get more accurate results.

@@            Coverage Diff             @@
##           master     #207      +/-   ##
==========================================
+ Coverage   78.01%   78.97%   +0.96%     
==========================================
  Files         131      135       +4     
  Lines        9640     9890     +250     
==========================================
+ Hits         7521     7811     +290     
+ Misses       2119     2079      -40     
Flag Coverage Δ
unit_tests 78.97% <95.29%> (+0.96%)

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
pysteps/io/importers.py 71.65% <42.85%> (ø)
pysteps/nowcasts/interface.py 95.45% <100.00%> (+0.71%)
pysteps/nowcasts/lagrangian_probability.py 100.00% <100.00%> (ø)
pysteps/tests/test_interfaces.py 99.28% <100.00%> (ø)
...teps/tests/test_nowcasts_lagrangian_probability.py 100.00% <100.00%> (ø)
pysteps/io/exporters.py 52.35% <0.00%> (ø)
pysteps/utils/images.py 76.92% <0.00%> (ø)
pysteps/tests/test_utils_interpolate.py 100.00% <0.00%> (ø)
pysteps/tests/test_decorators.py 100.00% <0.00%> (ø)
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 76324d8...b21d9a0.

dnerini commented 3 years ago

The notebook below shows how the new probabilistic nowcasting module works.

import matplotlib.pyplot as plt
import numpy as np

from pysteps.nowcasts.probability_forecast import forecast
Pysteps configuration file found at: /home/ned/.pysteps/pystepsrc

Numerical example

# parameters
precip = np.zeros((100, 100))
precip[10:50, 10:50] = 1
velocity = np.ones((2, *precip.shape))
timesteps = [0, 2, 6, 12]
nsamples = 20
thr = 0.5
slope = 1 # pixels / timestep
method = "gaussian_mv"

# compute probability forecast
out = forecast(precip, velocity, timesteps, slope, nsamples, thr, method)

# plot
for n, frame in enumerate(out):
    plt.subplot(2, 2, n + 1)
    plt.imshow(frame, interpolation="nearest", vmin=0, vmax=1)
    plt.title(f"t={timesteps[n]}")
    plt.xticks([])
    plt.yticks([])
plt.show()

plt.imshow(out[2], interpolation="nearest", vmin=0, vmax=1)
plt.title(f"t={timesteps[2]}")
plt.xticks([])
plt.yticks([])
plt.colorbar()
plt.show()

[figure: 2×2 panels of exceedance probability at t=0, 2, 6, 12]

[figure: exceedance probability at t=6, with colorbar]

Real-data example

from datetime import datetime

from pysteps import io, rcparams, utils, nowcasts
from pysteps.motion.lucaskanade import dense_lucaskanade
# data source
source = rcparams.data_sources["mch"]
root = source["root_path"]
fmt = source["path_fmt"]
pattern = source["fn_pattern"]
ext = source["fn_ext"]
timestep = source["timestep"]
importer_name = source["importer"]
importer_kwargs = source["importer_kwargs"]

# read precip field
date = datetime.strptime("201607112100", "%Y%m%d%H%M")
fns = io.find_by_date(date, root, fmt, pattern, ext, timestep, num_prev_files=2)
importer = io.get_method(importer_name, "importer")
precip, __, metadata = io.read_timeseries(fns, importer, **importer_kwargs)
precip, metadata = utils.to_rainrate(precip, metadata)
precip[np.isnan(precip)] = 0

# motion
motion = dense_lucaskanade(precip)

# parameters
nleadtimes = 6
nsamples = 20
thr = 1 # mm / h
slope = 1.0 * timestep # 1 km/min, expressed as km per timestep
method = "gaussian_mv"

# compute probability forecast
fct = forecast(precip[-1], motion, nleadtimes, slope, nsamples, thr, method)

# plot
for n, frame in enumerate(fct):
    plt.subplot(2, 3, n + 1)
    plt.imshow(frame, interpolation="nearest", vmin=0, vmax=1)
    plt.xticks([])
    plt.yticks([])
plt.show()

plt.imshow(fct[2], interpolation="nearest", vmin=0, vmax=1)
plt.xticks([])
plt.yticks([])
plt.colorbar()
plt.show()

[figure: exceedance probability panels for the six lead times]

[figure: exceedance probability at the third lead time, with colorbar]

from pysteps.verification import reldiag_init, reldiag_accum, plot_reldiag
# verifying observations
importer = io.get_method(importer_name, "importer")
fns = io.find_by_date(date, root, fmt, pattern, ext, timestep, num_next_files=nleadtimes)
obs, __, metadata = io.read_timeseries(fns, importer, **importer_kwargs)
obs, metadata = utils.to_rainrate(obs, metadata)
obs[np.isnan(obs)] = 0

# reliability diagram
reldiag = reldiag_init(thr)
reldiag_accum(reldiag, fct, obs[1:])
fig, ax = plt.subplots()
plot_reldiag(reldiag, ax)
ax.set_title("Reliability diagram")
plt.show()
/home/ned/PycharmProjects/pysteps/pysteps/verification/plots.py:160: MatplotlibDeprecationWarning: The 'basey' parameter of __init__() has been renamed 'base' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
  iax.set_yscale("log", basey=10)

[figure: reliability diagram]

dnerini commented 3 years ago

@loforest @pulkkins now would be a good time to get some feedback on this PR :-)

It would be nice to close this relatively soon, if you think it looks reasonably ready.

dnerini commented 3 years ago

Updated example, after switching to the convolutional approach (commit a042a4e).
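For context, a minimal sketch of the idea behind the neighborhood/convolution step, assuming a uniform box kernel implemented with scipy.ndimage.uniform_filter (the kernel shape and the way its size grows with lead time in the actual module may differ):

import numpy as np
from scipy.ndimage import uniform_filter

def neighborhood_exceedance_probability(field, threshold, kernel_size):
    # Fraction of pixels exceeding `threshold` within a square neighborhood,
    # computed as a moving average of the binary exceedance field.
    exceedance = (np.asarray(field) >= threshold).astype(float)
    return uniform_filter(exceedance, size=max(int(kernel_size), 1))

# e.g. letting the neighborhood grow linearly with lead time:
# kernel_size = slope * leadtime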

import matplotlib.pyplot as plt
import numpy as np

from pysteps.nowcasts.lagrangian_probability import forecast
Pysteps configuration file found at: /home/ned/.pysteps/pystepsrc

Numerical example

# parameters
precip = np.zeros((100, 100))
precip[10:50, 10:50] = 1
velocity = np.ones((2, *precip.shape))
timesteps = [0, 2, 6, 12]
thr = 0.5
slope = 1 # pixels / timestep

# compute probability forecast
out = forecast(precip, velocity, timesteps, thr, slope=slope)

# plot
for n, frame in enumerate(out):
    plt.subplot(2, 2, n + 1)
    plt.imshow(frame, interpolation="nearest", vmin=0, vmax=1)
    plt.title(f"t={timesteps[n]}")
    plt.xticks([])
    plt.yticks([])
plt.show()

plt.imshow(out[2], interpolation="nearest", vmin=0, vmax=1)
plt.title(f"t={timesteps[2]}")
plt.xticks([])
plt.yticks([])
plt.colorbar()
plt.show()

[figure: 2×2 panels of exceedance probability at t=0, 2, 6, 12 (convolution approach)]

[figure: exceedance probability at t=6, with colorbar]

Real-data example

from datetime import datetime

from pysteps import io, rcparams, utils, nowcasts
from pysteps.motion.lucaskanade import dense_lucaskanade
from pysteps.verification import reldiag_init, reldiag_accum, plot_reldiag
# data source
source = rcparams.data_sources["mch"]
root = source["root_path"]
fmt = source["path_fmt"]
pattern = source["fn_pattern"]
ext = source["fn_ext"]
timestep = source["timestep"]
importer_name = source["importer"]
importer_kwargs = source["importer_kwargs"]

# read precip field
date = datetime.strptime("201607112100", "%Y%m%d%H%M")
fns = io.find_by_date(date, root, fmt, pattern, ext, timestep, num_prev_files=2)
importer = io.get_method(importer_name, "importer")
precip, __, metadata = io.read_timeseries(fns, importer, **importer_kwargs)
precip, metadata = utils.to_rainrate(precip, metadata)
# precip[np.isnan(precip)] = 0

# motion
motion = dense_lucaskanade(precip)

# parameters
nleadtimes = 6
timesteps = [2, 4, 8, 16]
thr = 1 # mm / h
slope = 1 * timestep # 1 km/min, expressed as km per timestep

# compute probability forecast
extrap_kwargs = dict(allow_nonfinite_values=True)
fct = forecast(precip[-1], motion, timesteps, thr, slope=slope, extrap_kwargs=extrap_kwargs)

# plot
for n, frame in enumerate(fct):
    plt.subplot(2, 3, n + 1)
    plt.imshow(frame, interpolation="nearest", vmin=0, vmax=1)
    plt.xticks([])
    plt.yticks([])
plt.show()

plt.imshow(fct[2], interpolation="nearest", vmin=0, vmax=1)
plt.xticks([])
plt.yticks([])
plt.colorbar()
plt.show()

[figure: exceedance probability panels for the four lead times]

[figure: exceedance probability at the third lead time, with colorbar]

# verifying observations
importer = io.get_method(importer_name, "importer")
fns = io.find_by_date(date, root, fmt, pattern, ext, timestep, num_next_files=len(timesteps))
obs, __, metadata = io.read_timeseries(fns, importer, **importer_kwargs)
obs, metadata = utils.to_rainrate(obs, metadata)
obs[np.isnan(obs)] = 0

# reliability diagram
reldiag = reldiag_init(thr)
reldiag_accum(reldiag, fct, obs[1:])
fig, ax = plt.subplots()
plot_reldiag(reldiag, ax)
ax.set_title("Reliability diagram")
plt.show()
/home/ned/PycharmProjects/pysteps/pysteps/verification/plots.py:160: MatplotlibDeprecationWarning: The 'basey' parameter of __init__() has been renamed 'base' since Matplotlib 3.3; support for the old name will be dropped two minor releases later.
  iax.set_yscale("log", basey=10)

[figure: reliability diagram (convolution approach)]

RubenImhoff commented 3 years ago

Oh and final comment: maybe add a y-label to the colour bars in the example (label = "Exceedance probability of threshold x mm/h")
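For instance, a minimal matplotlib snippet (the label text is just a placeholder to adapt to the threshold actually used):

cbar = plt.colorbar()
cbar.set_label("Exceedance probability of threshold 1 mm/h")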

dnerini commented 3 years ago

Thanks @RubenImhoff and @loforest for the valuable comments and input!

I tried to address them as best I could, please let me know if you have more concerns or suggestions.

To answer the remaining points:

from @RubenImhoff :

Finally, I really like your examples, they make the method directly clear. I did miss the code in the reviewing process, though. I assume you're planning to add it to the examples gallery?

Good idea, I will!

from @loforest :

This is a very nice contribution. Thanks! The spatial patterns of the convolution method look better than those of the "Gaussian sampling" method. Though, there is quite a difference in the reliability diagram. How do you interpret it?

Good question. There are substantial differences between the two approaches, one being the actual number of samples used to compute the probabilities (a fixed number, 20, in the first case, while it is all the pixels within the neighborhood in the second case). Do you think this could partially explain the more pronounced underconfidence of the Gaussian sampling method? Also, please note that I used different lead times in the second case. The figure below is more comparable with the first one.

[figure: reliability diagram with lead times comparable to the first example]
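To make the sample-size argument a bit more concrete, a back-of-the-envelope comparison (assuming the neighborhood side grows roughly as slope * lead time, as in the numerical example; the actual kernel size handling may differ):

nsamples_gaussian = 20       # fixed ensemble size used by "gaussian_mv"
slope, leadtime = 1, 12      # pixels per timestep, timesteps (numerical example)
npixels_neighborhood = (slope * leadtime) ** 2
print(nsamples_gaussian, npixels_neighborhood)  # 20 vs ~144 values per probability estimate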

loforest commented 3 years ago

From @dnerini:

There are substantial differences between the two approaches, one being the actual number of samples used to compute the probabilities (a fixed number, 20, in the first case, while it is all the pixels within the neighborhood in the second case). Do you think this could partially explain the more pronounced underconfidence of the Gaussian sampling method?

It is difficult to say from a single case. It could be related to how precipitation is handled in the eastern part of the domain, where there is an irregular radar mask. In the Gaussian sampling example the mask is no longer present in the nowcast.

I also wonder about the impact of using spatially correlated pixels to build the p.d.f. vs sampling points that are more independent.
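As a rough illustration of that effect (a textbook rule of thumb, not something computed in this PR): for n equally correlated samples with pairwise correlation rho, the effective sample size is about n / (1 + (n - 1) * rho), so a large neighborhood of strongly correlated pixels can carry much less independent information than its pixel count suggests.

def effective_sample_size(n, rho):
    # Effective number of independent samples among n equally correlated ones.
    return n / (1 + (n - 1) * rho)

print(effective_sample_size(144, 0.5))  # ~2: strong spatial correlation collapses the information
print(effective_sample_size(20, 0.0))   # 20: independent samples keep their full size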

dnerini commented 3 years ago

Thanks a lot @RubenImhoff @loforest for your reviews!

I'm merging this PR.