Open cboettig opened 2 years ago
Hi @cboettig and many thanks for the feedback. This sounds highly relevant. Currently the only metric applying to probabilistic forecasts is the "rho-risk". It'd be nice to support more scoring metrics as you suggest. I would also like to think about this development in terms of anomaly detection (which we want to eventually bring in Darts too) - i.e., metrics that help to quantify "how unlikely" some time series realisation is compared to a forecast would be helpful to detect anomalies.
Overall I think we would need scores that (i) work at least on univariate series (i.e., not limited to isolated scalars) and potentially multivariate too; and (ii) that work with empirical probability distributions specified as Monte Carlo samples only (as that's what we have in Darts).
My guess is that the most pragmatic way to start would be to try adding 1-2 more metrics functions in darts.metrics. I'm not very knowledgeable on this topic though, so not sure yet which one(s) exactly to start with (any suggestion is welcome), but will try to investigate this. Of course we also welcome PRs, even if they are authored by R developers :)
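To make the two requirements above concrete (scoring a whole series, with the forecast given only as Monte Carlo samples), here is a minimal sketch. It uses the pinball (quantile) loss averaged over time steps, chosen only for illustration; it is not the "rho-risk" metric and not an existing Darts function. The names `empirical_quantile`, `pinball_loss` and `mean_pinball_loss` are hypothetical, and the forecast is assumed to be a plain list of per-time-step sample lists rather than a Darts `TimeSeries`.

```python
import math
from typing import Sequence


def empirical_quantile(samples: Sequence[float], q: float) -> float:
    """Nearest-rank empirical quantile of a set of Monte Carlo samples."""
    if not samples:
        raise ValueError("need at least one sample")
    xs = sorted(samples)
    # Smallest sample such that at least a fraction q of samples are <= it.
    rank = max(1, math.ceil(q * len(xs)))
    return xs[rank - 1]


def pinball_loss(forecast_q: float, observed: float, q: float) -> float:
    """Pinball loss of a single quantile forecast against one observation."""
    diff = observed - forecast_q
    return q * diff if diff >= 0 else (q - 1) * diff


def mean_pinball_loss(
    sample_paths: Sequence[Sequence[float]],  # one list of samples per time step
    observed: Sequence[float],                # one observed value per time step
    q: float,
) -> float:
    """Average the per-step pinball loss over a univariate series."""
    if len(sample_paths) != len(observed):
        raise ValueError("forecast and observations must have the same length")
    losses = [
        pinball_loss(empirical_quantile(samples, q), y, q)
        for samples, y in zip(sample_paths, observed)
    ]
    return sum(losses) / len(losses)


# Two time steps, four Monte Carlo samples each (made-up numbers).
forecast = [[1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]]
actual = [3.0, 10.0]
score = mean_pinball_loss(forecast, actual, q=0.5)
```

Any other per-step, sample-based score could be plugged into the same averaging loop; a multivariate version would additionally need a choice of how to aggregate across components.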
@julien12234
Hi! Just wanted to ping this issue, has there been progress on this? I'm starting to work with stochastic forecasting and it would improve my workflow if Darts' API included metrics designed for this. Let me know if there's anything I can do to help make this happen!
One can also use this implementation: https://docs.pyro.ai/en/stable/ops.html#pyro.ops.stats.crps_empirical
Is your feature request related to a current problem? Please describe.
Darts is a fantastic library with excellent support for probability-based forecasts, which are essential for things like risk pricing and decision-making. Darts also has a very nice workflow for assessing forecast skill across a range of potential metrics. However, the current list of metrics does not seem to include metrics designed to score probabilistic forecasts, in particular scores that satisfy the "strictly proper" criterion of Gneiting and Raftery.
Describe proposed solution
It would be wonderful to have common strictly proper scoring metrics such as CRPS, the log probability score, or the Brier score available directly in Darts. Several Python implementations appear to exist, such as the properscoring module or the CRPS module, though the equations are relatively simple to implement directly and I have no idea how optimized the existing implementations may be. (A somewhat more extensive collection of probabilistic scoring rules can be found in the R community, e.g. scoringRules; note that most of that implementation is C++ thinly wrapped with R, so it might be helpful.)
CRPS is particularly of interest as it generalizes nicely to ensemble-based probabilistic forecasts and can also be used to score point-predictions. It is probably the most widely used measure.
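As an illustration of how CRPS applies to an ensemble, here is a minimal sketch of the standard sample-based estimator, CRPS ≈ mean|x_i - y| - 0.5 * mean|x_i - x_j|, where the x_i are the ensemble members and y is the observation. The function name `crps_ensemble` is hypothetical and this is not taken from Darts, properscoring, or any other library mentioned here. With a single ensemble member the second term vanishes and the score reduces to the absolute error, which is why CRPS can also score point predictions.

```python
from typing import Sequence


def crps_ensemble(samples: Sequence[float], observed: float) -> float:
    """Sample-based CRPS estimate for one scalar observation (lower is better)."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one ensemble member")
    xs = sorted(samples)

    # First term: mean absolute error of the ensemble members.
    mean_abs_error = sum(abs(x - observed) for x in xs) / n

    # Second term: mean absolute difference between all pairs of members.
    # On sorted data, sum_{i<j} (x_j - x_i) = sum_i (2*i - n + 1) * x_i
    # (0-based i), which avoids the O(n^2) double loop.
    pair_sum = sum((2 * i - n + 1) * x for i, x in enumerate(xs))
    mean_abs_diff = 2.0 * pair_sum / (n * n)

    return mean_abs_error - 0.5 * mean_abs_diff


# Three-member ensemble centred on the observation (made-up numbers).
ensemble_score = crps_ensemble([1.0, 2.0, 3.0], 2.0)

# A point forecast is a one-member ensemble: the score is |5 - 3|.
point_score = crps_ensemble([5.0], 3.0)
```

To score a whole forecast series, one would apply this at each time step and average the results.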
Describe potential alternatives
It looks like it may be possible for users to 'bring their own metrics', given how well-designed the Darts API is. Still, I think it would benefit users to have at least one probabilistic score built in.
Additional context
Please let me know if I can provide any further information that might be helpful. I'm primarily an R developer, so I am afraid my Python is not up to the level of authoring a PR, but I am happy to provide what input I can.