scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

DOC replace MAPE in lagged features example #27880

Open lorentzenchr opened 7 months ago

lorentzenchr commented 7 months ago

A few improvements could be made on the new example of #25350:

rprkh commented 3 months ago

Hi @lorentzenchr, I would like to contribute to this issue.

  • Mean absolute percentage error (MAPE) is used quite a lot. I propose replacing it, in particular when predicting/forecasting the mean value. Note that MAPE is optimized by the median of a distribution with pdf proportional to $\frac{f(y)}{y}$, where $f(y)$ is the pdf of the true distribution of the data.
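The claim about MAPE can be checked numerically. The sketch below rests on assumptions that are not from the issue: strictly positive targets drawn from a LogNormal(0, 1) distribution and a grid search over constant predictions. Since mean(|y - z| / y) is an absolute error weighted by 1/y, its minimiser should coincide with the 1/y-weighted median of the sample. For LogNormal(0, 1), f(y)/y is proportional to a LogNormal(-1, 1) density, so that median is exp(-1) ≈ 0.37, well below the mean exp(1/2) ≈ 1.65.

```python
import numpy as np

# Assumption for illustration: strictly positive targets, y ~ LogNormal(0, 1).
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=20_000)


def weighted_median(values, weights):
    """Smallest value at which the normalised cumulative weight reaches 1/2."""
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cdf = np.cumsum(w) / w.sum()
    return v[np.searchsorted(cdf, 0.5)]


# Grid search for the constant prediction z that minimises MAPE = mean(|y - z| / y).
grid = np.linspace(0.05, 3.0, 2_000)
mape = np.array([np.mean(np.abs(y - z) / y) for z in grid])
z_mape = grid[np.argmin(mape)]

# Median of the sample reweighted by 1/y, i.e. of a density proportional to f(y)/y.
z_weighted_median = weighted_median(y, 1.0 / y)

print(f"MAPE-optimal constant: {z_mape:.3f}")
print(f"1/y-weighted median:   {z_weighted_median:.3f}")
print(f"theory exp(-1):        {np.exp(-1):.3f}")
print(f"sample mean:           {y.mean():.3f}")
```

The gap between the MAPE-optimal constant and the sample mean is what makes MAPE a poor choice when the model is meant to forecast the mean.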

With respect to this, which alternative evaluation metric would you recommend in place of MAPE? Also, do you suggest replacing MAPE with another metric in the scoring dictionary below?

https://github.com/scikit-learn/scikit-learn/blob/f59c16de7b0ca7b9db59113eb2112f03842bfde1/examples/applications/plot_time_series_lagged_features.py#L193-L200

  • The pinball_loss_50 is the same as 1/2 MAE, so this redundancy could be removed.
  • A residual vs. predicted plot does not really make sense for the 5%- and 95%-quantile predictions. A reliability diagram for quantiles might be a good replacement; see plot_reliability_diagram in model-diagnostics. Note that this is not possible with scikit-learn alone at the moment. Maybe the best next step is to add a little more explanation to the graphs.
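As a rough stand-in for a quantile reliability diagram using only NumPy, one can bin observations by their predicted α-quantile and compare the mean prediction in each bin with the empirical α-quantile of the observed targets in that bin; for a calibrated model the points lie near the diagonal, and the overall fraction of y ≤ prediction is close to α. This binned variant is an assumption of this sketch, not the method behind plot_reliability_diagram in model-diagnostics; the helper name binned_quantile_reliability is hypothetical, and the data are synthetic with a known 95% quantile.

```python
import numpy as np


def binned_quantile_reliability(y_true, y_pred, alpha, n_bins=10):
    """Hypothetical helper: per-bin mean predicted quantile vs. empirical alpha-quantile."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Bin edges at quantiles of the predictions, so bins hold similar numbers of points.
    edges = np.quantile(y_pred, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, y_pred, side="right") - 1, 0, n_bins - 1)
    pred_mean, obs_quantile = [], []
    for b in range(n_bins):
        in_bin = bin_idx == b
        if in_bin.any():  # skip bins emptied by tied edges
            pred_mean.append(y_pred[in_bin].mean())
            obs_quantile.append(np.quantile(y_true[in_bin], alpha))
    return np.array(pred_mean), np.array(obs_quantile)


# Synthetic data: y = x + N(0, 1) noise, so the true 95% quantile given x is x + 1.645.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=5_000)
y = x + rng.normal(size=x.size)
alpha = 0.95
q_pred = x + 1.6448536269514722  # standard normal 0.95-quantile

pred_mean, obs_quantile = binned_quantile_reliability(y, q_pred, alpha)
coverage = np.mean(y <= q_pred)  # should be close to alpha for a calibrated model
print(f"coverage: {coverage:.3f} (target {alpha})")
for p, o in zip(pred_mean, obs_quantile):
    print(f"predicted {p:6.2f}  observed {o:6.2f}")
```

Plotting obs_quantile against pred_mean together with the diagonal gives the diagram; binning is cruder than an isotonic fit, and the result depends on n_bins.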

With regard to these suggestions, I will work on incorporating the required changes into the example.