thieu1995 / permetrics

Artificial intelligence (AI, ML, DL) performance metrics implemented in Python
https://permetrics.readthedocs.io/en/latest/
GNU General Public License v3.0

[BUG]: 'y_true' and 'y_pred' with just 1 value #7

Open wasf84 opened 2 months ago

wasf84 commented 2 months ago

Description of the bug

Hi. First of all, thanks for that work. It's helping me a lot with my personal project.

I've noticed that when 'y_true' and 'y_pred' both have just a single value, the code crashes.

I'm working on rainfall-runoff modeling to forecast a few days ahead. When I try to forecast just 1 day ahead, it crashes during the evaluation step.

Environment: Windows 11, Python 3.9.7, Permetrics 2.0.0

Thanks again for your attention.

Steps To Reproduce

```python
import numpy as np
from permetrics import RegressionMetric

y_true = np.array([3])
y_pred = np.array([2.5])

evaluator = RegressionMetric()

rmse_1 = evaluator.RMSE(y_true, y_pred)
rmse_2 = evaluator.root_mean_squared_error(y_true, y_pred)
print(f"RMSE: {rmse_1}, {rmse_2}")

mse = evaluator.MSE(y_true, y_pred)
mae = evaluator.MAE(y_true, y_pred)
print(f"MSE: {mse}, MAE: {mae}")
```

Additional Information

The output cell:

```
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[237], line 10
      7 evaluator = RegressionMetric()
      9 ## 3.1 Call specific function inside object, each function has 2 names like below
---> 10 rmse_1 = evaluator.RMSE(y_true, y_pred)
     11 rmse_2 = evaluator.root_mean_squared_error(y_true, y_pred)
     12 print(f"RMSE: {rmse_1}, {rmse_2}")

File lib\site-packages\permetrics\regression.py:237, in RegressionMetric.root_mean_squared_error(self, y_true, y_pred, multi_output, force_finite, finite_value, kwargs)
    222 def root_mean_squared_error(self, y_true=None, y_pred=None, multi_output="raw_values", force_finite=True, finite_value=1.0, kwargs):
    223     """
    224     Root Mean Squared Error (RMSE): Best possible score is 0.0, smaller value is better. Range = [0, +inf)
        (...)
    235         result (float, int, np.ndarray): RMSE metric for single column or multiple columns
    236     """
--> 237     y_true, y_pred, n_out = self.get_processed_data(y_true, y_pred)
    238     result = np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
    239     return self.get_output_result(result, n_out, multi_output, force_finite, finite_value=finite_value)

File lib\site-packages\permetrics\regression.py:119, in RegressionMetric.get_processed_data(self, y_true, y_pred, **kwargs)
    108 """
    109 Args:
    110     y_true (tuple, list, np.ndarray): The ground truth values
        (...)
    116     n_out: Number of outputs
    117 """
    118 if (y_true is not None) and (y_pred is not None):
--> 119     y_true, y_pred, n_out = du.format_regression_data_type(y_true, y_pred)
    120 else:
    121     if (self.y_true is not None) and (self.y_pred is not None):

File lib\site-packages\permetrics\utils\data_util.py:22, in format_regression_data_type(y_true, y_pred)
     20 if y_true.ndim > 2:
     21     raise ValueError("y_true and y_pred must be 1D or 2D arrays.")
---> 22 return y_true, y_pred, y_true.shape[1]  # n_outputs
     23 else:
     24     raise ValueError("y_true and y_pred must have the same number of dimensions.")

IndexError: tuple index out of range
```
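Consistent with the final frame of the traceback, a one-element 1D NumPy array has shape `(1,)`, so indexing `shape[1]` raises `IndexError: tuple index out of range`. A minimal standalone demonstration of that indexing step (this only illustrates the failing expression itself, independent of whatever preprocessing permetrics applies beforehand):

```python
import numpy as np

# A single-value series, as passed in the bug report
y_true = np.array([3])
print(y_true.ndim, y_true.shape)  # 1 (1,)

# The traceback ends at `y_true.shape[1]`; a 1D array's shape tuple
# has only one entry, so index 1 does not exist
try:
    n_out = y_true.shape[1]
except IndexError as err:
    print(err)  # tuple index out of range
```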

wasf84 commented 2 months ago

I'm sorry, I forgot to mention that this error occurs with any of the metrics, not just the ones I mentioned here.

thieu1995 commented 1 month ago

@wasf84, yes, of course it will crash for any metric. What is the point of computing a metric from just 1 value? Several metrics need to calculate a mean, and with 1 value you can't calculate a mean. I think you should collect enough data before calculating metrics. If you forecast a few days ahead, then wait a few more days to gather the observed data first, and calculate the metrics later.
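The suggested pattern, accumulating one-step-ahead forecasts across the walk-forward loop and computing the metric once at the end, can be sketched with plain NumPy. Here `forecast_one_day` is a hypothetical stand-in for the real rainfall-runoff model, and the toy series stands in for the 2023 observations; in practice the two accumulated arrays would be passed to `RegressionMetric` as in the reproduction snippet above:

```python
import numpy as np

def forecast_one_day(history):
    # Hypothetical stand-in for the real model:
    # naive persistence forecast (predict the last observed value)
    return history[-1]

# Toy daily series standing in for the 2023 runoff observations
observations = np.array([3.0, 2.8, 3.1, 2.9, 3.3, 3.0, 2.7])

y_true, y_pred = [], []
for t in range(3, len(observations)):   # walk-forward: one step per day
    y_pred.append(forecast_one_day(observations[:t]))
    y_true.append(observations[t])

# Compute the metric once over all accumulated days; the arrays now
# hold more than one value, so the mean-based metrics are well defined
y_true = np.array(y_true)
y_pred = np.array(y_pred)
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(f"RMSE over {len(y_true)} days: {rmse:.3f}")
```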

wasf84 commented 1 month ago

Hi @thieu1995

I was trying to implement Walk-Forward Validation following this paper: https://www.sciencedirect.com/science/article/pii/S259012302400358X?via%3Dihub

I'm forecasting 1 day ahead, calculating the metrics, merging the results into a DataFrame, and so on, through the year 2023. But I'll try what you suggested: gather more data and calculate the metrics at the end of the experiment.

Thank you.