unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0
7.91k stars, 857 forks

[BUG] LinearRegression get_estimator method's required arguments appear to do nothing #2494

Open a097123 opened 1 month ago

a097123 commented 1 month ago

Describe the bug

No matter what is passed into the get_estimator method of the LinearRegression class the same model object is returned.

To Reproduce

from darts import TimeSeries
from darts.models.forecasting.linear_regression_model import RegressionModel
from darts.utils.timeseries_generation import linear_timeseries

import numpy as np

trend = linear_timeseries(start_value=0, end_value=100, length=100)
noise = TimeSeries.from_times_and_values(
    trend.time_index, np.random.normal(0, 5, size=trend.values().shape)
)

series = trend + noise

model = RegressionModel(lags=3, output_chunk_length=2)
model.fit(series)

print(id(model.get_estimator(1, 1)))
print(id(model.get_estimator(2, 2)))
print(id(model.get_estimator(-1e7, -1e7)))
print(id(model.get_estimator(1e7, 1e7)))

Output:

13699865568
13699865568
13699865568
13699865568

Expected behavior

My assumption was that a Darts model would build one underlying model per future time step, i.e. "direct forecasting". get_estimator takes two arguments:

    def get_estimator(self, horizon: int, target_dim: int):
        """Returns the estimator that forecasts the `horizon`th step of the `target_dim`th target component.

        The model is returned directly if it supports multi-output natively.

        Parameters
        ----------
        horizon
            The index of the forecasting point within `output_chunk_length`.
        target_dim
            The index of the target component.
        """

No matter which arguments are passed in, the method will 1) return an object without raising an error, even if the inputs are nonsensical, and 2) return the same model object (I am not sure which horizon's model this is).

Both are unexpected to me.

System (please complete the following information):

Additional context

New to darts.

a097123 commented 1 month ago

Update: I now see that this is due to this if statement. It is still unclear to me what makes a model object a subclass of MultiOutputRegressor.

I still feel that logging.info is invisible to almost all users, and that having required arguments that are meaningless is misleading.

madtoinou commented 1 month ago

Hi @a097123,

Darts relies on the sklearn implementation for all the regression models. The MultiOutputRegressor class is implemented at that level, so we do not have control over it, but Darts sometimes wraps "single output" models in this class so that they support multivariate series or output_chunk_length > 1.

In your code snippet, you implicitly use sklearn's LinearRegression model, which inherits from the MultiOutputMixin class. Because this model supports multiple outputs out of the box, the model is returned directly, as stated in the docstring (hence the unique ID).
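To illustrate the distinction with sklearn alone, here is a minimal sketch (not Darts code). It uses random data and picks SVR as an arbitrary example of a regressor that predicts one output at a time. Both choices are assumptions made for illustration.

```python
import numpy as np
from sklearn.base import MultiOutputMixin
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

# Illustrative random data: 3 features, 2 outputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
Y = rng.normal(size=(50, 2))

# LinearRegression handles a 2-D target natively: a single fitted object.
native = LinearRegression().fit(X, Y)
native_is_multioutput = isinstance(native, MultiOutputMixin)

# SVR is single-output, so it is wrapped; the wrapper fits one clone per output column.
wrapped = MultiOutputRegressor(SVR()).fit(X, Y)
n_sub_estimators = len(wrapped.estimators_)

print(native_is_multioutput, n_sub_estimators)
```

In the wrapped case there are separate fitted objects to hand back, one per output. In the native case there is only the one model.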

If you look at model.model.coef_, you will see that there is one set of coefficients for each position in output_chunk_length (in accordance with your assumption, since multi_models=True by default), but there is no straightforward way to access a specific estimator from the model.
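As a rough sketch of what "one set of coefficients per position" means, the snippet below fits sklearn's LinearRegression on a hand-built lag matrix with 3 lags and 2 future steps. The lag layout is an assumption for illustration and may not match how Darts arranges its features internally.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noisy linear trend, similar in spirit to the series in the report.
rng = np.random.default_rng(0)
values = np.linspace(0, 100, 100) + rng.normal(0, 5, size=100)

n_lags, n_steps = 3, 2
rows = range(n_lags, len(values) - n_steps + 1)
X = np.array([values[t - n_lags:t] for t in rows])   # past 3 values
Y = np.array([values[t:t + n_steps] for t in rows])  # next 2 values

reg = LinearRegression().fit(X, Y)
coef_shape = reg.coef_.shape  # (n_steps, n_lags): one row per forecast step

# Row h of coef_ together with intercept_[h] acts as the "sub-model" for step h.
x_new = X[-1]
manual = reg.coef_ @ x_new + reg.intercept_
full = reg.predict(x_new.reshape(1, -1))[0]
print(coef_shape, np.allclose(manual, full))
```

So the per-step models exist as rows of one coefficient matrix rather than as separate estimator objects.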

We could move the sanity check one level in the method to avoid the unexpected behavior you reported.

Changing the logging message from info to warning is also a possibility. I did not want to make this look too alarming, but the current behavior seems to be counter-intuitive.
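A hypothetical sketch of what these two changes could look like together is shown below. The function name, its parameters, and the flat estimator layout are all invented for illustration and are not the Darts implementation.

```python
import logging

logger = logging.getLogger(__name__)


def checked_get_estimator(shared_model, estimators, horizon, target_dim,
                          output_chunk_length, n_targets):
    """Hypothetical helper: validate the indices before any early return."""
    # Range checks run first, so nonsensical inputs fail in every case.
    if not 0 <= horizon < output_chunk_length:
        raise ValueError(
            f"horizon must be in [0, {output_chunk_length}), got {horizon}"
        )
    if not 0 <= target_dim < n_targets:
        raise ValueError(
            f"target_dim must be in [0, {n_targets}), got {target_dim}"
        )
    if estimators is None:
        # Native multi-output model: there is only one object to return.
        logger.warning(
            "Model supports multi-output natively; returning the model itself."
        )
        return shared_model
    # Wrapped case; this flat (horizon, target) layout is an assumption.
    return estimators[horizon * n_targets + target_dim]


# Usage: a native model is returned as-is, but only for valid indices.
model = object()
same = checked_get_estimator(model, None, 1, 0, output_chunk_length=2, n_targets=1)
```

With this ordering, a call such as get_estimator(1e7, 1e7) would raise instead of silently returning the shared model.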

Note: Darts LinearRegressionModel class would be more appropriate if this is indeed the model you want to use since it supports some additional features such as probabilistic forecast.

a097123 commented 1 month ago

@madtoinou Thank you for the detailed response! This helps me understand what the class is doing much better. Also appreciate the suggestion of LinearRegressionModel and why it might be better.

I personally think that warning is better here but I am new to the lib and you might know better than me what a darts user might expect there. This might just be a weird edge case only a noob would hit before taking a more conventional approach.