unit8co / darts

A Python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0
8.07k stars, 880 forks

Lags in the covariates but not in the target for time series prediction #1648

Closed. alejandrogomez97 closed this issue 1 year ago

alejandrogomez97 commented 1 year ago

Hi.

I am trying to build a predictor for the remaining time of industrial processes with darts. I have variables such as the temperature of the machines, the pressure at some key points, etc. that influence the remaining time of the current process. Those variables are time series that I could use as past covariates. However, my target is a peculiar variable.

My target is the remaining time. This variable decreases by one unit of time at each step until it reaches 0, which means that the current process has finished and the next one begins, so the remaining time takes a new value. An example is shown in the dataframe in the image, where the remaining time of the process is called 'lifetime' and the first two variables are temperature and pressure: [image]

As you can see, I can't use past values of my target because those past values contain information about the future, which is exactly the information I want to predict. If I knew that 2 minutes ago the remaining time of the process was 287 minutes and that 1 minute ago it was 286 minutes, then obviously the remaining time now is 285. To predict that the remaining time now is 285, I can't use the previous 286 and 287.

To sum up, I need to use my covariates as past covariates, looking a few periods back, but I can't use any information about the previous values of my target; otherwise I would be cheating. How can I do this with darts? In the regression models I've seen that I can pass "lags_past_covariates" for the covariates and "lags" for the target, so I could write something like LinearRegressionModel(lags=None, lags_past_covariates=4). I was wondering whether I could do something similar with a TFT or a recurrent neural network.

The idea behind this is to estimate the probability of the remaining time being in an interval. I know that if the duration of the process deviates by more than 10% from the average, we have a failure in the process. To compute the risk of failure I would have to compute the probability of the remaining time being outside the correct interval. I thought I could use a TFT to sample the remaining time 100 times at each point and count how many samples fall inside the correct interval and how many fall outside. However, in the fit method I don't see any 'lags' or 'lags_past_covariates' parameter. Is it possible to solve this problem with a TFT or any neural network that allows me to follow this path?
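The counting step described above can be sketched independently of any forecasting model. The function name, the sample values and the average of 300 are all hypothetical; the only assumption carried over from the post is the 10% tolerance around the average.

```python
def fraction_outside(samples, low, high):
    """Share of sampled values that fall outside the closed interval [low, high]."""
    if not samples:
        raise ValueError("need at least one sample")
    outside = sum(1 for s in samples if s < low or s > high)
    return outside / len(samples)


# Hypothetical numbers: an average of 300 minutes with a 10% tolerance.
average = 300.0
low, high = 0.9 * average, 1.1 * average

# Stand-in for the stochastic predictions of one time step (normally ~100 values).
samples = [265.0, 280.0, 295.0, 310.0, 335.0]

risk = fraction_outside(samples, low, high)
print(risk)
```

With a probabilistic model, samples would be the sampled predictions for a single time step, and risk would be the estimated probability of failure at that step.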

alejandrogomez97 commented 1 year ago

By the way, thank you in advance for your help and for the great job you're doing.

dennisbader commented 1 year ago

Hey @alejandrogomez97 and thanks for your patience.

alejandrogomez97 commented 1 year ago

Ok, thanks. Is it only in Darts that TorchForecastingModels (TFTModel, RNN, ...) do not support training without the target series as input, or is it not supported in Torch either?

On the other hand, the models that you mention, such as CatBoost, LightGBM and XGBoost, or even a linear regression, are usually deterministic, which means that the output is unique, determined by the weights or the combination of trees. How are they implemented to generate stochastic samples, for example LightGBM?

Thanks for your attention.

dennisbader commented 1 year ago

Our TorchForecastingModels (deep learning models in Darts) do not support it.

You can make the model probabilistic with the likelihood model creation parameter. See the docs here. For example, Darts' LinearRegressionModel will use sklearn's QuantileRegressor or PoissonRegressor under the hood.