KoustavDS closed 11 months ago

KoustavDS commented 11 months ago


I am trying to train a global model (NBEATS/TFT) on 5000 time series with 365 timestamps each. The code below can be used to create the data. I am running this on two different GCP instances.

Instance 1 configuration : 16 vCPUs, 60 GB RAM, NVIDIA T4 (1 GPU).

I am getting an "out of memory" error (copied below). As suggested in the error message, I tried setting max_split_size_mb to 128, 256, etc., but that did not resolve the problem.

OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 14.58 GiB total capacity; 14.36 GiB already allocated; 1.31 MiB free; 14.44 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Instance 2 configuration : 16 vCPUs, 60 GB RAM, NVIDIA V100 (4 GPU).

It does not throw an "out of memory" error, but it gives an error related to PyTorch Lightning GPU allocation (copied below).

RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call `torch.cuda.*` functions, have moved the model to the device, or allocated memory on the GPU any other way? Please remove any such calls, or change the selected strategy. You will have to restart the Python kernel.

Code to reproduce the data :

import pandas as pd
import numpy as np
import torch
from darts import TimeSeries
from darts.dataprocessing import Pipeline
from darts.dataprocessing.transformers import MissingValuesFiller, Scaler
from darts.metrics import mape, smape, rmse
from darts.utils.statistics import check_seasonality, plot_acf, plot_residuals_analysis
from darts.utils.timeseries_generation import linear_timeseries
from darts.datasets import MonthlyMilkDataset, MonthlyMilkIncompleteDataset
from darts.models import NBEATSModel, TFTModel

from statsmodels.tools.eval_measures import rmse

from sklearn.preprocessing import MaxAbsScaler

# 365 daily observations for 5000 series, one column per series
new_arry = np.array(np.random.randint(1000, size=(365, 5000)))
new_data = pd.DataFrame(new_arry)
new_data.columns = ['col' + str(i) for i in new_data.columns]
new_data['col_dt'] = pd.date_range(start='1/1/2022', periods=len(new_data), freq='D')

# a single TimeSeries holding all 5000 columns
pvt_samp2 = TimeSeries.from_dataframe(new_data, 'col_dt', new_data.columns[:5000].tolist())

filler = MissingValuesFiller(fill='auto')
pvt_samp2 = filler.transform(pvt_samp2)

transformer = Scaler(scaler=MaxAbsScaler())
pvt_samp2 = transformer.fit_transform(pvt_samp2)

# calendar covariates (day of month, month)
new_cov = new_data.copy()
new_cov['day'] = new_cov.col_dt.dt.day
new_cov['month'] = new_cov.col_dt.dt.month

new_cov = new_cov[['col_dt', 'day', 'month']]
new_cov = TimeSeries.from_dataframe(new_cov, 'col_dt', new_cov.columns[1:].tolist())
scaler_dt_cov = Scaler()
final_cov = scaler_dt_cov.fit_transform(new_cov)

train, val = pvt_samp2.split_after(pd.Timestamp("20221221"))
train_cov, val_cov = final_cov.split_after(pd.Timestamp("20221221"))

my_model = TFTModel(
    input_chunk_length=90,
    output_chunk_length=10,
    hidden_size=32,
    lstm_layers=1,
    num_attention_heads=3,
    dropout=0.2,
    batch_size=300,
    n_epochs=4,
    add_relative_index=False,
    add_encoders=None,
    likelihood=None,
    #    quantiles=quantiles
    # ),  # QuantileRegression is set per default
)

my_model.fit(train, future_covariates=final_cov, verbose=True)

Expected behavior

How can I resolve these errors? Am I passing too much data for one GPU? When I add more GPUs, I get the PyTorch Lightning error instead. How can I solve this? Any suggestions for scaling from 5000 to 10000 time series in one model?

System (please complete the following information):

Python version: 3.10
Darts version: 0.25.0

KoustavDS commented 11 months ago

Hi Unit8 team, requesting your help with the above issue.

dennisbader commented 11 months ago

Hi @KoustavDS and sorry for the late response. It looks like you created a multivariate target series (the series you want to make forecasts for) with 5000 components (columns).

This will end up creating a multivariate TFTModel with 5000 output dimensions - a huge model which is likely why you end up running into memory issues.

I believe what you want to do is create a univariate TFTModel (one output dimension) and train it on 5000 univariate (one column) target series. For this you just have to create a list of time series with one column each, and then feed this list to model.fit() and predict().

You can read more about the difference between multivariate and multiple series in this guide. And here is an example of forecasting with multiple series.

KoustavDS commented 11 months ago

I will try this approach and get back to you. One more question: if I want to add an exogenous feature (covariate) such as weekday, do I need to create one covariate series for each time series and pass them as a list? In that case I would need to pass 5000 covariate series.

dennisbader commented 11 months ago

Yes, that's correct. One covariate series per target series. The covariates themselves can be multivariate (e.g. each of the 5000 covariate series can have multiple columns/features such as weekday, month, …)

KoustavDS commented 11 months ago

Thank you. It would also be great if you could explain the difference between running a set of 5K series in one dataframe (multivariate) and running 5K individual series in N-BEATS or TFT.

My understanding is that the model breaks the data up according to input_chunk_length and output_chunk_length, creates multiple training samples from each series, and then generalizes across all 5K. In both cases we get 5K outputs as the prediction. So how do these two approaches differ in terms of the algorithm and memory usage?

Also, could you help me with the second error I mentioned above, the one that occurs when using multiple GPUs?

dennisbader commented 11 months ago

Let's just look at a simplified example, and ignore the time dimension of the Darts model. Let's say an input batch to a model is a tensor with shape (batch size, number of features).
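To make the size difference concrete, here is a toy calculation. It assumes a single fully connected layer that maps the flattened input window to the flattened output window, which is not the actual TFT or N-BEATS architecture; it only shows how the layer grows when the number of features per sample goes from 1 to 5000.

```python
def dense_weights(n_in: int, n_out: int) -> int:
    """Weights plus biases of one fully connected layer (toy model only)."""
    return n_in * n_out + n_out

input_chunk_length, output_chunk_length = 90, 10
n_series = 5000

# Multivariate: every sample carries all 5000 components at once.
multivariate = dense_weights(
    input_chunk_length * n_series, output_chunk_length * n_series
)

# Multiple univariate series: every sample carries one component,
# and the 5000 series only add more training samples.
univariate = dense_weights(input_chunk_length, output_chunk_length)

print(multivariate, univariate)
```

In this toy setting the layer size depends on the number of components, while training on a list of univariate series leaves the layer size unchanged and only increases the number of samples.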

For the multi GPU question:

KoustavDS commented 11 months ago

I tried passing the data as a list and it is working. Thank you @dennisbader.

dennisbader commented 11 months ago

No worries @KoustavDS, did it also work in the Multi-GPU scenario?

It would be helpful to confirm that the issue I mentioned above is not affecting all users.

KoustavDS commented 11 months ago

Hi @dennisbader. Yes, I was able to solve the multi-GPU scenario as well. It has one limitation: it only works when the code is run from the terminal as a .py script. It does not run from a notebook.
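The script-based setup described above might look roughly like the sketch below. The helper `make_trainer_kwargs` and the function `train` are hypothetical names, and the choice of `"strategy": "ddp"` is an assumption rather than the configuration actually used in this thread. The values are forwarded to the PyTorch Lightning trainer through the model's `pl_trainer_kwargs` argument.

```python
def make_trainer_kwargs(num_gpus: int) -> dict:
    """Build trainer settings for CPU, one GPU, or several GPUs (hypothetical helper)."""
    if num_gpus <= 0:
        return {"accelerator": "cpu"}
    kwargs = {"accelerator": "gpu", "devices": num_gpus}
    if num_gpus > 1:
        # Assumption: plain DDP, which launches one process per GPU from a script.
        kwargs["strategy"] = "ddp"
    return kwargs


def train(series_list, covariates_list, num_gpus: int):
    # Imported lazily so the helper above can be used without Darts installed.
    from darts.models import TFTModel

    model = TFTModel(
        input_chunk_length=90,
        output_chunk_length=10,
        hidden_size=32,
        batch_size=300,
        n_epochs=4,
        pl_trainer_kwargs=make_trainer_kwargs(num_gpus),
    )
    model.fit(series_list, future_covariates=covariates_list, verbose=True)
    return model


if __name__ == "__main__":
    # Build series_list and covariates_list here, then call
    # train(series_list, covariates_list, num_gpus=4).
    print(make_trainer_kwargs(4))
```

Saved as a .py file and started from the terminal, no CUDA call happens before Lightning creates its worker processes, which is the condition named in the RuntimeError above.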

I was also trying to add holiday and weekday as covariates. I am able to add weekday but not holiday. Here are the code and the error. If you know about this error, please let me know. Thanks.

CODE:
cov_month = datetime_attribute_timeseries(new_series1, attribute="month")
cov_day = datetime_attribute_timeseries(new_series1, attribute="day")
cov_holiday = holidays_timeseries(new_series1, country_code='US')

ERROR:
    573 time_index = _extend_time_index_until(time_index, until, add_length)
--> 574 scope = range(time_index[0].year, (time_index[-1] + pd.Timedelta(days=1)).year)
    575 country_holidays = holidays.country_holidays(
    576     country_code, prov=prov, state=state, years=scope
    577 )
    578 index_series = pd.Series(time_index, index=time_index)

AttributeError: 'TimeSeries' object has no attribute 'year'

dennisbader commented 11 months ago

Good to hear that multi-GPU worked, @KoustavDS.

Can you open a new issue for the holidays, so I can close this one?

Also, try to give a minimal reproducible example for new_series1, thanks!

KoustavDS commented 11 months ago

Sure...I created a new issue. https://github.com/unit8co/darts/issues/2022#issue-1931192569