N-Beats and xgboost models are taking more time when executing in parallel

unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.

https://unit8co.github.io/darts/

Apache License 2.0


Issue #2279 (Closed: praveenjana closed this issue 5 months ago)

praveenjana commented 5 months ago

Describe the bug
N-BEATS and XGBoost models take much longer when executed in parallel. With a single process, each algorithm finishes within one second, whereas with 2 (or more) processes, each algorithm takes 15 seconds to complete on the same data.

To Reproduce
The models are initialized as follows:

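    from darts.models import NBEATSModel, XGBModel

    # XGBoost regression model on the last 12 lags, forecasting 3 steps at a time
    xgb_model = XGBModel(lags=12, output_chunk_length=3)

    # Generic N-BEATS: 10 stacks of 1 block each, 4 fully connected layers of width 512
    nbeats_model = NBEATSModel(
        input_chunk_length=30,
        output_chunk_length=7,
        generic_architecture=True,
        num_stacks=10,
        num_blocks=1,
        num_layers=4,
        layer_widths=512,
        n_epochs=5,
        nr_epochs_val_period=1,
        batch_size=800,
        model_name="nbeats_run",
    )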

Expected behavior
N-BEATS and XGBoost should take the same amount of time when executing in parallel.

System
Python version: 3.11
darts version: 0.27.2

Additional context
Is there a parameter I can tune on these models to reduce the execution time?

madtoinou commented 5 months ago

Hi @praveenjana,

Can you please share the entire code snippet so that we can reproduce the issue on our side, including how you make these two models run in parallel? Thank you!

praveenjana commented 5 months ago

Hi @madtoinou, thanks for the prompt response.

The attached file reproduction_code_python.txt contains the Python code; I renamed it from .py to .txt because GitHub does not allow uploading .py files. The log files are threads_1.txt and threads_2.txt.

Steps to follow:

1. I am running this code on CentOS Linux 7 (Core) with Python 3.11.
2. Rename reproduction_code_python.txt to reproduction_code_python.py.
3. For a single process (no parallel execution), run from a terminal: python reproduction_code_python.py 1
4. For two processes (parallel execution), run from a terminal: python reproduction_code_python.py 2
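The steps above run the same script with 1 or 2 as the process count. Since the actual reproduction_code_python.py is only attached and not shown here, the following is a hypothetical harness of the same shape, not the reporter's code: it reads the process count from the command line and times each task, with placeholder_workload (a hypothetical name) standing in for fitting XGBModel or NBEATSModel. Comparing the per-task times for 1 and 2 processes is one way to check whether a similar slowdown appears without Darts involved.

```python
import sys
import time
from concurrent.futures import ProcessPoolExecutor
from typing import List


def placeholder_workload(task_id: int) -> float:
    """Hypothetical stand-in for fitting one model; returns its wall time in seconds."""
    start = time.perf_counter()
    # CPU-bound busy work in place of XGBModel / NBEATSModel training.
    _ = sum(i * i for i in range(200_000 + task_id))
    return time.perf_counter() - start


def run(n_processes: int, n_tasks: int = 4) -> List[float]:
    """Run n_tasks workloads with n_processes workers and return per-task wall times."""
    if n_processes <= 1:
        # Single process: tasks run one after another in this interpreter.
        return [placeholder_workload(t) for t in range(n_tasks)]
    # Parallel: tasks are distributed over a pool of worker processes.
    with ProcessPoolExecutor(max_workers=n_processes) as pool:
        return list(pool.map(placeholder_workload, range(n_tasks)))


if __name__ == "__main__":
    # Mirrors the issue's usage: `python harness.py 1` or `python harness.py 2`.
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 1
    for task_id, seconds in enumerate(run(n)):
        print(f"task {task_id}: {seconds:.3f}s")
```

Because the timer runs inside each task, pool start-up cost shows up only in the overall wall time of the script, not in these per-task numbers.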

Please let me know if any further details are needed from my end.

Thanks, Praveen.

madtoinou commented 5 months ago

@praveenjana,

My guess is that the overhead associated with multi-threading outweighs its benefit in your example, and I don't think there is much we can do at the Darts level to remedy this.

On another note, training of the deep learning models (including NBEATSModel) can be accelerated by allocating workers to the dataloader. Similarly, many of the regression models expose an n_jobs parameter to control how many threads they use. I would recommend exploring these directions.
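One way to act on the n_jobs suggestion, sketched under assumptions: when several processes each train a model, giving every process an even share of the cores, rather than letting each one use all of them, keeps the total thread count close to the number of cores. The helper threads_per_process below is a hypothetical name that computes that share. The commented usage assumes XGBModel forwards extra keyword arguments such as n_jobs to xgboost.XGBRegressor, and uses torch.set_num_threads to cap PyTorch threads for N-BEATS; whether this recovers the single-process timings reported here is untested.

```python
import os
from typing import Optional


def threads_per_process(n_processes: int, n_cores: Optional[int] = None) -> int:
    """Return an even share of CPU cores for each of n_processes workers (minimum 1)."""
    if n_processes < 1:
        raise ValueError("n_processes must be at least 1")
    # Fall back to os.cpu_count(), which may return None on some platforms.
    cores = n_cores if n_cores is not None else (os.cpu_count() or 1)
    return max(1, cores // n_processes)


# Hypothetical usage inside each worker process (not executed here; assumes
# XGBModel forwards extra keyword arguments such as n_jobs to xgboost.XGBRegressor):
#
#   import torch
#   from darts.models import XGBModel
#
#   n_threads = threads_per_process(n_processes)
#   torch.set_num_threads(n_threads)  # cap PyTorch intra-op threads used by NBEATSModel
#   xgb_model = XGBModel(lags=12, output_chunk_length=3, n_jobs=n_threads)

if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, threads_per_process(n, n_cores=8))
```

Integer division leaves some cores idle when the process count does not divide the core count evenly; the minimum of 1 thread prevents a worker from being assigned zero threads when there are more processes than cores.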

praveenjana commented 5 months ago

Thanks @madtoinou.