robjhyndman / M4metalearning


Memory limit for massive amounts of time series #11

Open MalteFlender opened 5 years ago

MalteFlender commented 5 years ago

It seems to me that at the moment the RAM of the machine I'm using is the limiting factor for the number of time series the system can be trained on. If I want to train the system on, e.g., 16 GB of time series data, I need at least 16 GB of RAM.

Is there a way to get around this issue? Maybe it is possible to train the system in smaller batches or use some kind of iterator. I'm trying to train the system on a lot of data obtained from a database, where I get the time series in small chunks.

pmontman commented 5 years ago

Thank you. Do you mean that you get some kind of error when running on your data, or is it just extremely slow? There is a known problem when running the code in parallel on large amounts of data, specifically when calculating the forecasts. You can try smaller batches for that part (see the sketch below); the part that relies on xgboost should be able to handle relatively larger datasets.

The parallelization problem in the forecasting part will be fixed soon.
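
A rough sketch of what batching the forecasting step could look like, assuming the `calc_forecasts()` / `forec_methods()` workflow from the package README (here `dataset` stands for your list of series, and the batch size is just a placeholder):

```r
library(M4metalearning)

# Split the list of series into batches and run the memory-heavy
# forecasting step one batch at a time.
batch_size <- 500
batches <- split(dataset, ceiling(seq_along(dataset) / batch_size))

processed <- lapply(batches, function(batch) {
  calc_forecasts(batch, forec_methods(), n.cores = 1)
})

# Recombine into a single list before the feature extraction / xgboost stages.
dataset <- do.call(c, unname(processed))
```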

MalteFlender commented 5 years ago

Currently I'm not using the system (I'm planning to). Since the training is done in one single step, it seems to me that there is no way to split the training into several parts, and therefore no way to process training sets that are bigger than my current RAM, since that is where the data has to be stored.
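
To illustrate, the kind of interface I have in mind would look roughly like this. This is a purely hypothetical sketch: `get_next_chunk()` and an incremental `train_selection_ensemble_partial()` do not exist in the package; the other calls just follow the README workflow.

```r
# Purely hypothetical sketch of an iterator-style training loop.
# get_next_chunk() and train_selection_ensemble_partial() are not real
# functions in M4metalearning; they only illustrate the idea.
library(M4metalearning)

model <- NULL
repeat {
  chunk <- get_next_chunk(db_connection, size = 1000)  # hypothetical DB reader
  if (is.null(chunk)) break
  chunk <- temp_holdout(chunk)                          # README workflow
  chunk <- calc_forecasts(chunk, forec_methods(), n.cores = 1)
  chunk <- THA_features(chunk, n.cores = 1)
  # Hypothetical incremental fit: update the model one chunk at a time,
  # so only the current chunk has to be held in RAM.
  model <- train_selection_ensemble_partial(model, chunk)
}
```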