MalteFlender opened this issue 5 years ago
Thank you. Do you mean that you get some kind of error when running on your data, or is it just extremely slow? There is a known problem when running the code in parallel on large amounts of data, specifically when calculating the forecasts. You can try smaller batches for that part; the part that relies on xgboost should then be able to handle relatively larger datasets.
The parallelization problem in the forecasting part will be fixed soon.
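For the forecasting step, "smaller batches" could look roughly like the sketch below. This is only an illustration; `model.forecast` is a stand-in for the project's actual forecasting call, not a real API here:

```python
import numpy as np

def forecast_in_batches(model, series, batch_size=1000):
    """Run the forecasting step over small slices of the data,
    so only one batch of series is held in memory at a time."""
    results = []
    for start in range(0, len(series), batch_size):
        batch = series[start:start + batch_size]
        # stand-in for the project's real forecasting call
        results.append(model.forecast(batch))
    return np.concatenate(results)
```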
Currently I'm not using the system (I'm planning to). Since the training is done in one single step, it seems to me that there is no way to split the training into parts, and therefore no way to process training sets that are bigger than my available RAM, since that is where the data has to be stored.
It seems to me that at the moment the RAM of the machine I'm using is the limiting factor for the amount of timeseries data the system can be trained on. If I want to train the system with e.g. 16 GB of timeseries data, I need at least 16 GB of RAM.
Is there a way to get around this issue? Maybe it is possible to train the system in smaller batches or to use some kind of iterator. I'm trying to train the system with a lot of data obtained from a database, where I get the timeseries in small chunks; a rough sketch of what I have in mind follows.
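To make the iterator idea concrete, here is a rough sketch using xgboost's ability to continue training from an existing booster via the `xgb_model` argument of `xgb.train`. The `fetch_chunks` generator is hypothetical, standing in for my database reads, and I don't know whether the project's pipeline exposes the underlying booster like this:

```python
import xgboost as xgb

def train_in_chunks(fetch_chunks, params, rounds_per_chunk=50):
    """Train incrementally, so only one database chunk is in RAM at a time."""
    booster = None
    for X, y in fetch_chunks():  # hypothetical generator yielding (features, labels)
        dtrain = xgb.DMatrix(X, label=y)
        # xgb_model=None starts a fresh booster on the first chunk; afterwards
        # training continues from the trees built on the previous chunks
        booster = xgb.train(params, dtrain,
                            num_boost_round=rounds_per_chunk,
                            xgb_model=booster)
    return booster
```

I realize that growing a booster chunk by chunk adds new trees per chunk rather than reproducing a single fit on the full dataset, so the result would not be identical, but it would keep memory usage bounded by the chunk size.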