robjhyndman / M4metalearning


hyperparameter_search can't compute more than 145000 timeseries #15

Closed Someone894 closed 5 years ago

Someone894 commented 5 years ago

Over the last few weeks I have been trying to use the FFORMA system to forecast a massive block of data (about 110000 time series, some of which consist only of zeros). In doing so I ran into a number of problems. Most of them I could solve, but now I'm stuck.

Here is a short list of the problems I fixed:

  1. The parallelisation of THA_features and calc_forecasts did not work well on Linux on IBM Power.
     - Fixed by switching to foreach with doParallel.
  2. THA_features produces NaN if a time series consists entirely of zeros.
     - Fixed by returning zeros instead of NaN from stl_features in the tsfeatures package when no seasonality can be computed.
  3. train_interval_weights did not work for time series whose horizon is always h = 24.
     - Fixed by adding a check for NULL values.
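The foreach/doParallel workaround for the first item can be sketched as follows. This is a minimal illustration, not the actual patch: it assumes a list `dataset` in the usual M4metalearning format (each element holding a series in `$x`), and the cluster size of 4 is arbitrary.

```r
# Sketch: parallelise feature extraction with foreach + doParallel
# instead of the package's built-in parallelisation.
library(foreach)
library(doParallel)
library(M4metalearning)

cl <- makeCluster(4)          # worker count is illustrative
registerDoParallel(cl)

# Run THA_features on each series in a worker, single-threaded per worker.
features <- foreach(series = dataset,
                    .packages = "M4metalearning") %dopar% {
  THA_features(list(series), n.cores = 1)[[1]]
}

stopCluster(cl)
```

Explicit cluster creation with doParallel tends to be more portable across platforms (including non-x86 Linux) than fork-based defaults.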

Hopefully I can upload the changed code soon. The main reason for opening this issue is the hyperparameter_search function. I am trying to train the FFORMA system on the 100000 original M4 time series combined with my 110000 own series, but apparently the hyperparameter search only works with fewer than 145000 time series. With more than that, I get the following error:

cannot open compressed file '<PATH>/M4_Hyper.RData', probable reason 'No such file or directory'
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
elapsed = 488.97        Round = 1       max_depth = 10.0000     eta = 0.4000    subsample = 0.9000      colsample_bytree = 0.6000       nrounds = 200.0000      Value = -0.9861
elapsed = 605.43        Round = 2       max_depth = 12.0000     eta = 0.7395    subsample = 0.8938      colsample_bytree = 0.5587       nrounds = 228.0000      Value = -0.9827
elapsed = 110.10        Round = 3       max_depth = 7.0000      eta = 0.2818    subsample = 0.5933      colsample_bytree = 0.9659       nrounds = 59.0000       Value = -0.9876
elapsed = 248.60        Round = 4       max_depth = 8.0000      eta = 0.0042    subsample = 0.8924      colsample_bytree = 0.9966       nrounds = 126.0000      Value = -1.0407
elapsed = 62.96         Round = 5       max_depth = 13.0000     eta = 0.7589    subsample = 0.7186      colsample_bytree = 0.8478       nrounds = 68.0000       Value = NaN
elapsed = 84.47         Round = 6       max_depth = 7.0000      eta = 0.8266    subsample = 0.6338      colsample_bytree = 0.8825       nrounds = 110.0000      Value = NaN
Error in GP_deviance(beta = row, X = X, Y = Y, nug_thres = nug_thres,  :
  Infinite values of the Deviance Function,
            unable to find optimum parameters
Calls: source ... eval -> eval -> <Anonymous> -> apply -> FUN -> GP_deviance

Do you have any idea or direction as to how I can fix this issue?

Someone894 commented 5 years ago

I found a suitable solution: set the upper bound of the learning rate to something below 0.66, i.e. `eta = c(0.001, 0.659)`.
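For illustration, the change amounts to tightening the eta range in the bounds list that the Bayesian optimisation searches over. The other bounds below are placeholders, not the package's actual defaults; only the eta line reflects the fix. Capping eta avoids the runs where the objective comes back NaN, which is what makes GP_deviance fail.

```r
# Sketch: search bounds for the xgboost hyperparameters, with the eta
# upper bound kept below 0.66 (values other than eta are illustrative).
bounds <- list(
  max_depth        = c(2L, 14L),
  eta              = c(0.001, 0.659),  # upper bound < 0.66 to avoid NaN objective values
  subsample        = c(0.5, 1.0),
  colsample_bytree = c(0.5, 1.0),
  nrounds          = c(50L, 300L)
)
```

With very high learning rates on a dataset this large, individual rounds apparently produce degenerate (NaN) objective values, and the Gaussian process fitted over the search history then cannot find finite deviance parameters.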