robjhyndman / forecast

Forecasting Functions for Time Series and Linear Models
http://pkg.robjhyndman.com/forecast

parallel nnetar fitting #346

Status: Open. dashaub opened this issue 8 years ago.

dashaub commented 8 years ago

What do you think about adding parallelization to nnetar()? The code in avnnet() looks very easy to parallelize. For long time series with a large repeats value and many CPU cores, I imagine it would give a good speedup.
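A rough illustration of why the fits split up so easily, assuming avnnet() simply fits repeats independent networks with nnet::nnet() and nnetar() averages their predictions. fit_repeats() and average_predictions() are hypothetical helpers written for this sketch, not the forecast package's code.

```r
library(nnet)

# Hypothetical sketch (not forecast's avnnet()): fit `repeats` networks,
# each starting from its own random weights. No fit depends on another,
# so this lapply() is a natural candidate for a parallel map.
fit_repeats <- function(x, y, repeats, ...) {
  lapply(seq_len(repeats), function(i, ...) nnet(x, y, ...), ...)
}

# Average the networks' predictions for new inputs (one column per fit).
average_predictions <- function(fits, newx) {
  preds <- do.call(cbind, lapply(fits, function(f) as.numeric(predict(f, newx))))
  rowMeans(preds)
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)
y <- x[, 1] - x[, 2] + rnorm(100, sd = 0.1)
fits <- fit_repeats(x, y, repeats = 5, size = 3, linout = TRUE, trace = FALSE)
head(average_predictions(fits, x))
```

Because each element of the lapply() is self-contained, the total cost grows linearly with repeats, which is exactly the part a pool of workers can share.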

robjhyndman commented 8 years ago

Yes, good idea.

dashaub commented 8 years ago

I've put together PR #349 implementing this. After experimenting with parallel::parLapply() for a while and running into argument-name collision errors, I settled on foreach::foreach(). The speedup looks good on the long taylor series, but short series get slower, so I set the default to parallel = FALSE. An adaptive rule based on series length, like the one used in tbats(), could make sense here.
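One possible shape for the adaptive rule, sketched under stated assumptions: use_parallel_default() and n_workers() are hypothetical helpers, and the 1000-observation threshold is an assumption picked only because it separates AirPassengers (144 observations) from taylor (4032 observations); it is not a tuned value and not taken from tbats().

```r
# Hypothetical helper, not part of forecast: parallelize only when the
# series is long enough for worker start-up cost to pay off.
# ASSUMPTION: threshold = 1000 is illustrative, not tuned.
use_parallel_default <- function(y, num.cores = 2L, threshold = 1000L) {
  length(y) > threshold && num.cores > 1L
}

# No point starting more workers than there are independent fits.
n_workers <- function(repeats, num.cores) {
  max(1L, min(as.integer(num.cores), as.integer(repeats)))
}

use_parallel_default(AirPassengers)   # short monthly series
use_parallel_default(seq_len(4032))   # same length as taylor
n_workers(repeats = 20, num.cores = 4)
```

A length-only rule ignores repeats, which also drives cost; a fuller rule might scale the threshold down as repeats grows.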

Pros: great performance on long series.
Cons: adds imports of foreach::foreach() and doParallel::registerDoParallel(). These are common packages, however, and are probably already installed on most systems.

# current implementation
library(microbenchmark)
library(devtools)
install_github("robjhyndman/forecast")
library(forecast)
microbenchmark(nnetar(taylor),
               nnetar(AirPassengers),
               nnetar(AirPassengers, repeats = 500), times = 5)
Unit: seconds
                                 expr        min         lq       mean     median         uq        max neval
                       nnetar(taylor) 837.956053 841.290937 841.430421 842.189486 842.570680 843.144946     5
                nnetar(AirPassengers)   1.009523   1.030911   1.027371   1.031746   1.032079   1.032597     5
 nnetar(AirPassengers, repeats = 500)  24.779071  24.818174  24.841770  24.845551  24.863318  24.902737     5

# parallel implementation with parallel on by default
install_github("dashaub/forecast")
library(forecast)
microbenchmark(nnetar(taylor, parallel = TRUE),
               nnetar(AirPassengers, parallel = FALSE),
               nnetar(AirPassengers, parallel = TRUE, num.cores = 1),
               nnetar(AirPassengers, parallel = TRUE, num.cores = 2),
               nnetar(AirPassengers, parallel = TRUE, num.cores = 4),
               nnetar(AirPassengers, repeats = 500, parallel = TRUE, num.cores = 4),
               times = 5)
Unit: seconds
                                                                 expr        min         lq       mean     median         uq        max neval
                                      nnetar(taylor, parallel = TRUE) 424.717306 426.561738 427.136085 427.486460 428.064305 428.850618     5
                              nnetar(AirPassengers, parallel = FALSE)   1.018690   1.018691   1.022518   1.022630   1.025139   1.027438     5
                nnetar(AirPassengers, parallel = TRUE, num.cores = 1)   6.375443   6.402015   6.401181   6.404755   6.405817   6.417876     5
                nnetar(AirPassengers, parallel = TRUE, num.cores = 2)   7.482467   7.518907   7.545821   7.519844   7.529500   7.678389     5
                nnetar(AirPassengers, parallel = TRUE, num.cores = 4)  10.714595  10.798919  10.816837  10.811451  10.834412  10.924808     5
 nnetar(AirPassengers, repeats = 500, parallel = TRUE, num.cores = 4)  18.562486  18.579964  18.624954  18.595774  18.641525  18.745022     5

# some more tests on the taylor series
microbenchmark(nnetar(taylor, parallel = FALSE),
               nnetar(taylor, parallel = TRUE, num.cores = 1),
               nnetar(taylor, parallel = TRUE, num.cores = 2),
               nnetar(taylor, parallel = TRUE, num.cores = 4),
               nnetar(taylor, repeats = 100, parallel = TRUE, num.cores = 1),
               nnetar(taylor, repeats = 100, parallel = TRUE, num.cores = 2),
               nnetar(taylor, repeats = 100, parallel = TRUE, num.cores = 4),
               times = 5)

Unit: seconds
                                                          expr       min        lq      mean    median        uq       max neval
                              nnetar(taylor, parallel = FALSE)  839.0974  843.1898  844.2343  844.1039  846.0222  848.7580     5
                nnetar(taylor, parallel = TRUE, num.cores = 1)  839.0866  839.7962  842.2116  842.8739  843.8619  845.4396     5
                nnetar(taylor, parallel = TRUE, num.cores = 2)  426.8741  426.9788  427.9270  427.8327  428.3260  429.6235     5
                nnetar(taylor, parallel = TRUE, num.cores = 4)  222.6650  222.7813  223.3000  223.0341  223.9712  224.0484     5
 nnetar(taylor, repeats = 100, parallel = TRUE, num.cores = 1) 4179.6068 4182.5776 4184.9711 4184.7301 4186.4020 4191.5392     5
 nnetar(taylor, repeats = 100, parallel = TRUE, num.cores = 2) 2106.8621 2109.5174 2109.8424 2110.0868 2111.0205 2111.7252     5
 nnetar(taylor, repeats = 100, parallel = TRUE, num.cores = 4) 1067.8230 1070.3660 1072.8482 1070.8182 1073.7582 1081.4759     5

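To make the long-versus-short trade-off concrete, the mean timings from the microbenchmark output above can be turned into speedup and parallel-efficiency figures. This is plain arithmetic on the reported numbers; nothing is re-run.

```r
# Mean run times in seconds, copied from the microbenchmark output above:
# parallel = FALSE, then parallel = TRUE with num.cores = 1, 2, 4.
taylor_mean <- c(serial = 844.2343, cores1 = 842.2116, cores2 = 427.9270, cores4 = 223.3000)
air_mean    <- c(serial = 1.022518, cores1 = 6.401181, cores2 = 7.545821, cores4 = 10.816837)
cores       <- c(serial = 1, cores1 = 1, cores2 = 2, cores4 = 4)

# Speedup relative to the serial run; values below 1 are slowdowns.
speedup <- function(times) times[["serial"]] / times

# Parallel efficiency: speedup divided by the number of cores used.
efficiency <- function(times) speedup(times) / cores

round(rbind(taylor = speedup(taylor_mean), AirPassengers = speedup(air_mean)), 2)
round(efficiency(taylor_mean), 2)
```

On taylor the 4-core run is roughly 3.8 times faster than serial (about 95% efficiency), while every parallel AirPassengers configuration is slower than the serial fit. The gap of roughly 5 seconds even at num.cores = 1 suggests a fixed per-call setup cost rather than per-fit overhead.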
robjhyndman commented 8 years ago

I would like to avoid adding additional package dependencies. What were the issues with using the parallel package?

dashaub commented 8 years ago

nnet() uses the x and y arguments for the data, and it seems that parLapply() also passes its own x argument down the call chain through the ... arguments, which conflicts with this. There might be a way around this by setting up a wrapper function for avnnet().
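A plausible reading of the collision: parLapply() forwards its ... to clusterApply(), whose own data argument is named x, so an x = ... passed through parLapply() would match that formal twice. The sketch below sidesteps this with a top-level wrapper that receives the data under different names. avnnet_parallel() and fit_one_nnet() are hypothetical names, and num.cores = 2 is only an example default.

```r
library(parallel)

# Worker defined at top level, so only the function itself (not a large
# enclosing environment) is serialized to the cluster. The data arrive
# as `inputs`/`target`, so nothing named `x` passes through parLapply()'s `...`.
fit_one_nnet <- function(i, inputs, target, ...) {
  nnet::nnet(x = inputs, y = target, ...)
}

# Hypothetical parallel counterpart of avnnet(); not forecast's code.
avnnet_parallel <- function(x, y, repeats, num.cores = 2L, ...) {
  cl <- makeCluster(num.cores)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, seq_len(repeats), fit_one_nnet,
            inputs = x, target = y, ...)
}

set.seed(42)
xm <- matrix(rnorm(100), ncol = 2)
yv <- xm[, 1] + rnorm(50, sd = 0.1)
fits <- avnnet_parallel(xm, yv, repeats = 3, size = 2, linout = TRUE, trace = FALSE)
```

This uses only the parallel package that ships with R, so it avoids the foreach and doParallel imports. Each worker draws its own random starting weights, which is what the averaging needs.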

robjhyndman commented 8 years ago

You could try using do.call().
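One way to read the do.call() suggestion, sketched under assumptions: pack the data and the nnet() settings into a single list, send that list to each worker under one neutral argument name, and rebuild the call there with do.call(). Only nnet_args crosses parLapply()'s ..., so x and y never collide with anything. avnnet_docall() and fit_from_args() are hypothetical names.

```r
library(parallel)

# Worker that rebuilds the nnet() call from one list of arguments.
fit_from_args <- function(i, nnet_args) {
  do.call(nnet::nnet, nnet_args)
}

# Hypothetical helper illustrating the do.call() idea; not forecast's code.
avnnet_docall <- function(x, y, repeats, num.cores = 2L, ...) {
  # Data and all extra nnet() options travel together inside one list.
  nnet_args <- c(list(x = x, y = y), list(...))
  cl <- makeCluster(num.cores)
  on.exit(stopCluster(cl), add = TRUE)
  parLapply(cl, seq_len(repeats), fit_from_args, nnet_args = nnet_args)
}

set.seed(7)
xm <- matrix(rnorm(100), ncol = 2)
yv <- xm[, 2] + rnorm(50, sd = 0.1)
fits <- avnnet_docall(xm, yv, repeats = 4, size = 2, linout = TRUE, trace = FALSE)
```

Compared with renaming the wrapper's arguments, this keeps the worker generic: any nnet() option can be forwarded without changing its signature.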