NULL model blows up memory usage with parallelised fits

hongooi73 commented 4 years ago

model() sometimes blows up and eats all the memory on my machine (32GB, Win 10 Pro). I'll try to make a reprex. It seems kind of random though.

mitchelloharawild commented 4 years ago

A reprex for this would be great.

hongooi73 commented 4 years ago

Here's a reprex:

library(tidyr)
library(dplyr)
library(tsibble)
library(feasts)
library(fable)

data(orangeJuice, package="bayesm")

start_date <- as.Date("1970-01-01")

oj_data <- orangeJuice$yx %>%
    complete(store, brand, week) %>%
    mutate(week=yearweek(start_date + week*7)) %>%
    as_tsibble(index=week, key=c(store, brand))

subset_oj_data <- function(start, end)
{
    start <- yearweek(start_date + start*7)
    end <- yearweek(start_date + end*7)
    filter(oj_data, week >= start, week <= end)
}

ncores <- max(2, parallel::detectCores(logical=FALSE) - 2)
cl <- parallel::makeCluster(ncores, type="PSOCK")
parallel::clusterEvalQ(cl,
{
    library(feasts)
    library(fable)
    library(tsibble)
})

res_par <- parallel::parLapply(cl, list(subset_oj_data(40, 135)), function(df)
{
    model(df,
        ets=ETS(logmove ~ error("A") + trend("A") + season("N"))
    )
})

On my laptop, this uses 10GB of memory before returning.

The problem only occurs with a cluster. If I replace the parLapply with a regular lapply, everything is fine. The returned object is ~12MB in size.
The problem occurs if there are NAs in the data (which ETS can't handle). If I insert a fill(everything()) in the first pipeline, everything is also fine.

As an aside, it takes ETS quite a bit of time to realise that there are NAs... could this be made more efficient?

mitchelloharawild commented 4 years ago

By memory usage, are you referring to the object size of the returned object? This is likely due to the environments being transferred/stored from the parallel calls, which is something that will be worked on. Parallel processing is supported natively using future::plan() (see the model() docs). It still needs work to transfer less information between nodes, but it will get better.

hongooi73 commented 4 years ago

No, I mean the memory usage shown in Task Manager.

hongooi73 commented 4 years ago

Btw, I see the same problem in an Ubuntu VM. The object returned from the parLapply call is only 12MB, but the R process is taking up 10GB of memory.

hongooi73 commented 4 years ago

Is there a way to generate a null model from scratch? Would probably save a lot of time in trying to reproduce this problem.

mitchelloharawild commented 4 years ago

fabletools::null_model().

I'm surprised that it is the null models which cause this issue.

hongooi73 commented 4 years ago

Ok, I just tried null_model() and that returns something completely different to what I get.

> str(null_model())
Classes 'mdl_defn', 'R6' <mdl_defn>
  Public:
    add_data: function (.data) 
    check: function (.data) 
    clone: function (deep = FALSE) 
    data: NULL
    env: environment
    extra: list
    formula: quosure, formula
    initialize: function (formula, ..., .env) 
    model: null_mdl
    prepare: function (...) 
    print: function (...) 
    recall_lag: function (x, n = 1L, ...) 
    recent_data: NULL
    remove_data: function () 
    specials: environment
    stage: NULL
    train: function (.data, ...)

Here is an example failed model fit from ETS:

> foo$ets[[1]]
Series: logmove 
Model: NULL model 
NULL model

> z <- foo$ets[[1]]

> z
Series: logmove 
Model: NULL model 
NULL model

> object.size(z)
10064 bytes

> unclass(z)
$fit
$n
[1] 95

$vars
[1] "logmove"

attr(,"class")
[1] "null_mdl"

$model
<null_mdl model definition>

$data
# A tsibble: 95 x 2 [1W]
       week logmove
 *   <week>   <dbl>
 1 1990 W25    9.02
 2 1990 W26   NA   
 3 1990 W27   NA   
 4 1990 W28   NA   
 5 1990 W29   NA   
 6 1990 W30   NA   
 7 1990 W31    8.72
 8 1990 W32    8.25
 9 1990 W33    8.99
10 1990 W34   NA   
# … with 85 more rows

$response
$response[[1]]
logmove

$transformation
$transformation[[1]]
Transformation: .x
Backtransformation: .x

It's probably not the null model object that is the problem, but all the other bits that get returned from model.

mitchelloharawild commented 4 years ago

null_model() gives a model definition, much like ETS().

library(fabletools)
tsibble::pedestrian %>% 
  model(null_model(Count))
#> # A mable: 4 x 2
#> # Key:     Sensor [4]
#>   Sensor                        `null_model(Count)`
#>   <chr>                                     <model>
#> 1 Birrarung Marr                       <NULL model>
#> 2 Bourke Street Mall (North)           <NULL model>
#> 3 QV Market-Elizabeth St (West)        <NULL model>
#> 4 Southern Cross Station               <NULL model>

Created on 2020-02-20 by the reprex package (v0.3.0)

mitchelloharawild commented 4 years ago

My guess again is the environments held in the transformation.
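
One way to see why captured environments matter here, as a minimal base R sketch rather than anything taken from fable: object.size() does not follow environments, while serialize() (which PSOCK clusters use to send results back to the master) writes out a closure's enclosing environment as well. The make_closure() helper below is hypothetical and only stands in for whatever a fitted model might be holding on to; it does not show that this is the actual cause in this issue.

```r
# Hypothetical illustration (not fable code): a tiny function whose enclosing
# environment happens to hold a large vector.
make_closure <- function() {
    big <- numeric(1e6)   # roughly 8 MB that the returned function never uses
    function(x) x
}

f <- make_closure()

# object.size() does not descend into environments, so this looks small.
reported <- as.numeric(object.size(f))

# serialize() writes out the enclosing environment too, so the payload a
# worker would send back is far larger than object.size() suggests.
shipped <- length(serialize(f, NULL))

c(reported = reported, shipped = shipped)
```

Comparing the two numbers on a real fitted model (for example one element of the ets column) would be a quick way to check whether the same thing is happening there.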

hongooi73 commented 4 years ago

Yup, I just tried the following:

mod_list <- parallel::parLapply(cl, oj_train, function(df) model(df, null=null_model(logmove)))
mod_list

# A mable: 913 x 3
# Key:     store, brand [913]
   store brand null        
   <int> <int> <model>     
 1     2     1 <NULL model>
 2     2     2 <NULL model>
 3     2     3 <NULL model>
 4     2     4 <NULL model>
 5     2     5 <NULL model>
 6     2     6 <NULL model>
 7     2     7 <NULL model>
 8     2     8 <NULL model>
 9     2     9 <NULL model>
10     2    10 <NULL model>
# … with 903 more rows

str(mod_list[[1]]$null[[1]])

List of 1
 $ fit           :List of 2
  ..$ n   : int 95
  ..$ vars: chr "logmove"
  ..- attr(*, "class")= chr "null_mdl"
 - attr(*, "class")= chr "mdl_ts"

That's very truncated compared to the "real" null model above.

hongooi73 commented 4 years ago

No, cancel that, it appears to be the same structure. So there must be an environment that ETS is capturing that null_model doesn't.

unclass(mod_list[[1]]$null[[1]])

$fit
$n
[1] 95

$vars
[1] "logmove"

attr(,"class")
[1] "null_mdl"

$model
<null_mdl model definition>

$data
# A tsibble: 95 x 2 [1W]
       week logmove
 *   <week>   <dbl>
 1 1990 W25    9.02
 2 1990 W26   NA   
 3 1990 W27   NA   
 4 1990 W28   NA   
 5 1990 W29   NA   
 6 1990 W30   NA   
 7 1990 W31    8.72
 8 1990 W32    8.25
 9 1990 W33    8.99
10 1990 W34   NA   
# … with 85 more rows

$response
$response[[1]]
logmove

$transformation
$transformation[[1]]
Transformation: .x
Backtransformation: .x

hongooi73 commented 4 years ago

Here's an even simpler reprex.

library(fable)
library(feasts)
library(tsibble)

set.seed(12345)
df <- expand.grid(x=1:1000, t=1:100)
df$y <- runif(nrow(df))
miss <- rbinom(nrow(df), 1, 0.25)
df$y[as.logical(miss)] <- NA
df$t <- as.Date("1970-01-01") + df$t

df <- as_tsibble(df, key=x, index=t)

cl <- parallel::makeCluster(4)
parallel::clusterEvalQ(cl,
{
    library(fable)
    library(feasts)
    library(tsibble)
})

bad <- parallel::parLapply(cl, list(df, df, df, df), function(df)
{
    model(df, ets=ETS(y ~ error("A") + trend("A") + season("N")))
})

This chews up memory on both my Windows laptop and an Ubuntu VM in Azure. Interestingly, if I replace the ETS model with a null_model, then there is no problem.

good <- parallel::parLapply(cl, list(df, df, df, df), function(df)
{
    model(df, null=null_model(y))
})
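
As a side check on the simulated data above, here is a small base R sketch that counts how many of the 1000 series contain at least one NA. With 25% of points missing and 100 points per series, the chance of a fully observed series is 0.75^100, so presumably every ETS fit in this reprex fails and falls back to a NULL model. The names has_na and n_with_na are introduced here purely for illustration.

```r
# Rebuild the simulated data from the reprex (base R only; no tsibble needed).
set.seed(12345)
df <- expand.grid(x = 1:1000, t = 1:100)
df$y <- runif(nrow(df))
miss <- rbinom(nrow(df), 1, 0.25)
df$y[as.logical(miss)] <- NA

# For each key x, does the series contain at least one missing value?
has_na <- tapply(is.na(df$y), df$x, any)

# Number of series (out of 1000) with at least one NA.
n_with_na <- sum(has_na)
n_with_na
```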

mitchelloharawild commented 4 years ago

Looks like this is also an issue without parallel processing (to a lesser extent).


> bench::mark(
+   model(df, null=ETS(y ~ error("A") + trend("A") + season("N"))),
+   model(df, null=null_model(y)),
+   check = FALSE
+ )
# A tibble: 2 x 13
  expression                                                            min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory         time   gc         
  <bch:expr>                                                       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>         <list> <list>     
1 model(df, null = ETS(y ~ error("A") + trend("A") + season("N")))   17.61s   17.61s    0.0568      51MB     3.86     1    68     17.61s <NULL> <df[,3] [149,… <bch:… <tibble [1…
2 model(df, null = null_model(y))                                     1.83s    1.83s    0.547     14.5MB     3.83     1     7      1.83s <NULL> <df[,3] [26,1… <bch:… <tibble [1…

mitchelloharawild commented 4 years ago

Added some performance improvements to the model parser. There is still more that can be done here; I expect that the model parser can be made 2x faster without too much trouble.


> bench::mark(
+   model(df, null=ETS(y ~ error("A") + trend("A") + season("N"))),
+   model(df, null=null_model(y)),
+   check = FALSE
+ )
# A tibble: 2 x 13
  expression                                                            min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc total_time result memory         time   gc         
  <bch:expr>                                                       <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>   <bch:tm> <list> <list>         <list> <list>     
1 model(df, null = ETS(y ~ error("A") + trend("A") + season("N")))   10.63s   10.63s    0.0940    40.1MB     4.14     1    44     10.63s <NULL> <df[,3] [100,… <bch:… <tibble [1…
2 model(df, null = null_model(y))                                     1.51s    1.51s    0.661     14.5MB     5.29     1     8      1.51s <NULL> <df[,3] [34,2… <bch:… <tibble [1…

hongooi73 commented 4 years ago

Well, I would (maybe) expect ETS to take more time and resources, since it's actually trying to fit a model. null_model can return immediately since it doesn't have to do anything. This seems to be separate from the parallel issue, where returning a buggy object back to the master triggers a massive memory allocation. Note that if I replace the parLapply with a regular lapply, then there's no problem.

ETS could probably check for NAs right at the start, and return immediately if found. This should reduce the resource requirements to the minimum.
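
A rough sketch of what such an early check could look like, in base R only. This is not how fable or ETS is implemented; fit_unless_na() and the "placeholder_null" class are hypothetical names used to show the idea of bailing out before any expensive work.

```r
# Hypothetical helper, not part of fable: skip the expensive fit when the
# response contains missing values.
fit_unless_na <- function(y, fit_fun) {
    if (anyNA(y)) {
        # Return a cheap placeholder immediately instead of attempting the fit.
        return(structure(list(n = length(y)), class = "placeholder_null"))
    }
    fit_fun(y)
}

# Usage with a stand-in "model" (the mean); a real fitting function would go here.
ok      <- fit_unless_na(c(1, 2, 3), mean)
skipped <- fit_unless_na(c(1, NA, 3), mean)
```

anyNA() stops at the first missing value it finds, so the check itself should cost very little compared with a model fit.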

I'm trying to find a similar reprex for ARIMA, but it seems to be behaving well so far.

mitchelloharawild commented 4 years ago

Closing as this issue is largely driven by https://github.com/tidyverts/fabletools/issues/146

Parallel processing performance will be optimised in the next two months.

tidyverts / fable

NULL model blows up memory usage with parallelised fits #230