tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Memory issue from mable and possibly model definition #369

Open: FinYang opened this issue 1 year ago

FinYang commented 1 year ago

This is more of a question than an issue: is this behaviour expected?

library(fable)
#> Loading required package: fabletools
library(tsibble)
#> 
#> Attaching package: 'tsibble'
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, union
mdl <- model(tourism, SNAIVE(Trips))
lobstr::obj_size(tourism)
#> 1.11 MB
lobstr::obj_size(mdl)
#> 3.65 MB

Created on 2022-10-14 with reprex v2.0.2

The mable is more than three times the size of the data itself, even though I'm only using SNAIVE. Isn't this a little big? The model definitions themselves are quite big, so maybe this is normal:

lobstr::obj_size(fable::SNAIVE())
#> 1.15 MB
lobstr::obj_size(fable::ARIMA())
#> 1.57 MB
lobstr::obj_size(fable::ETS())
#> 1.25 MB

Created on 2022-10-14 with reprex v2.0.2

But the real issue is how quickly the size grows with the data:

> mdl <- model(df, SNAIVE(value))
> lobstr::obj_size(df)
68.25 MB
> lobstr::obj_size(mdl)
147.51 MB

It looks like the model definition is stored separately for every key group:

> obj_size(mdl$`SNAIVE(value)`[[1]])
1.82 MB
> obj_size(mdl$`SNAIVE(value)`[[1]]$model)
1.81 MB
> mdl$`SNAIVE(value)`[[1]]$model
<RW model definition>

With my much larger dataset, I quickly run out of RAM. I tried to find out why, but couldn't dig much deeper: the model definition is an R6 object with many environments linking to each other, so I still haven't found what the model is storing that makes it so big. All I can tell is that the model definition seems to be a bunch of harmless functions. Is there any way to avoid this?

mitchelloharawild commented 1 year ago

The objects might appear large in lobstr::obj_size() because they contain references to functions in package environments (for example, the model training function). Memory efficiency is, however, a common issue and something that needs to be investigated closely.
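A minimal sketch of how a shared environment can inflate individual lobstr measurements, using only base R and lobstr. The environment `shared_env` and the functions `f` and `g` are hypothetical stand-ins for the environments a model definition references; they are not fabletools internals.

```r
library(lobstr)

# Hypothetical stand-in for an environment (e.g. an R6 object) that many
# small functions point to. It holds one large vector of 1e6 doubles.
shared_env <- new.env()
shared_env$big <- runif(1e6)

# Two tiny functions whose enclosing environment is `shared_env`.
f <- function(x) x + 1
g <- function(x) x * 2
environment(f) <- shared_env
environment(g) <- shared_env

# Each measurement on its own walks into `shared_env`, so each includes
# the ~8 MB vector even though the function bodies are tiny.
obj_size(f)
obj_size(g)

# Measuring both in one call counts the shared environment only once.
obj_size(f, g)
```

By default obj_size() stops its search at the calling environment (its `env` argument), so anything captured in other environments is included in every individual measurement. Passing several objects to a single call is a quick way to tell shared memory apart from genuine per-object copies.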

mitchelloharawild commented 1 year ago

Could you check the sizes of some model internals for your df and mdl example? I'm curious how large the model and transformation are relative to the fit. The fit should contain things specific to each model, while the model and transformation contain environments that may appear large.
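One way to collect the numbers requested above, sketched on the `tourism` reprex from the opening post. It assumes each element of the model column is a fitted-model object with `model`, `transformation` and `fit` elements, as the comment describes; `names(first)` shows what is actually stored. Swap in your own `df` and `SNAIVE(value)` for the large case.

```r
library(fable)
library(tsibble)
library(lobstr)

# Same model as the reprex in the opening post.
mdl <- model(tourism, SNAIVE(Trips))

# One fitted model per key; take the first two keys.
first <- mdl$`SNAIVE(Trips)`[[1]]
second <- mdl$`SNAIVE(Trips)`[[2]]
names(first)  # check which components are stored

# Size of each component for a single key.
obj_size(first$model)
obj_size(first$transformation)
obj_size(first$fit)

# Joint size counts shared memory once; compare it with the plain sum.
obj_size(first$model, second$model)
obj_size(first$model) + obj_size(second$model)
```

If the joint size of two keys' model elements is close to a single key's size, the definitions share memory; if it is close to the sum, each key carries its own copy, which would match the suspicion that the model definition is stored separately for every key group.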