Closed RoelVerbelen closed 3 weeks ago
One way for you to investigate this is to make sure that your model object still works with all the functions from the insight package that marginaleffects calls. For example, insight::get_data(), insight::find_variables(), etc.
Thanks Vincent. The only way I see for insight::get_data() to keep working as we need it to is for the data frame to still exist in the global environment under the same name it had during model fitting. Relying on that is not robust across R sessions, one where the model is fitted and one where it is evaluated (see above, where I explicitly ran rm(model_data) to mimic this).
However, marginaleffects doesn't strictly need that data in order to do the post-estimation predictions. It only uses it for validating the argument inputs (here: whether all the factor values are observed in the model data). Ideally, I'd like to find a way to bypass that validation check so I can keep the model objects small in size. Perhaps by introducing a marginaleffects function argument / package option?
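To illustrate the point about factor values, here is a minimal base R sketch, assuming an lm fit (which stores the factor levels seen during fitting in its xlevels component). The helper levels_observed() is hypothetical and is not how marginaleffects performs its validation; it only shows that this particular check can be done without the original data frame.

```r
# Toy data; `model_data` mirrors the name used in this thread.
set.seed(1)
model_data <- data.frame(
  y = rnorm(60),
  x = rnorm(60),
  g = factor(rep(c("a", "b", "c"), each = 20))
)

# model = FALSE keeps the model frame out of the fitted object.
fit <- lm(y ~ x + g, data = model_data, model = FALSE)
rm(model_data)

# The factor levels seen during fitting are still stored on the object.
fit$xlevels

# Hypothetical helper: TRUE if every value of each factor in `newdata`
# was observed when the model was fitted.
levels_observed <- function(model, newdata) {
  all(vapply(names(model$xlevels), function(v) {
    all(as.character(newdata[[v]]) %in% model$xlevels[[v]])
  }, logical(1)))
}

ok_grid  <- data.frame(x = 0, g = c("a", "c"))
bad_grid <- data.frame(x = 0, g = "z")

levels_observed(fit, ok_grid)
levels_observed(fit, bad_grid)
```

Other model classes may store their levels elsewhere or not at all, so this should be read as an illustration for lm-like objects only.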
@RoelVerbelen
I see why this would be useful.
I would be open to merging a PR which looks at ... for something named modeldata that pre-empts the need to call insight::get_data(). But to be frank, this is very low priority for me and I will not work on it unless the PR is very close to all finished, including coverage for all functions and some tests.
Also, I would only merge something like this if it requires very few lines of code; not willing to admit lots of code complexity. There's a chance this is much more complicated than we think, since insight::find_variables() may also rely on insight::get_data(). In that case, a simple PR may not do it.
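The idea of looking in ... for modeldata could be sketched roughly as follows. Both resolve_modeldata() and check_inputs() are hypothetical names made up for this illustration; neither exists in marginaleffects, and the sketch does not address the caveat about insight::find_variables().

```r
# Hypothetical helper, not part of marginaleffects: return the data passed
# as `modeldata` through `...` if present, otherwise ask insight for it.
resolve_modeldata <- function(model, ...) {
  dots <- list(...)
  if (!is.null(dots[["modeldata"]])) {
    return(dots[["modeldata"]])
  }
  insight::get_data(model)
}

# Hypothetical entry point that forwards `...` the way a user-facing
# function might, then runs a simple input check against the data.
check_inputs <- function(model, newdata, ...) {
  dat <- resolve_modeldata(model, ...)
  all(names(newdata) %in% names(dat))
}

dat <- data.frame(y = c(1.2, 2.1, 2.9, 4.2, 5.1), x = 1:5)
fit <- lm(y ~ x, data = dat, model = FALSE)

# Supplying `modeldata` means insight::get_data() is never reached.
check_inputs(fit, newdata = data.frame(x = 6), modeldata = dat)
```

Whether a change this small would cover every function in the package is exactly the open question raised above.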
Sorry to be so blunt. I see why this might be useful, but want to be fully transparent about parameters for a potential contribution.
Closing to keep the issue tracker manageable, and because I don't intend to work on this in the short run. But, as stated, I am quite open to the idea, given an adequate implementation.
I listed this in the master thread for good ideas with no immediate fixes.
Thanks for raising!
Thanks for considering it further down the road, @vincentarelbundock. In the short term, I'll be relying on loading the modelling data using the same name as when fitting the model object such that insight::get_data() gets it from the environment.
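The name-based workaround can be mimicked in base R. This sketch uses stats::model.frame() as a stand-in for insight::get_data(), on the assumption that both recover the data by looking up the name used in the original call; the exact lookup done by insight may differ.

```r
set.seed(42)
model_data <- data.frame(y = rnorm(50), x = rnorm(50))
fit <- lm(y ~ x, data = model_data, model = FALSE)

# Keep a copy under another name, then drop the original to mimic a
# fresh session in which only the model object was loaded.
saved_copy <- model_data
rm(model_data)

# With no stored model frame, model.frame() re-evaluates the original call
# and fails because `model_data` cannot be found.
without_data <- tryCatch(
  model.frame(fit),
  error = function(e) conditionMessage(e)
)
without_data

# Restoring the data under the same name makes the lookup succeed again.
model_data <- saved_copy
with_data <- model.frame(fit)
head(with_data)
```

The lookup depends only on the object's name, so loading a different data set under that name would be picked up silently.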
Thanks for the note. This sounds like a useful hack.
I'm looking for some guidance on how to reduce a regression model's object size without impairing marginaleffects. Regression models in R tend to store the entire modelling data set as a list element in the resulting output. When working with large data sets that makes saving and loading (many iterations of) these models inefficient.
I've created a minimal reprex to illustrate how trying to limit the object size of regression models can lead to errors within marginaleffects, even though the modelling data set is not strictly necessary for the post-fit estimation analysis.
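As a rough base R illustration of the size issue (not the reprex referred to above), the sketch below fits the same lm with and without the stored model frame and confirms that predict() on new data gives the same result. The data and variable names are made up for the example.

```r
set.seed(123)
n <- 10000
model_data <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
model_data$y <- 1 + 2 * model_data$x1 - model_data$x2 + rnorm(n)

# Default fit stores the model frame; model = FALSE leaves it out.
fit_full  <- lm(y ~ x1 + x2, data = model_data)
fit_small <- lm(y ~ x1 + x2, data = model_data, model = FALSE)

# Compare in-memory sizes of the two fitted objects.
format(object.size(fit_full), units = "Kb")
format(object.size(fit_small), units = "Kb")

# Predictions on new data do not need the stored model frame.
new_obs <- data.frame(x1 = c(-1, 0, 1), x2 = c(0.5, 0, -0.5))
all.equal(predict(fit_full, new_obs), predict(fit_small, new_obs))
```

Note that model = FALSE only drops the model frame; other components such as residuals, fitted values, and the QR decomposition still grow with the number of rows, and object.size() does not count environments attached to the object.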