printing many models

matthieugomez commented 9 years ago

A big issue with current solutions (e.g. stargazer) is that printing the coefficients of N models requires keeping the output of all N models around, which is problematic for large datasets (example here and there).

It would be great if dust could separate saving statistics from printing them to avoid this issue.

nutterb commented 9 years ago

I'm not completely sure I understand what you are requesting. I think (and please correct me if I'm wrong) you're suggesting that dust should have the option of operating on the summary of a model (i.e., the output from tidy(fit)) rather than being required to operate on the model itself. In other words:

fit <- lm(....really big model....)
tidy_fit <- tidy(fit)
dust(tidy_fit) + ....

as opposed to

fit <- lm(..... really big model ....)
dust(fit) + ....
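To make the difference concrete, here is a minimal sketch of the first pattern. It assumes base R only: coef(summary(fit)) stands in for tidy(fit), the column names are chosen to resemble broom's, and the small mtcars model is a placeholder for the "really big model". None of this is taken from pixiedust itself.

```r
# Fit a model, keep only its coefficient table, then drop the fitted object.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Base-R stand-in for tidy(fit) (assumption: same numbers, similar columns).
coef_mat <- coef(summary(fit))
tidy_fit <- data.frame(
  term      = rownames(coef_mat),
  estimate  = coef_mat[, "Estimate"],
  std.error = coef_mat[, "Std. Error"],
  statistic = coef_mat[, "t value"],
  p.value   = coef_mat[, "Pr(>|t|)"],
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Compare memory footprints before discarding the model.
size_fit  <- as.numeric(object.size(fit))
size_tidy <- as.numeric(object.size(tidy_fit))

# The summary table no longer depends on the model object.
rm(fit)
```

The summary table has one row per coefficient regardless of the number of observations, whereas the lm object also carries residuals, fitted values and the model frame, so the gap grows with the size of the data.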

matthieugomez commented 9 years ago

Exactly.

To use only the output from tidy and glance, some changes would need to be made in broom (at least the number of observations and the name of the dependent variable need to appear, as mentioned in this issue: https://github.com/dgrtwo/broom/issues/12).

Another solution would be to be able to combine multiple dust outputs into a single table (i.e., one would repeat the following, say, 50 times: run a model, run dust, discard the original lm object; then combine the resulting dust outputs into one table. The combination step would be similar to an outer join).

Of course, if you don't intend pixiedust to print multiple models in the same table, this is not really needed.
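The outer-join combination described above could look roughly like the following sketch. It uses base R only; coef_column is a hypothetical helper invented for illustration, and the three formulas on mtcars are placeholder models, not anything pixiedust provides.

```r
# Hypothetical helper: fit a model and return only term + estimate.
# The fitted object goes out of scope as soon as the function returns.
coef_column <- function(formula, data, label) {
  fit <- lm(formula, data = data)
  out <- data.frame(
    term = names(coef(fit)),
    est  = unname(coef(fit)),
    stringsAsFactors = FALSE
  )
  names(out)[2] <- label  # one column per model, named after the model
  out
}

formulas <- list(
  m1 = mpg ~ wt,
  m2 = mpg ~ wt + hp,
  m3 = mpg ~ wt + hp + qsec
)

# One small data frame per model; no lm object is kept.
columns <- Map(function(f, nm) coef_column(f, mtcars, nm),
               formulas, names(formulas))

# Outer join on term: a term absent from a model appears as NA in its column.
wide <- Reduce(function(x, y) merge(x, y, by = "term", all = TRUE), columns)
```

Each row of wide is a term and each column a model, which is the layout regression tables with several models usually have.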

nutterb commented 9 years ago

You're sort of in luck. dust is really meant to work on a data frame, and it just uses broom to get things into the data frame. If you pass dust a data frame, it just won't tidy it*. So you could do any amount of preprocessing to a data frame prior to calling dust if pixiedust doesn't have the exact thing you need.

Eventually, the only thing that would be missing from this approach would be the glance statistics, but I can easily write a function to insert those into the table footer (and had in fact planned on doing this after I sorted out the multicell support). You'll just have to feed it the glance output instead of letting dust do it for you.

As for multiple model summaries in one table, I think that's actually a better task for broom. Have broom create a list of model summaries, rbind, and pass to dust. The weak spot there is again getting the glance statistics, if you want them. But I think the hardest part of solving that problem is getting a friendly interface.

So this may be fairly close to what you want. As I get the tables and headers worked out, I'll work on incorporating some of the other features you've described.
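As a rough illustration of the "list of summaries, rbind, pass to dust" idea, the sketch below stacks coefficient tables in long format and keeps the model-level statistics in a separate table. It assumes base R in place of broom: coef(summary(fit)) stands in for tidy, and nobs plus r.squared stand in for the glance output. The formulas are placeholders.

```r
formulas <- list(m1 = mpg ~ wt, m2 = mpg ~ wt + hp)

# For each model keep two small data frames and let the lm object be dropped.
summaries <- lapply(names(formulas), function(nm) {
  fit <- lm(formulas[[nm]], data = mtcars)
  s   <- summary(fit)
  cm  <- coef(s)
  list(
    # Stand-in for tidy(fit): one row per coefficient.
    coefs = data.frame(
      model     = nm,
      term      = rownames(cm),
      estimate  = cm[, "Estimate"],
      std.error = cm[, "Std. Error"],
      row.names = NULL,
      stringsAsFactors = FALSE
    ),
    # Stand-in for glance(fit): one row per model.
    stats = data.frame(
      model     = nm,
      nobs      = nobs(fit),
      r.squared = s$r.squared,
      stringsAsFactors = FALSE
    )
  )
})

# Stack the per-model pieces into two plain data frames.
coef_table <- do.call(rbind, lapply(summaries, `[[`, "coefs"))
stat_table <- do.call(rbind, lapply(summaries, `[[`, "stats"))
```

Since dust accepts a data frame, coef_table is the kind of object that could be handed to it directly, with stat_table supplying the footer statistics separately.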

* Unless you want it tidied. Compare the difference between dust(mtcars) and dust(mtcars, tidy_df = TRUE).

nutterb commented 8 years ago

Starting in 0.6.3, I think pixiedust handles these cases well. It can work on a list of objects or a grouped data frame, and can generate glance statistics for each model in a list separately. Please reopen this issue if a critical feature of reporting multiple models is not accommodated.

nutterb / pixiedust

printing many models #13