Closed matthieugomez closed 8 years ago
I'm not completely sure I understand what you are requesting. I think (and please correct me if I'm wrong) you're suggesting that you have the option of having dust operate on the summary of a model (i.e., the output from tidy(fit)) rather than being required to operate on the model itself. In other words:
fit <- lm(....really big model....)
tidy_fit <- tidy(fit)
dust(tidy_fit) + ....
as opposed to
fit <- lm(..... really big model ....)
dust(fit) + ....
Exactly.
To use only the output from tidy and glance, some changes would need to be made in broom (at a minimum, the number of observations and the name of the dependent variable need to appear, as mentioned in this issue: https://github.com/dgrtwo/broom/issues/12).
Another solution would be to allow combining multiple dust outputs into the same table. For instance, one would do the following 50 times: run a model, run dust, and discard the original lm object. Then one would combine the resulting dust outputs into a single table. The combination step would be similar to an outer join.
Of course, if you don't intend pixiedust to print multiple models in the same table, this is not really needed.
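The run, summarise, discard, then join workflow described above could look roughly like the following in base R. This is only a sketch: the helper `coef_table()`, the model formulas, and the use of `mtcars` are made up for illustration, and `merge(..., all = TRUE)` stands in for the outer join. Nothing here is part of pixiedust itself.

```r
# Hypothetical helper: fit a model, keep only its coefficient estimates,
# and let the (potentially large) lm object go out of scope on return.
coef_table <- function(formula, data, label) {
  fit <- lm(formula, data = data)
  est <- coef(summary(fit))[, "Estimate"]
  out <- data.frame(term = names(est), estimate = unname(est),
                    stringsAsFactors = FALSE)
  names(out)[2] <- label  # one estimate column per model
  out
}

# Illustrative model specifications (assumptions, not from the thread).
specs <- list(m1 = mpg ~ wt,
              m2 = mpg ~ wt + hp,
              m3 = mpg ~ wt + hp + qsec)

tables <- Map(function(f, nm) coef_table(f, mtcars, nm), specs, names(specs))

# Full outer join on `term`: a term absent from a model shows up as NA.
combined <- Reduce(function(x, y) merge(x, y, by = "term", all = TRUE), tables)
print(combined)
```

Only the small per-model data frames are ever held at the same time, which is the point of the request.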
You're sort of in luck. dust is really meant to work on a data frame, and it just uses broom to get things into the data frame. If you pass dust a data frame, it just won't tidy it*. So you could do any amount of preprocessing to a data frame prior to calling dust if pixiedust doesn't have the exact thing you need.
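A minimal sketch of that preprocessing route, assuming the broom and pixiedust packages are installed. The model formula and the rounding step are placeholders for whatever preprocessing is actually needed.

```r
library(broom)
library(pixiedust)

fit <- lm(mpg ~ wt + hp, data = mtcars)

# Keep only the tidy summary, then drop the fitted object.
tidy_fit <- as.data.frame(tidy(fit))
rm(fit)

# Any preprocessing can happen here on the plain data frame
# (rounding is just an illustrative example).
tidy_fit$estimate <- round(tidy_fit$estimate, 3)

# dust() is handed a data frame rather than a model object.
tbl <- dust(tidy_fit)
```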
Eventually, the only thing that would be missing from this approach would be the glance statistics, but I can easily write a function to insert those into the table footer (and had in fact planned on doing this after I sorted out the multicell support). You'll just have to feed it the glance output instead of letting dust do it for you.
As for multiple model summaries in one table, I think that's actually a better task for broom. Have broom create a list of model summaries, rbind, and pass to dust. The weak spot there is again getting the glance statistics, if you want them. But I think the hardest part of solving that problem is getting a friendly interface.
So this may be fairly close to what you want. As I get the tables and headers worked out, I'll work on incorporating some of the other features you've described.
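For the multiple-model case, the list-then-rbind idea above might be sketched as follows, assuming broom is installed. The formulas and the `model` label column are illustrative assumptions; the glance statistics are collected in a separate data frame because, as noted, they still have to be supplied by hand.

```r
library(broom)

# Illustrative model specifications (assumptions, not from the thread).
specs <- list(m1 = mpg ~ wt, m2 = mpg ~ wt + hp)

# Fit each model and keep only its tidy() and glance() summaries;
# the lm object is dropped when the function returns.
summaries <- lapply(names(specs), function(nm) {
  fit <- lm(specs[[nm]], data = mtcars)
  list(
    coefs = cbind(model = nm, as.data.frame(tidy(fit))),
    stats = cbind(model = nm, as.data.frame(glance(fit)))
  )
})

# Stack the per-model summaries into two long data frames.
all_coefs <- do.call(rbind, lapply(summaries, `[[`, "coefs"))
all_stats <- do.call(rbind, lapply(summaries, `[[`, "stats"))
```

`all_coefs` is a plain data frame, so it can be passed to dust like any other data frame, while `all_stats` holds one row of model-level statistics per model.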
dust(mtcars) and dust(mtcars, tidy_df = TRUE).
Starting in 0.6.3, I think pixiedust handles these cases well. It can work on a list of objects or a grouped data frame, and can generate glance statistics for each model in a list separately. Please reopen this issue if a critical feature of reporting multiple models is not accommodated.
A big issue with current solutions (stargazer) is that one currently needs to store the output of N models to print the coefficients of N models, which is problematic for large datasets (example here and there).
It would be great if dust could separate saving and printing statistics to avoid this issue.