Closed GuiMarthe closed 6 years ago
Sorry to respond to this so late.
There are a lot of outstanding requests for mixed model development. There's also a lot of ambiguity about exactly what should be done: see this list, especially #38 and #96.
I've been doing a lot of work on my fork (bbolker/broom). If you're interested I can add you as a collaborator.
Hello @bbolker,
I'm sorry I have been very slow to respond to issues regarding mixed model tidying, and thank you for your excellent work on the topic.
There's a proposal I've been considering for a few weeks. broom has become bloated in terms of the number of packages it tidies (e.g. it has 49 in SUGGESTS in DESCRIPTION), making it very challenging for me to maintain, especially for ones like mixed models about which I am mostly unfamiliar from a statistical perspective.
I'd like to split off development of mixed-model-related functions (at least lme4_tidiers and nlme_tidiers, as well as all the work you've done in your fork) into a separate package, of which you would be the official maintainer. This is similar to what tidytext does for tidying methods related to text mining. The existing tidying methods in broom would then offer a deprecation warning (eventually an error) pointing towards your package.
This would be greatly appreciated since it would keep me from needing to serve as an (underqualified) gatekeeper, and let you and other experts develop the best methods for tidying these types of models. As a name for this package, in deference to your previous conventions, I would humbly suggest bbbroom.
If this makes sense to you in theory, please let me know so we can talk next steps (I would open a new issue for it).
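A minimal sketch of the deprecation step proposed above, using base R's `.Deprecated()`. The function name `tidy_mixed_old` and the package name `newmixedpkg` are placeholders invented for illustration; nothing here is taken from broom's actual code.

```r
# Hypothetical stand-in for an existing mixed-model tidier that is being moved.
# "newmixedpkg" is a placeholder for whatever the split-off package is called.
tidy_mixed_old <- function(x, ...) {
  .Deprecated(
    new = "tidy",
    package = "newmixedpkg",
    msg = paste(
      "Mixed-model tidiers are moving out of broom;",
      "please use the tidy() method from the split-off package instead."
    )
  )
  # During the transition the old function still returns a result
  # (an empty placeholder data frame in this sketch).
  data.frame(term = character(0), estimate = numeric(0))
}

# The call warns but still returns a data frame.
res <- suppressWarnings(tidy_mixed_old(NULL))
```

Turning the warning into an error later would be a one-line change (for example, swapping `.Deprecated()` for `.Defunct()`), which matches the "warning first, eventually an error" plan.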
Splitting off certainly makes sense.
I don't want to call it bbbroom (I wouldn't have named bbmle that way if I'd known it would still be being used 10 years later). Maybe mixedbroom ? or broommixed? or broommm? (If broom sub-packages are called broomXXX they'll be sorted in a sensible way in package lists ...) (broomMM would be good but I agree with Hadley's advice about keeping package names all-lowercase.)
Naming, and a few other issues I have thoughts on, can/should be better discussed in a new issue you open [for my own reminder: (1) consistency/interaction with 'base broom' methods like process_lm; (2) decision-making about conventions/options for extracting different components; (3) overlap/linkage between MCMC tidiers and specifically GLMM-related MCMC tidiers ...]
@bbolker Excellent, I'll follow up with a new issue and look forward to starting the process.
Final naming decision will be 100% yours, but for inspiration I've opened it up to Twitter.
bump. Is there a new issue? I'd like to settle on a name. Design questions can probably be discussed as broad issues on the repo for the new package.
I have a preference for broom.mixed as a package name. In general, I think keeping with a naming convention of broom.* for any break-off packages makes sense.
Addendum (otherwise known as I've been thinking about this too much the past few weeks)
I rather like the idea of breaking broom up into several packages. My recommendation for it would be to have, at least, the following:
- broom (tidiers for anything that is included in the base Priority packages (see below))
- broom.mixed
- broom.mgcv (since @gavinsimpson has offered to do it)
- broom.survival (this seems to be another broad interest area. I'm half a breath away from volunteering, but I can barely keep up with the packages I already have)
- broom.misc (everything else)
The key advantages would be that broom could be a stable package with as few dependencies as possible that primarily exists to provide the method definition. With fewer dependencies, it might be easier to get other package authors to pull some of the content out of broom.misc and into the packages that create the objects being tidied.
The obvious disadvantage is that this would murder backward compatibility.
An alternative might be to make a broom.base (which accomplishes the first bullet) and leave everything else in broom, with a Depends on broom.base. The disadvantage here is such a release would have to be coordinated with the break-off packages, or released before they are.
## Packages for the primary `broom` package (those with Priority "base").
library(dplyr)
installed.packages() %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  filter(Priority %in% "base") %>%
  select(Package, Priority)
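The split proposed in this comment, where the core package mainly provides the method definition and extension packages supply the methods, can be sketched with plain S3 dispatch. This is only an illustration: the class `toy_mixed` and its contents are made up, and the real generic in broom may differ in detail.

```r
# Sketch of the lightweight core: only the generic, no modelling dependencies.
tidy <- function(x, ...) UseMethod("tidy")

# Fallback so unsupported classes fail with a clear message.
tidy.default <- function(x, ...) {
  stop("No tidy method for objects of class ", class(x)[1], call. = FALSE)
}

# Sketch of what an extension package would add: a method for its own class.
# "toy_mixed" is a made-up class standing in for a real fitted-model class.
tidy.toy_mixed <- function(x, ...) {
  data.frame(
    term = names(x$coefficients),
    estimate = unname(x$coefficients),
    stringsAsFactors = FALSE
  )
}

# A fake "fit" object, just enough structure for the method to work on.
fit <- structure(
  list(coefficients = c("(Intercept)" = 1.5, x = 0.25)),
  class = "toy_mixed"
)
out <- tidy(fit)
```

In a real package the method would be registered through the NAMESPACE file rather than defined in the global environment, but the dispatch mechanism is the same.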
Some thoughts:
- broommm: hard to remember how many ms there are in the title
- mixedbroom: I think broomXXX will be better for extension packages (especially as it would be good to keep these packages sorted together in an alphabetical list)
- broommixed: not bad (again with the number-of-ms problem)
- broomle (HW's suggestion) doesn't make sense to me - these aren't necessarily maximum likelihood estimation packages (maybe Bayesian/MCMC), which is what mle means to me
- broomer isn't too bad (but doesn't extend to other extensions)
- broomix not too bad either
So after all that ... broom.mixed might be pretty good. @gavinsimpson: broom.gam or broom.mgcv?
Please excuse my absence, work and college are not fun :smile: .
I think broom.mixed is a great name. However, I think the naming of "broom tidier spin-offs" should be a principled decision so that it can generalize to newer packages.
Why not settle for broom.<packagename>?
I mean, the name broom.lme4 really entails the idea of broom functions for the lme4 package.
I agree that broom<whatever> packages should be named in a principled/generalized way.
At least in my opinion/according to my understanding, the new package is supposed to have a broader scope than lme4. There are currently tidiers for lme4, nlme, glmmADMB, glmmTMB, MCMCglmm, and brms, and there could be others - the point is that all mixed-model packages will have a similar set of issues and challenges to deal with.
I could cc: the discussion to maintainers of these packages as well (@paul-buerkner is the only one who I know is active on this repo).
Seems like "principled/generalized" gets squishy pretty quick.
Attempting to create a broom.package for all of the packages that have tidiers would result in at least 60 new packages to add to CRAN. Maintaining that many packages could be a full time job for at least three people; I don't think it is feasible in the realm of "volunteer effort."
Another point to consider: if we were going to make a broom.package that only tidied package objects, I would find it preferable to first ask the maintainer if he or she were willing to adopt the tidiers directly into package. Putting the tidiers into a separate package probably ought to be Plan B (and I think @dgrtwo has made a similar statement in the past...yup, here). Even then, it would be preferable to put it into a package with other tidiers where it can be actively maintained.
However, in cases like mixed models, there are several model types that--as @bbolker has observed-- have similar challenges and will likely benefit from a shared code base.
Then, at least by the looks of it, we have a pretty hard case in favor of broom.mixed.
So far, the list of packages that broom.mixed should support is:
lme4
nlme
glmmADMB
glmmTMB
MCMCglmm
brms
Am I missing anything?
I think the rstanarm tidier is missing (?)
On 19.09.2017 at 17:57, "Guilherme Marthe" notifications@github.com wrote:
Then, at least by the looks of it, we have a pretty hard case in favor of broom.mixed. So far, the list of packages that broom.mixed should support is:
Am I missing anything?
possibly: R-INLA (although it might be a nightmare); MASS::glmmPQL (might need to be handled separately); ordinal; blme (may already be handled by lme4 tidiers?); gamm4 (in mixed.gam?); spaMM; coxme ...
But I don't think we necessarily need to decide now. We can collect the most common ones and then have an open issue for requests for package coverage.
Excuse me for being late to this party. As a consumer of broom, I really hope that decentralization doesn't weaken the key value, which is to get information about regressions and tests in a predictable form. In particular, it is good to have predictable column names. So, I would prefer decentralization with a relatively strong base package, which provides constraints on what tidy and friends will return.
In a decentralized model, it is really hard to enforce this, but we already have https://github.com/tidyverse/broom#conventions as a resource for how to name columns.
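One lightweight way for split-off packages to stay close to shared column names is a check like the sketch below. The helper `unexpected_columns` is hypothetical, and the short list of names is an assumption for illustration; the conventions page linked above is the authoritative source.

```r
# Column names commonly used for tidy() output; this short list is an
# assumption for illustration, not the complete convention.
known_tidy_columns <- c(
  "term", "estimate", "std.error", "statistic",
  "p.value", "conf.low", "conf.high", "df"
)

# Hypothetical helper: return the columns of `x` that fall outside the known set.
unexpected_columns <- function(x, known = known_tidy_columns) {
  stopifnot(is.data.frame(x))
  setdiff(names(x), known)
}

# One output that follows the names above and one that does not.
good <- data.frame(term = "x", estimate = 0.25, std.error = 0.1)
bad <- data.frame(
  term = "x", Estimate = 0.25, `Std. Error` = 0.1,
  check.names = FALSE
)

good_extra <- unexpected_columns(good)
bad_extra <- unexpected_columns(bad)
```

A check of this kind could sit in each extension package's test suite, so that the constraint is enforced by tests rather than by a single gatekeeper.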
I believe the particular issue that started this thread is resolved now: see https://github.com/bbolker/broom.mixed/issues/1 ... also, encourage further discussion relating to broom.mixed to move to https://github.com/bbolker/broom.mixed/issues/ ...
Closing in favor of the issues linked above if the issue hasn't already been resolved.
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Hey folks, I just saw that using broom with lme4 objects does not offer the possibility to inspect the estimates for random effects. Is this correct, or am I missing something?
If possible and necessary, I'd like to help develop these functions. What would I need to do? Is there some sort of developers guide for the broom package?
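For context on the question above: lme4's `ranef()` returns a list with one data frame per grouping factor, with rows for the levels and columns for the terms. The sketch below stacks a list of that shape into one long data frame. The helper `stack_random_effects` is hypothetical (it is not a broom function), and the numbers are made up so the example runs without lme4 installed.

```r
# Hypothetical helper: stack a list of per-group data frames (rows = levels,
# columns = terms) into one long data frame with one row per estimate.
stack_random_effects <- function(re_list) {
  pieces <- lapply(names(re_list), function(g) {
    re <- re_list[[g]]
    do.call(rbind, lapply(names(re), function(term) {
      data.frame(
        group = g,
        level = rownames(re),
        term = term,
        estimate = re[[term]],
        stringsAsFactors = FALSE
      )
    }))
  })
  do.call(rbind, pieces)
}

# Made-up values in the same shape as a ranef() result; with lme4 installed
# one could try stack_random_effects(lme4::ranef(fit)) on a fitted model instead.
mock <- list(
  Subject = data.frame(
    `(Intercept)` = c(2.1, -1.3),
    Days = c(0.4, -0.2),
    row.names = c("308", "309"),
    check.names = FALSE
  )
)
long <- stack_random_effects(mock)
```

The long layout (group, level, term, estimate) is one possible convention; deciding on such conventions is part of the design discussion in this thread.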