tidymodels / broom

Convert statistical analysis objects from R into tidy format
https://broom.tidymodels.org
Other
1.45k stars 303 forks source link

Documentation approach #419

Closed alexpghayes closed 4 years ago

alexpghayes commented 6 years ago

Many of the tidiers return the same type of information about different model types. For example, a bajillion glance methods return a tibble with an AIC column.

Ideally, I'd like to document what the AIC column represents in a single place, so that all those glance methods benefit from the same documentation and there's no need to copy-paste any changes to the description.

Currently, most methods document the return value with something like:

#' @rdname betareg_tidiers
#'
#' @return `glance` returns a one-row tibble with columns:
#'   \item{pseudo.r.squared}{the deviance of the null model}
#'   \item{logLik}{the data's log-likelihood under the model}
#'   \item{AIC}{the Akaike Information Criterion}
#'   \item{BIC}{the Bayesian Information Criterion}
#'   \item{df.residual}{residual degrees of freedom}
#'   \item{df.null}{degrees of freedom under the null}
#'
#' @seealso [glance()]
#' @export
glance.betareg <- function(x, ...) {
  ...
}

I'd like to replace this with something along the lines of

#' @rdname betareg_tidiers
#'
#' @template glance_boilerplate
#' @return A one-row tibble with columns:
#'   -`pseudo.r.squared`
#'   -`logLik`
#'   -`AIC`
#'   -`BIC`
#'   -`df.residual`
#'   -`df.null`
#'
#' @seealso [output_column_glossary()], [glance()]
#' @export
glance.betareg <- function(x, ...) {
  ...
}

where somewhere in the boilerplate describes how to use filter(column_glossary, used_by == "betareg") to get see the descriptions of the return columns.

@dgrtwo thoughts?

alexpghayes commented 6 years ago

Update for anyone following along: the new plan is to document output columns names in glance.yaml, tidy.yaml, etc. These YAML files will then get translated into a tibble column_glossary that we will export as a data object. We will then use roxygen templates to pull the relevant columns from the column_glossary object and populate the documentation. I've started work on this in the column_glossary branch on my fork.

alexpghayes commented 6 years ago

Hadley has suggested using the evalRd tag instead. Will look into that soon.

alexpghayes commented 6 years ago

I'm now convinced that @evalRd is the wrong approach. Consider

#' @evalRd return_tidy("component", regression = TRUE)
#'
#' @details The tibble has one row for each term in the regression. The
#'   `component` column indicates whether a particular
#'   term was used to model either the `"mean"` or `"precision"`. Here the
#'   precision is the inverse of the variance, often referred to as `phi`.
#'   At least one term will have been used to model the precision `phi`.
tidy.ivreg <- ...

This (on my local branch) produces documentation for the standard regression terms, and also for the "component" column. The issue is that tidy.kmeans also uses the component column, and describes it as: Cluster id as a factor. For a model ‘k' clusters, these will be 'as.factor(1:k)', or 'as.factor(0:k)' if there’s a noise term.

The results of the @evalRd call don't show up at all if the documentation has any @return tag, so the options here are:

  1. Deal with inconsistent column names immediately upon transitioning to @evalRd. This is an unrealistically large task at the moment.
  2. Document the component column in @details or elsewhere. I'm opposed to this.

So I don't think @evalRd is flexible enough to standardize the @return documentation.

alexpghayes commented 6 years ago

The next question is if the @return documentation can be dealt via @template. The simplest and most work intensive solution is have templates for each potential column to be returned, and add these in via like:

#' @template return_aic
#' @template return_bic

I don't like this because I'd prefer that the column descriptions all live in the same file.

The next option is use @templateVar to specify the columns to generate documentation for. This would result in some really long lines on occasion to specify all the necessary columns.

Another template based option is tracking which tidying methods return which columns in the column_glossary tibble, and just pass the tidier class to a template, which can then pull all the relevant information from the column_glossary. The tidyverse team and Hadley in particular were strongly opposed to this when I talked to them about it on Monday. Additional, I'm convinced that the column_glossary should move to modeltests, in which case it shouldn't have broom specific information.

At this point, it doesn't seem like there's any especially great option. That said, the current documentation is a mess and I don't like the idea of leaving it as is either.

dgrtwo commented 6 years ago

How about named arguments in the ... of return_tidy can define additional columns and their documentation?

#' @evalRd return_tidy("component", regression = TRUE,
#'                     p.value = "The p-value",
#'                     AIC = "blablabla")

This would also give the option of overriding one from e.g. the regression group.

simonpcouch commented 4 years ago

This has been addressed, as far as I'm aware!

github-actions[bot] commented 3 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.