nutterb / pixiedust

Tables So Beautifully Fine-Tuned You Will Believe It's Magic.

Nice inline export for Rmd documents #78

Open simonthelwall opened 7 years ago

simonthelwall commented 7 years ago

I was thinking that it would be great if R had a set of functions for including regression output in body text and was wondering whether you thought this would be a good fit for pixiedust, or whether it would be out of scope for the package?

I imagine it working something like the following:

"Risk of cardiovascular events increased with increasing BMI `r sprinkle_text(x, y, form = "odds_ratio", conf.int = TRUE, p = TRUE)`..."

where x is a regression model and y an independent variable.

The output would look like

"Risk of cardiovascular events increased with increasing BMI (OR: 4.5, 95% CI: 4.1-4.7)..."

What do you think?
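
A minimal sketch of the proposed output for one term of a logistic regression, using base R only. The name format_or and its arguments are hypothetical (this is not a pixiedust function). It exponentiates the coefficient and a Wald 95% interval from confint.default(), and the mtcars model is only a stand-in for a real analysis.

```r
# Hypothetical helper, not a pixiedust function: format an odds ratio for one
# term of a logistic glm, with an optional Wald 95% CI and p-value.
format_or <- function(fit, term, conf.int = TRUE, p = FALSE, digits = 2) {
  est <- coef(summary(fit))
  if (!term %in% rownames(est)) {
    stop("Term '", term, "' was not found in the model.")
  }
  # formatC keeps trailing zeros, so 4.7 is shown as "4.70"
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  out <- paste0("OR: ", fmt(exp(est[term, "Estimate"])))
  if (conf.int) {
    # Wald interval on the log-odds scale, then exponentiated
    ci <- exp(confint.default(fit, parm = term))
    out <- paste0(out, ", 95% CI: ", fmt(ci[1]), "-", fmt(ci[2]))
  }
  if (p) {
    out <- paste0(out, ", p = ",
                  format.pval(est[term, "Pr(>|z|)"], digits = digits))
  }
  paste0("(", out, ")")
}

# Example with a built-in data set
fit <- glm(am ~ wt, data = mtcars, family = binomial)
paste("The odds of a manual transmission decreased with increasing weight",
      format_or(fit, "wt", p = TRUE))
```

In an Rmd file the format_or() call would sit inside an inline r chunk, in the same place as sprinkle_text() in the sentence above.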

nutterb commented 7 years ago

I'm not opposed to the idea, but there are a few questions I would want resolved before taking on a project like this. For instance:

  1. How should interaction terms be addressed?
  2. How should factor variables be retrieved?
  3. Most model objects return a term column when tidied via broom, but what should happen if there is no term column?
  4. What should be the behavior if there is no confidence interval method for the object? When no p-value is available?
  5. Are there specific formats that should be followed? For example, APA formats? (Ugh)
  6. What should happen if an inappropriate form is requested? For example, if I request an odds ratio with a t.test object.

Questions 1-4 seem like they would need pretty firm answers before committing any serious coding effort. Questions 5 and 6 are things I think could become headaches in the future.

Just brainstorming out loud, but I would be interested in your thoughts on 1-4.

simonthelwall commented 7 years ago

All excellent points. Some thoughts below.

  1. I had wondered about this. I'll confess that I don't fully understand interactions in R. I think we would want some way to specify the stratum-specific effect (the linear combination).
  2. I think an optional argument specifying the factor level would work.
  3. I'm not familiar with any model objects that would not return a term; perhaps print a warning in place of the term?
  4. I think CIs and p-values should be optional arguments, defaulting to FALSE. If users then specify an invalid choice, an error message should be printed.
  5. Really good point. I think a combination of two options:
    • some built in styles that can be specified by name
    • another function by which a user can specify their own format, to be used throughout the document.
  6. Again, I think print a warning.
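
A minimal sketch of the second option, assuming the document-wide format is kept as an R option that every inline call reads. The option name inline.format and the functions set_inline_format() and format_estimate() are hypothetical, and the format itself is an ordinary sprintf() template.

```r
# Hypothetical sketch: a document-wide format stored as an R option.
# Built-in default, used when the document does not set one.
default_format <- "(%s: %.2f, 95%% CI: %.2f-%.2f)"

set_inline_format <- function(fmt) {
  # Call once, e.g. in the setup chunk of the Rmd
  options(inline.format = fmt)
}

format_estimate <- function(label, est, lower, upper,
                            fmt = getOption("inline.format", default_format)) {
  sprintf(fmt, label, est, lower, upper)
}

format_estimate("OR", 4.5, 4.1, 4.7)
# "(OR: 4.50, 95% CI: 4.10-4.70)"

set_inline_format("%s = %.1f (%.1f to %.1f)")
format_estimate("OR", 4.5, 4.1, 4.7)
# "OR = 4.5 (4.1 to 4.7)"

options(inline.format = NULL)  # back to the built-in default
```

Named built-in styles could then be nothing more than predefined templates passed to set_inline_format().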

nutterb commented 7 years ago

As I've thought about it more, I've decided that this really ought to be a generic for which additional methods may be written. For the generic, I propose the following functional requirements:

  1. Accepts an object that can be successfully tidied.
  2. Returns the error message from tidy when tidy is not successful
  3. Returns any warnings generated by tidy
  4. Accepts a character(1) argument that can determine the output format (overriding other formal arguments)

For now, I would set style = "none" to indicate that the formal arguments should be used to determine the format. Other styles, such as APA, may follow later.
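
A rough sketch of how the generic's requirements could be met in R. The names inline_summary() and safe_tidy() are hypothetical, and the tidier argument is an addition for illustration: it defaults to broom::tidy and lets the sketch be exercised without broom. Errors from the tidier propagate unchanged, while warnings are collected with withCallingHandlers() and re-signalled to the caller.

```r
# Hypothetical sketch of the generic; inline_summary and safe_tidy are
# illustrative names, not pixiedust functions.
inline_summary <- function(object, ..., style = "none") {
  UseMethod("inline_summary")
}

# Requirements 1-3: tidy the object, let tidy's error propagate unchanged,
# and collect (rather than lose) any warnings it raises.
safe_tidy <- function(object, tidier = NULL) {
  if (is.null(tidier)) tidier <- broom::tidy  # broom is needed only here
  warns <- character(0)
  result <- withCallingHandlers(
    tidier(object),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))
      invokeRestart("muffleWarning")
    }
  )
  list(tidy = result, warnings = warns)
}

# Requirement 4: style = "none" means the formal arguments decide the format;
# named styles such as APA would be handled here later.
inline_summary.default <- function(object, ..., style = "none", tidier = NULL) {
  if (!identical(style, "none")) {
    stop("style = '", style, "' is not implemented yet.")
  }
  tidied <- safe_tidy(object, tidier)
  # Re-signal tidy's warnings so the caller still sees them
  for (w in tidied$warnings) warning(w, call. = FALSE)
  tidied$tidy
}
```

Methods such as an lm method would call safe_tidy() (or their own extraction) and then apply their specific arguments to the tidied result.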

As an example, for the lm method I would add the following requirements:

  1. Accepts a character vector naming the term to be summarised. A length one vector returns the main effect. A length two vector returns the interaction between two terms, etc.
  2. Returns an error if no term exists that satisfies the linear combination.
  3. Accepts a vector or list of characters, optionally named, specifying the level for any factors named in term. If unnamed, the levels are assumed to follow the same order as the factors in term.
  4. Returns an error if any level in level cannot be found in its corresponding term.
  5. Accepts a logical(1) indicating if the confidence interval is to be included in the summary
  6. Accepts a logical(1) indicating if the SE is to be included in the summary
  7. Accepts a logical(1) indicating if the test statistic is to be included in the summary
  8. Accepts a logical(1) indicating if the p-value is to be included in the summary
  9. Accepts a character(1) designating the text label for the coefficient (beta, OR, etc.)
  10. Accepts a function to apply to the coefficient and CI
  11. Accepts additional arguments to the transformation function
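
A sketch of how these requirements could translate into a coefficient lookup for an lm fit, written with base R (summary() and confint()) rather than broom. The function name inline_summary_lm() and its argument names are hypothetical. It assumes treatment-coded factors, whose coefficients R names as the factor name pasted to the level (for example cyl6), and interaction terms given in the same order as in the model formula.

```r
# Hypothetical sketch of the lm method as a plain function; not pixiedust code.
inline_summary_lm <- function(fit, term, level = NULL, conf.int = TRUE,
                              se = FALSE, statistic = FALSE, p = FALSE,
                              label = "beta", fn = identity, ...,
                              digits = 2) {
  # Factors named in `term`, in the order they appear there
  factors <- term[term %in% names(fit$xlevels)]
  level <- as.list(level)
  if (length(level) != length(factors)) {
    stop("Supply exactly one level for each factor in 'term'.")
  }
  # Requirement 3: unnamed levels follow the order of the factors in `term`
  if (is.null(names(level))) names(level) <- factors
  if (!setequal(names(level), factors)) {
    stop("Names of 'level' must match the factors in 'term'.")
  }
  pieces <- term
  for (f in factors) {
    # Requirement 4: every level must exist in its factor
    if (!level[[f]] %in% fit$xlevels[[f]]) {
      stop("Level '", level[[f]], "' not found in factor '", f, "'.")
    }
    pieces[term == f] <- paste0(f, level[[f]])
  }
  # Requirement 2: interactions are named "a:b" in formula order
  coef_name <- paste(pieces, collapse = ":")
  coefs <- coef(summary(fit))
  if (!coef_name %in% rownames(coefs)) {
    stop("No coefficient named '", coef_name, "' in the model.")
  }
  row <- unname(coefs[coef_name, ])  # estimate, SE, t value, p-value
  fmt <- function(v) formatC(v, format = "f", digits = digits)
  # Requirements 9-11: `fn` (e.g. exp) transforms the estimate and CI only
  out <- paste0(label, ": ", fmt(fn(row[1], ...)))
  if (conf.int) {
    ci <- fn(confint(fit, parm = coef_name)[1, ], ...)
    out <- paste0(out, ", 95% CI: ", fmt(ci[1]), " to ", fmt(ci[2]))
  }
  if (se) out <- paste0(out, ", SE: ", fmt(row[2]))
  if (statistic) out <- paste0(out, ", t = ", fmt(row[3]))
  if (p) out <- paste0(out, ", p = ", format.pval(row[4], digits = digits))
  paste0("(", out, ")")
}

d <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ wt * cyl, data = d)
inline_summary_lm(fit, "wt")                           # main effect of wt
inline_summary_lm(fit, c("wt", "cyl"), "6", p = TRUE)  # wt:cyl6 interaction
```

Because the coefficient name is built by pasting, term = c("cyl", "wt") fails for this model. That satisfies requirement 2 but may be stricter than wanted; accepting either ordering would be a natural refinement.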

How would this work for getting started?

ckraner commented 7 years ago

If you are using LaTeX, just use knitr. Here is how I report chi-squared values from the lm objects: \({\chi}^2(\Sexpr{PreviousChiSq$df})=\Sexpr{round(PreviousChiSq$dx,2)}, p=\Sexpr{round(PreviousChiSq$chi,24)}\).

It doesn't give you the label for the value, but it's there and easy.
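
For comparison, a sketch of building the same kind of sentence from a base R test object instead of assembling it by hand. It uses chisq.test(), whose result stores the degrees of freedom, statistic and p-value as parameter, statistic and p.value. report_chisq() is a hypothetical helper, and the mtcars table is only an example, not the data in the comment above.

```r
# Hypothetical helper: build "chi^2(df) = x, p = y" text from an htest object
report_chisq <- function(test, digits = 2) {
  sprintf("chi^2(%d) = %s, p = %s",
          as.integer(test$parameter),
          formatC(unname(test$statistic), format = "f", digits = digits),
          format.pval(test$p.value, digits = digits))
}

res <- chisq.test(table(mtcars$am, mtcars$vs))
report_chisq(res)
# In an Rnw file: \Sexpr{report_chisq(res)}; in Rmd: an inline r chunk
```

Unlike the hand-written version, the label and the rounding live in one place, so every reported test is formatted the same way.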

Edit: For percentages, look at something like this first: http://stackoverflow.com/questions/7145826/how-to-format-a-number-as-percentage-in-r
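
One way to do that in base R, as a sketch: as_percent() is a hypothetical helper that scales proportions by 100 and formats them with sprintf(), which keeps a fixed number of decimals.

```r
# Hypothetical helper: format proportions as percentages for inline text.
# sprintf keeps trailing zeros, so 0.5 becomes "50.0%", not "50%".
as_percent <- function(x, digits = 1) {
  sprintf(paste0("%.", digits, "f%%"), 100 * x)
}

as_percent(c(0.1234, 0.5))
# "12.3%" "50.0%"
```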

nutterb commented 7 years ago

Here's a first attempt. How does this look as a proof of concept?

Use devtools::install_github("nutterb/pixiedust", ref = "new-latex-tables-inline-dust") to install the package with these utilities.

The source code to generate the document displayed below is at https://gist.githubusercontent.com/nutterb/bcc3c04bc4c807cb9753f74820584cf5/raw/dfe78db875de0a314d4e87126ab2cdf5548173d8/dust_inline_example.Rmd

[Image: rendered output of dust_inline_example.Rmd]

simonthelwall commented 7 years ago

I think that works really nicely. One thing I noticed is that the upper confidence limit does not appear to be formatted to two decimal places.
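
One possible cause, offered as a guess rather than a diagnosis of the branch: round() returns a number, and R drops trailing zeros when that number is turned into text, whereas formatC() and sprintf() return text with a fixed number of decimals. The value 4.7 is just the upper limit from the example sentence at the top of the thread.

```r
ci_upper <- 4.7

as.character(round(ci_upper, 2))             # "4.7"  (trailing zero lost)
formatC(ci_upper, format = "f", digits = 2)  # "4.70"
sprintf("%.2f", ci_upper)                    # "4.70"
```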

simonthelwall commented 7 years ago

I don't know whether I was doing something wrong or whether it was something else, but I just had to update R to 3.3.2 and reinstall all my libraries. When trying to install pixiedust as above, I also had to install the packages below one by one:

Formula, acepack, latticeExtra, gridExtra, htmlTable, data.table

nutterb commented 7 years ago

This sounds like a problem in the dependency chain: a dependency of one of the dependencies is not being installed. When upgrading R, I would recommend using dependencies = TRUE with install.packages or any of its devtools variants. The documentation for the dependencies argument of install.packages and install_github explains why.
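
A small sketch of that advice. install_missing() is a hypothetical helper; installed.packages(), install.packages() and its dependencies argument are standard utils functionality, and the package names are the ones listed in the previous comment.

```r
# Hypothetical helper: install only the packages that are missing, letting
# install.packages() pull in their dependencies as well.
install_missing <- function(pkgs) {
  to_install <- setdiff(pkgs, rownames(utils::installed.packages()))
  if (length(to_install) > 0) {
    utils::install.packages(to_install, dependencies = TRUE)
  }
  invisible(to_install)
}

# The packages that had to be installed by hand above:
# install_missing(c("Formula", "acepack", "latticeExtra",
#                   "gridExtra", "htmlTable", "data.table"))

# The same argument can be passed through devtools::install_github():
# devtools::install_github("nutterb/pixiedust",
#                          ref = "new-latex-tables-inline-dust",
#                          dependencies = TRUE)
```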