Open bzkrouse opened 7 years ago
I so hear this. I think the formatting aspect of summary tables in R is quite tedious and a barrier to winning people over from Excel for routine analyses.
I took a crack at this with the janitor package, specifically creating tabulations and 2-way/crosstab/contingency tables and formatting them with percentages, rounding, etc. for quick publication. I have focused on simple counting and percentages rather than any statistics, but maybe the formatting aspect could be leveraged?
I'm rethinking the approach to janitor's tabulations and formatting, making the functions more modular and coherent and less a set of utilities. If this comes kind of close, maybe something could be built into janitor or those functions or ideas could be extended. Or if it should be something separate, that's great too and I'd love to help ⛏
To help generate nice-looking tables with grouping factors, I wrote a package called kableExtra a few months ago. So basically you can do something like the below in a pdf_document:
library(dplyr)
library(knitr)
library(kableExtra)
library(ezsummary)
mtcars %>%
  group_by(cyl) %>%
  ezsummary(flavor = "wide") %>%
  kable(format = "latex", booktabs = TRUE,
        col.names = c("variable", rep(c("mean", "sd"), 3))) %>%
  add_header_above(c(" ", "4 cyl" = 2, "6 cyl" = 2, "8 cyl" = 2))
Then you will get something like
(ezsummary is something I wrote in the past. Some design aspects of it fall a little below my expectations, but I still use it sometimes. :P )
This is a great idea!
Creating these tables is something I find so frustrating when writing a paper or report. It's totally one of those things where I've just gone:
Bah! I'll just write in the values manually just this once
Except it's almost never just this once, and it adds to reproducibility hell.
Having a tool (or tools) that makes it easier to create these sorts of tables would for sure ease one pressure point in reproducibility.
ezsummary and kableExtra both look amazing, @haozhu233! I'd love to learn more about ezsummary and kableExtra and see if we can develop them further.
I don't know if this might be of interest, but I met the guy who built this the other day. I was impressed with the level of docs: https://cran.r-project.org/web/packages/pivottabler/index.html
Ah, good to know! It's great to gather all these resources together!
Maybe we can work together on some examples of tables we have made for papers/reports, try all these different methods/pkgs out, and then work out what was great and what could be improved?
Wow, it's nice to hear other people are having similar thoughts (well said @njtierney!). @sfirke and @haozhu233 - really appreciate the tools you've built and the fact that you've already spent so much time thinking about this problem. If any of these tools can be leveraged or extended that would be amazing. It would be great to figure out a way to incorporate more of the statistical/modeling aspect of the analysis. More specifically, in the case of a table that contains many models/tests, a potential tool could pair nicely with the purrr workflow.
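To make the "many models in one table" case concrete, here is a minimal sketch, not an existing tool. It assumes one linear model per cylinder group in mtcars and uses base R only so it has no dependencies; the column names (estimate, p_value) are illustrative choices. The same shape could presumably be produced with purrr::map() plus broom::tidy().

```r
# Fit one linear model per cylinder group (mpg explained by weight).
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# Collect one table-ready row per model: group, sample size, slope, p-value.
model_table <- do.call(rbind, lapply(names(fits), function(g) {
  est <- summary(fits[[g]])$coefficients["wt", ]
  data.frame(
    cyl      = g,
    n        = nobs(fits[[g]]),
    estimate = round(est[["Estimate"]], 2),
    p_value  = signif(est[["Pr(>|t|)"]], 2)
  )
}))

print(model_table, row.names = FALSE)
```

The result is an ordinary data frame, so it can be handed to whichever formatting package ends up being preferred.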
I will mention the tangram package that came onto my radar yesterday that I don't know much about but seems to have a unique table building model.
@njtierney Great idea! I think this type of "literature review" will be very useful for our community. After that we will have a better understanding of what we have right now and what exactly we need. I can imagine during the unconf, we can easily generate a blog post that @stefaniebutland would like to see. ;)
So many interesting things to work on at the unconf!
Agree with all that this is needed, and this thread helps summarize a lot. Just an idea: what about a gallery of tables with the code to produce them? Something similar to @haozhu233's example above, but for different typical table types.
I feel like having the gallery thing @jhollist just mentioned will definitely be super helpful. We can also borrow some ideas from the design of ggplot2, which offers both ggplot(), which is powerful and customizable, and qplot(), which bootstraps common plot types.
These are great ideas! I agree the lit review and gallery concept will both be very helpful and great resources for the broader community. It would be nice to take stock of what tools are out there and what types of tables should be covered. @haozhu233 - yes!! to your idea of structuring like ggplot2. That sounds ideal for a tool that is meant to be easy to use and easily extensible. Maybe we could try out a paradigm where you start with a simple table and add "layers" of details, complexities, and/or customizations.
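As a rough sketch of what "start simple, then add layers" could look like: the three functions below (tbl_counts, add_total_row, add_row_percent) are hypothetical names invented for this illustration and do not come from any existing package. It assumes R >= 4.1 for the native pipe.

```r
# Layer 0: a plain two-way table of counts, as a data frame.
tbl_counts <- function(data, row, col) {
  as.data.frame.matrix(table(data[[row]], data[[col]]))
}

# Layer 1: append a "Total" row holding the column sums.
add_total_row <- function(tbl) {
  rbind(tbl, Total = colSums(tbl))
}

# Layer 2: replace each count with "row percent (count)".
add_row_percent <- function(tbl, digits = 1) {
  pct <- round(100 * tbl / rowSums(tbl), digits)
  out <- tbl
  for (j in seq_along(tbl)) {
    out[[j]] <- paste0(formatC(pct[[j]], format = "f", digits = digits),
                       "% (", tbl[[j]], ")")
  }
  out
}

layered <- tbl_counts(mtcars, "cyl", "am") |>
  add_total_row() |>
  add_row_percent()
print(layered)
```

Each layer takes a table and returns a table, which is what would let a user stop at any step or swap in their own layer.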
This is great, I actually mentioned something like this to @stefaniebutland in my talk with her. It's a huge issue in sociology because we make crosstabs a lot and they are really a pain overall in R especially with multiple variables. This is something I wrote to make crosstab making easier for my students (and me) https://github.com/elinw/lehmansociology/blob/master/R/crosstab.R but the print function is really painful. Even what should be simple frequency tables are hard in base R, this is what we came up with just to illustrate https://github.com/elinw/lehmansociology/blob/master/R/frequency.R. @sfirke I'm going to have a look at janitor!
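For reference, a simple frequency table in base R can be sketched as below. This is an illustration written for this thread, not the code behind the frequency.R link above; freq_table is a hypothetical name.

```r
# One-way frequency table: value, count, and percent of total.
freq_table <- function(x, digits = 1) {
  counts <- table(x, useNA = "ifany")
  data.frame(
    value   = names(counts),
    n       = as.integer(counts),
    percent = round(100 * as.numeric(counts) / sum(counts), digits),
    stringsAsFactors = FALSE
  )
}

freq <- freq_table(mtcars$gear)
print(freq, row.names = FALSE)
```

Even this small helper has to make choices about missing values, rounding, and column names, which is part of why the task keeps being reimplemented.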
Wow, kableExtra, nice!
If we are making a literature review then I think formattable should be in there. And of course tables.
I have a lot of PHP/Web experience and the way tables are handled in R always feels very different to me.
Someone gave a lightning talk at the Seattle useR meetup on this topic a few months ago. He showed a few examples, one of which was tableone
Just saw desctable on my github timeline. It seems to be another good fit for this issue.
Nice, lots of examples :) desctable is really interesting! It seems to focus on ease of process & content more than styling (it's my impression that some of these packages emphasize one or the other). I'll throw another one into the mix: arsenal, which gets more into stats and models. (@elinw this may be of interest to you for frequency tables...)
There are also older-school table printing options, like gmodels::CrossTable(), and there's one in Hmisc (I think summary?). A literature review of what's out there and how it differs would be a boon to folks navigating all of the options. Makes me think of reviews of a field of products from The WireCutter.
Summary of this thread:
There are lots of existing packages/functions for creating and/or formatting tables of various types. There seems to be a consensus that more work may be needed in this area, but we first need to understand all that is available right now. The great discussion in #78 could inform this process. From there, we can determine what is needed going forward. Potential ideas for the unconf, summarized from discussion above:
1) Perform "lit review" of existing packages
2) Are there improvements to be made? If so, planning the future of tables in R:
I will be following along with the unconf remotely (via slack, issues, twitter, etc.) I'll keep my eye on this and if there is anything I can do remotely, would be happy to do so. If it makes sense we could chat via appear.in (I'm not on skype).
I do really like this idea of "lit reviews" for packages. It feels like a more targeted/granular version of a task view and I think could be very useful. We've had fits and starts of a discussion on what to do with https://github.com/ropensci/maptools. It was intended to be a Task View but we got some push back due to the overlap with the Spatial Task View. Anyway, I think this general idea of targeted reviews could fill the void between "packages useful for a broad area" and "use package X to do Y"
And thanks for the interesting discussion!
On Wed, May 17, 2017 at 3:58 PM, Becca Krouse notifications@github.com wrote:
Summary of this thread:
There are lots of existing packages/functions for creating and/or formatting tables of various types. There seems to be a consensus that more work may be needed in this area, but we first need to understand all that is available right now. The great discussion in #78 https://github.com/ropensci/unconf17/issues/78 could inform this process. From there, we can determine what is needed going forward. Potential ideas for the unconf, summarized from discussion above:
1. Perform "lit review" of existing packages
   - perform as a case study for #78 https://github.com/ropensci/unconf17/issues/78
   - compare existing packages by trying them out on a set of common table types
   - create a gallery of tables with the code to produce them
   - create blog post
   - present results of lit review to benefit the community
   - reference WireCutter http://thewirecutter.com/leaderboard/headphones/ for ideas
2. Are there improvements to be made? If so, planning the future of tables in R:
   - would be informed by the lit review
   - consider extending existing packages or creating a new one
   - borrow ideas from ggplot2
In case it wasn't in your list I just saw this https://gdemin.github.io/expss/
I was digging into the huxtable docs and found a vignette which compares the features of many table making packages: https://cran.r-project.org/web/packages/huxtable/vignettes/design-principles.html
I work for the Federal Reserve Board (FRB). My duties include reading data from various sources (including pdf, excel, xml), processing and recompiling these data, and producing tables and charts in LaTeX and FAME for publication purposes. I am currently searching for a similar tool (or tools) in R to replicate these processes (including creating tables). Very interested in this topic.
This issue reminds me a little of this silly joke flowchart I made. But perhaps a (slightly) more serious flowchart would be helpful to people?
I also wonder if it would be possible to create some kind of DSL for making tables that works with the pipe operator? Similar to @haozhu233's suggestion to use something like a ggplot2 syntax.
A grammar of tables w/ modular piping functions, à la ggplot2, would be wonderful. I have been stumbling toward something similar, though in a limited use case (simple one-way and two-way tabulations). So far I have (on a dev branch):
library(janitor)
mtcars %>%
  crosstab(cyl, am) %>%
  adorn_totals("row") %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting() %>%
  adorn_ns()
#>     cyl          0          1
#> 1     4  27.3% (3)  72.7% (8)
#> 2     6  57.1% (4)  42.9% (3)
#> 3     8 85.7% (12)  14.3% (2)
#> 4 Total 59.4% (19) 40.6% (13)
But this is hardly a grammar - just a vote of enthusiasm for going in that direction 😀
dplyr::case_when() might be a good model for table formatting. Maybe something like this:
mtcars %>%
  group_by(cyl) %>%
  summarize(
    n = n(),
    price = 10000 * wt,
    percent_wt = wt / sum(wt)) %>%
  format(
    n ~ 'comma',
    price ~ 'euro',
    percent_wt ~ 'percent')
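One way the formula-style formatting idea above could be prototyped is sketched here. Everything in it is an assumption made for illustration: format_cols is a hypothetical helper (not a dplyr or base R function), the three styles are stand-ins, and the summary columns are computed per group with base R so that each group yields exactly one row.

```r
# Hypothetical helper: each `column ~ "style"` formula picks a formatter
# and applies it to that column, returning the modified data frame.
format_cols <- function(data, ...) {
  formatters <- list(
    comma   = function(x) format(x, big.mark = ",", trim = TRUE),
    euro    = function(x) paste0("\u20ac", format(round(x), big.mark = ",", trim = TRUE)),
    percent = function(x) paste0(formatC(100 * x, format = "f", digits = 1), "%")
  )
  for (spec in list(...)) {
    column <- as.character(spec[[2]])  # left-hand side: column name
    style  <- spec[[3]]                # right-hand side: style string
    data[[column]] <- formatters[[style]](data[[column]])
  }
  data
}

# One row per cylinder group, built with base R.
by_cyl <- data.frame(
  cyl        = sort(unique(mtcars$cyl)),
  n          = as.vector(table(mtcars$cyl)),
  price      = 10000 * as.vector(tapply(mtcars$wt, mtcars$cyl, mean)),
  percent_wt = as.vector(tapply(mtcars$wt, mtcars$cyl, sum)) / sum(mtcars$wt)
)

formatted <- format_cols(by_cyl,
                         n ~ "comma",
                         price ~ "euro",
                         percent_wt ~ "percent")
print(formatted, row.names = FALSE)
```

Keeping the formatters in a named list means new styles can be added without touching the dispatch loop.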
@GShotwell I like that idea a lot. In a strange way it's like tables, but more like normal language. It would be great if there were a dplyr n() function (with some other name) that is a real function; I always end up needing n in calculations when creating tables.
"Combining the two issues, we set out to create a guide that could help users navigate package selection, using the case of reproducible tables as a case study."
Repo: https://github.com/ropenscilabs/packagemetrics Blog post: packagemetrics - Helping you choose a package since runconf17
In my work (clinical research), we make a lot of tables, usually comparing 2 or more groups. It's nice to format the table programmatically so that it is reproducible and ready for publication. The process to do so usually looks something like this:
With tidy tools like dplyr, broom, and purrr, it is easier than ever before to create the self-contained data frame. However, getting all the necessary pieces and working the df into a table-ready format is a process that seems to be recreated from scratch each time. It would be great to have a tool that helps to automate this process a bit. Here's some vague thoughts on what this could look like:
Does anyone have any interest or thoughts about this topic? Are there any tools already out there that help with this? If not an unconf project, would love a related discussion about people’s workflows!