vincentarelbundock / tinytable

Simple and Customizable Tables in `R`
https://vincentarelbundock.github.io/tinytable
GNU General Public License v3.0

Allow setting significant digits by cell #142

Closed avehtari closed 8 months ago

avehtari commented 8 months ago

I've found tinytable to be very nice, but I have encountered a couple of issues. Here's the second one.

Currently

> data.frame(x=c(0.000123456789, 12.4356789)) |> tt(digits=2)
+----------+
| x        |
+==========+
|  0.00012 |
+----------+
| 12.43568 |
+----------+

where the second row shows 7 significant digits instead of the requested 2. I wish I could get the following formatting:

+----------+
| x        |
+==========+
|  0.00012 |
+----------+
| 12.      |
+----------+

vincentarelbundock commented 8 months ago

Thanks. I’ve learned a lot from your work, so it's nice to hear that this package could be useful to you.

Currently, we follow the R defaults from format(): when formatting a vector, format() finds the minimum number of decimal places needed so that every element has at least `digits` significant digits, and then displays every number with that same number of decimals. This means we can get different results when feeding it a vector or single numbers:

x = c(0.000123456789, 12.4356789)

format(x[1], digits = 2, scientific = FALSE)
# [1] "0.00012"

format(x[2], digits = 2, scientific = FALSE)
# [1] "12"

format(x, digits = 2, scientific = FALSE)
# [1] " 0.00012" "12.43568"

Usually, I like to follow R’s default behavior to avoid surprises. In this case, I admit that I struggle to understand the motivation behind the default. What do you think, should the tinytable default be this?

sapply(x, format, digits = 2, scientific = FALSE)
# [1] "0.00012" "12"
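A minimal base R sketch of that per-cell alternative. The helper name `format_signif_cell` is hypothetical and is not part of tinytable; it uses `vapply()` rather than `sapply()` only so that the result is guaranteed to be a character vector.

```r
# Hypothetical helper (not part of tinytable): format each element on its own,
# so every value gets `digits` significant digits independently of the others.
format_signif_cell <- function(x, digits = 2) {
  vapply(
    x,
    function(value) format(value, digits = digits, scientific = FALSE),
    character(1),
    USE.NAMES = FALSE
  )
}

x <- c(0.000123456789, 12.4356789)

# Vector-wise: one shared number of decimals for the whole column.
format(x, digits = 2, scientific = FALSE)
# [1] " 0.00012" "12.43568"

# Cell-wise: the number of decimals is chosen separately for each value.
format_signif_cell(x, digits = 2)
# [1] "0.00012" "12"
```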

avehtari commented 8 months ago

I would prefer seeing # [1] "0.00012" "12", but even if the default did not change, I would be happy if there were an argument, and possibly a global option, to get my preferred output.

My use case is posterior summaries, for example: [image]

Here we know that, most of the time, the default number of posterior draws gives us fewer than two significant digits of accuracy, so extra digits are just noise and make the table harder to read. By default I would use digits=2, and in the above table I used digits=1 just because there were too many digits already. The proposed behavior would be perfect for this use case.

A complete notebook in which I've tested using tinytable is https://avehtari.github.io/BDA_R_demos/demos_rstan/brms_demo.html

kylebutts commented 8 months ago

I think this is possible with format_tt and the fn argument.

library(tinytable)
data.frame(x=c(0.000123456789, 12.4356789)) |> 
  format_tt(fn = \(x) signif(x, 2))
#>         x
#> 1 0.00012
#> 2      12

BTW, this might also be useful: https://search.r-project.org/CRAN/refmans/dreamerr/html/fsignif.html
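For reference, `signif()` rounds the numeric values themselves and returns numbers, not formatted strings, so how they are finally displayed still depends on whatever prints them. A small base R check, independent of tinytable:

```r
x <- c(0.000123456789, 12.4356789)

# signif() keeps 2 significant digits per element and returns a numeric vector.
rounded <- signif(x, 2)

is.numeric(rounded)                  # TRUE: still numbers, not character strings
all.equal(rounded, c(0.00012, 12))   # TRUE: each value rounded on its own
```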

vincentarelbundock commented 8 months ago

Thanks @kylebutts, this is a nice workaround.

@avehtari, I pushed three changes to make this easier:

  1. format_tt() gains a new rounding style: num_fmt="significant_cell"
  2. Numeric formatting arguments in format_tt() are now settable using global options.
  3. Added an example to the website tutorial.

With GitHub main, you can now do:

library(tinytable)
k <- data.frame(x = c(0.000123456789, 12.4356789))

# column-wise significant
tt(k, digits = 2)
#> x
#> 0.00012
#> 12.43568

# cell-wise significant
tt(k) |> format_tt(digits = 2, num_fmt = "significant_cell")
#> x
#> 0.00012
#> 12

# cell-wise significant with global option
options("tinytable_format_num_fmt" = "significant_cell")
tt(k, digits = 2)
#> x
#> 0.00012
#> 12

vincentarelbundock commented 8 months ago

Closing now to clean up the repo, but feel free to keep the conversation going if the current dev version does not meet your needs.

avehtari commented 8 months ago

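Looks great! Two issues:

  1. It seems the global option for the number of digits is now options(digits = *), but would it be possible to have a tinytable-specific option for the digits?
  2. options("tinytable_format_num_fmt" = "significant_cell") seems to work only if the digits argument is used explicitly, that is, * |> tt(digits=2) uses significant_cell, but * |> tt() does not.

And one minor issue:

  1. tidyverse global options seem to use a dot . instead of an underscore _ after the package name in the global options. Was it intentional to use an underscore for tinytable options?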

vincentarelbundock commented 8 months ago

@avehtari,

> It seems the global option for the number of digits is now options(digits = *), but would it be possible to have a tinytable-specific option for the digits?

I added a global option to tt().

> options("tinytable_format_num_fmt" = "significant_cell") seems to work only if the digits argument is used explicitly, that is, * |> tt(digits=2) uses significant_cell, but * |> tt() does not.

I believe that the solution above solves this one too:

library(tinytable)
k <- data.frame(x = c(0.000123456789, 12.4356789))
options("tinytable_tt_digits" = 2)
options("tinytable_format_num_fmt" = "significant_cell")
tt(k)
#> x
#> 0.00012
#> 12
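If you only want these settings temporarily, base R's `options()` returns the previous values, which can be restored afterwards. This is a generic base R pattern rather than anything tinytable-specific; the option names are the two used in this thread, and the snippet itself does not need tinytable to run.

```r
# Set the two options discussed in this thread and remember the old values.
old <- options(
  tinytable_tt_digits = 2,
  tinytable_format_num_fmt = "significant_cell"
)

getOption("tinytable_tt_digits")       # 2
getOption("tinytable_format_num_fmt")  # "significant_cell"

# ... build tables here ...

# Restore the previous state (options that were unset before become unset again).
options(old)
```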

> tidyverse global options seem to use a dot . instead of an underscore _ after the package name in the global options. Was it intentional to use an underscore for tinytable options?

Yes, I personally dislike variable names and options with ., because they look too much like S3 methods. This is obviously an aesthetic preference, but I’m trying to be snake-case consistent across the package.
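To illustrate the resemblance: in S3, a function named generic.class is treated as the method of `generic` for objects of class `class`, so a dot in a name can be read as a dispatch boundary. A toy sketch in base R; the names `describe` and `table_opts` are made up for this example and have nothing to do with tinytable.

```r
# A generic and two methods; the dot separates the generic from the class.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"
describe.table_opts <- function(x, ...) "table_opts method"

obj <- structure(list(), class = "table_opts")

describe(obj)  # dispatches to describe.table_opts
describe(1)    # falls back to describe.default
```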

avehtari commented 8 months ago

Great! The global options now work, most tables look good without additional options, and an occasional format_tt(num_fmt="decimal") is sufficient. With this change, I'm definitely going to use tinytable more often.

If you prefer, next time I can also test changes in a PR before you merge to main.

vincentarelbundock commented 8 months ago

Great news.

This is still a baby package, and I'm very eager to improve it, so don't hesitate to file other feature requests as you think of them.

And yes, for non-trivial features, we should definitely follow saner development procedures like testing in a branch.