vincentarelbundock / tinytable

Simple and Customizable Tables in `R`
https://vincentarelbundock.github.io/tinytable
GNU General Public License v3.0
211 stars 18 forks source link

format_tt mishandles Inf, NaN, and NA #218

Closed avehtari closed 7 months ago

avehtari commented 7 months ago

With plain tt() all as expected

> library(tinytable)
> data.frame(x=1, y=Inf) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | Inf |
+---+-----+ 
> data.frame(x=1, y=NaN) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | NaN |
+---+-----+ 
> data.frame(x=1, y=NA) |> tt()

+---+----+
| x | y  |
+===+====+
| 1 | NA |
+---+----+ 

with format_tt() not as expected

> data.frame(x=1, y=Inf) |> tt() |> format_tt()
Error in if (any(abs(x - round(x)) > (.Machine$double.eps)^0.5)) return(FALSE) : 
  missing value where TRUE/FALSE needed
> data.frame(x=1, y=NaN) |> tt() |> format_tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+ 
> data.frame(x=1, y=NA) |> tt() |> format_tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+ 

Setting some tinytable options cause the same problem with tt()

options(tinytable_format_num_fmt = "significant_cell", tinytable_format_digits = 2, tinytable_tt_digits=2)
> data.frame(x=1, y=Inf) |> tt()
Error in if (any(abs(x - round(x)) > (.Machine$double.eps)^0.5)) return(FALSE) : 
  missing value where TRUE/FALSE needed
vincentarelbundock commented 7 months ago

Thanks a lot for the report. I can confirm that I see the same error in the Inf case only.

To clarify, do you agree that the default behavior of format_tt() should be to not raise a warning or error, and should convert:

Keep in mind that the default argument is format_tt(replace_na=""). If you agree, then the fix I just pushed on Github should do the trick:

library(tinytable)

data.frame(x = 1, y = Inf) |> tt() |> format_tt()
x y
1 Inf

data.frame(x = 1, y = NaN) |> tt() |> format_tt()
x y
1

data.frame(x = 1, y = NA) |> tt() |> format_tt()
x y
1
avehtari commented 7 months ago

I had not noticed replace_na="" default. I think it makes sense for NA. I would prefer replacing NaN -> "NaN" as Inf and NaN are both numeric values and possible results from arithmetic operations (part of the IEC 60559 standard) and thus it would be good to have the same behavior for them. It is a bit confusing that is.na(NaN) gives TRUE, while NaN is not a missing value in the sense as NA is a missing value. See also more about this in my suggestion of how to improve R documentation in https://github.com/avehtari/r-source/pull/1

Did you change the behavior on tt() as currently tinytable_tt_digits option changes the behavior of tt()

> data.frame(x=1, y=NaN) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | NaN |
+---+-----+ 
> options(tinytable_tt_digits=2)
> data.frame(x=1, y=NaN) |> tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+ 
vincentarelbundock commented 7 months ago

Thanks for the link. Interesting discussion.

Yes, I had modified the default display of NaN while fixing the bug, but I now agree with you that this is unadvisable. NaN should have the same behavior as Inf, and display the "NaN" characters as a string.

Here's a new proposal:

Examples below.

You can try it by installing the branch in this PR: https://github.com/vincentarelbundock/tinytable/pull/220

What do you think?

library(tinytable)

x <- data.frame(x = 1:5, y = c(pi, NA, NaN, -Inf, Inf))

dict <- list("-" = c(NA, NaN), "-∞" = -Inf, "∞" = Inf)

tt(x) |> format_tt(replace = "-")

    +---+----------+
    | x | y        |
    +===+==========+
    | 1 | 3.141593 |
    +---+----------+
    | 2 |      -   |
    +---+----------+
    | 3 |      NaN |
    +---+----------+
    | 4 |     -Inf |
    +---+----------+
    | 5 |      Inf |
    +---+----------+ 

tt(x) |> format_tt(replace = dict)

    +---+----------+
    | x | y        |
    +===+==========+
    | 1 | 3.141593 |
    +---+----------+
    | 2 | -        |
    +---+----------+
    | 3 | -        |
    +---+----------+
    | 4 | -∞       |
    +---+----------+
    | 5 | ∞        |
    +---+----------+ 
avehtari commented 7 months ago

I tested running the PR

vincentarelbundock commented 7 months ago

Thanks for trying it out!

I have added global options for all arguments in format_tt().

The digits docs are now clarified and the behavior more consistent. It does require some attention, but I think the new behavior does make some sense:

?tt:

digits: Number of significant digits to keep for numeric variables.
        When ‘digits’ is an integer, ‘tt()’ calls ‘format_tt(x,
        digits = digits)’ before proceeding to draw the table. Note
        that this will apply all default argument values of
        ‘format_tt()’, such as replacing ‘NA’ by "". Users who need
        more control can use the ‘format_tt()’ function instead.

?format_tt:

replace: Logical, String or Named list of vectors
            • TRUE: Replace ‘NA’ by an empty string.
            • FALSE: Print ‘NA’ as the string "NA".
            • String: Replace ‘NA’ entries by the user-supplied string.
            • Named list: Replace matching elements of the vectors in
            the list by theirs names. Example:
                • ‘list("-" = c(NA, NaN), "-∞" = -Inf, "∞" = Inf)’

This means you get:

library(tinytable)

x <- data.frame(x = pi, y = NA)

options(tinytable_tt_digits = 2)
tt(x)

| x   | y   |
|-----|-----|
| 3.1 |     |

options(tinytable_format_replace = "zzz")
tt(x)

| x   | y   |
|-----|-----|
| 3.1 | zzz |

options(tinytable_format_replace = FALSE)
tt(x)

| x   | y   |
|-----|-----|
| 3.1 | NA  |
avehtari commented 7 months ago

I'm fine with this approach. I think the docs are clear. Thanks for making the changes so quickly!