format_tt mishandles Inf, NaN, and NA

avehtari commented 7 months ago

With plain tt() all as expected

> library(tinytable)
> data.frame(x=1, y=Inf) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | Inf |
+---+-----+ 
> data.frame(x=1, y=NaN) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | NaN |
+---+-----+ 
> data.frame(x=1, y=NA) |> tt()

+---+----+
| x | y  |
+===+====+
| 1 | NA |
+---+----+

with format_tt() not as expected

> data.frame(x=1, y=Inf) |> tt() |> format_tt()
Error in if (any(abs(x - round(x)) > (.Machine$double.eps)^0.5)) return(FALSE) : 
  missing value where TRUE/FALSE needed
> data.frame(x=1, y=NaN) |> tt() |> format_tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+ 
> data.frame(x=1, y=NA) |> tt() |> format_tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+

Setting some tinytable options cause the same problem with tt()

options(tinytable_format_num_fmt = "significant_cell", tinytable_format_digits = 2, tinytable_tt_digits=2)
> data.frame(x=1, y=Inf) |> tt()
Error in if (any(abs(x - round(x)) > (.Machine$double.eps)^0.5)) return(FALSE) : 
  missing value where TRUE/FALSE needed

vincentarelbundock commented 7 months ago

Thanks a lot for the report. I can confirm that I see the same error in the Inf case only.

To clarify, do you agree that the default behavior of format_tt() should be to not raise a warning or error, and should convert:

NA -> ""
NaN -> ""
Inf -> "Inf"

Keep in mind that the default argument is format_tt(replace_na=""). If you agree, then the fix I just pushed on Github should do the trick:

library(tinytable)

data.frame(x = 1, y = Inf) |> tt() |> format_tt()

x	y
1	Inf


data.frame(x = 1, y = NaN) |> tt() |> format_tt()

x	y
1


data.frame(x = 1, y = NA) |> tt() |> format_tt()

x	y
1

avehtari commented 7 months ago

I had not noticed replace_na="" default. I think it makes sense for NA. I would prefer replacing NaN -> "NaN" as Inf and NaN are both numeric values and possible results from arithmetic operations (part of the IEC 60559 standard) and thus it would be good to have the same behavior for them. It is a bit confusing that is.na(NaN) gives TRUE, while NaN is not a missing value in the sense as NA is a missing value. See also more about this in my suggestion of how to improve R documentation in https://github.com/avehtari/r-source/pull/1

Did you change the behavior on tt() as currently tinytable_tt_digits option changes the behavior of tt()

> data.frame(x=1, y=NaN) |> tt()

+---+-----+
| x | y   |
+===+=====+
| 1 | NaN |
+---+-----+ 
> options(tinytable_tt_digits=2)
> data.frame(x=1, y=NaN) |> tt()

+---+---+
| x | y |
+===+===+
| 1 |   |
+---+---+

vincentarelbundock commented 7 months ago

Thanks for the link. Interesting discussion.

Yes, I had modified the default display of NaN while fixing the bug, but I now agree with you that this is unadvisable. NaN should have the same behavior as Inf, and display the "NaN" characters as a string.

Here's a new proposal:

replace_na argument is deprecated in favor of a more general replace argument. We maintain backward compatibility and raise a warning to indicate the name change.
By default, replace="" and only replaces NA by an empty cell.
Users can supply a different string to replace NA, such as format_tt(replace="-")
Users can also supply a named list to replace different values with different strings.

Examples below.

You can try it by installing the branch in this PR: https://github.com/vincentarelbundock/tinytable/pull/220

What do you think?

library(tinytable)

x <- data.frame(x = 1:5, y = c(pi, NA, NaN, -Inf, Inf))

dict <- list("-" = c(NA, NaN), "-∞" = -Inf, "∞" = Inf)

tt(x) |> format_tt(replace = "-")

    +---+----------+
    | x | y        |
    +===+==========+
    | 1 | 3.141593 |
    +---+----------+
    | 2 |      -   |
    +---+----------+
    | 3 |      NaN |
    +---+----------+
    | 4 |     -Inf |
    +---+----------+
    | 5 |      Inf |
    +---+----------+ 

tt(x) |> format_tt(replace = dict)

    +---+----------+
    | x | y        |
    +===+==========+
    | 1 | 3.141593 |
    +---+----------+
    | 2 | -        |
    +---+----------+
    | 3 | -        |
    +---+----------+
    | 4 | -∞       |
    +---+----------+
    | 5 | ∞        |
    +---+----------+

avehtari commented 7 months ago

I tested running the PR

replace argument looks great and useful
do you plan adding a global option for replace? It would be useful in notebooks with many tables
options(tinytable_tt_digits=2) is still changing the NA behavior of plain tt(), which is an unexpected side effect of setting the number of digits

vincentarelbundock commented 7 months ago

Thanks for trying it out!

I have added global options for all arguments in format_tt().

The digits docs are now clarified and the behavior more consistent. It does require some attention, but I think the new behavior does make some sense:

?tt:

digits: Number of significant digits to keep for numeric variables.
        When ‘digits’ is an integer, ‘tt()’ calls ‘format_tt(x,
        digits = digits)’ before proceeding to draw the table. Note
        that this will apply all default argument values of
        ‘format_tt()’, such as replacing ‘NA’ by "". Users who need
        more control can use the ‘format_tt()’ function instead.

?format_tt:

replace: Logical, String or Named list of vectors
            • TRUE: Replace ‘NA’ by an empty string.
            • FALSE: Print ‘NA’ as the string "NA".
            • String: Replace ‘NA’ entries by the user-supplied string.
            • Named list: Replace matching elements of the vectors in
            the list by theirs names. Example:
                • ‘list("-" = c(NA, NaN), "-∞" = -Inf, "∞" = Inf)’

This means you get:

library(tinytable)

x <- data.frame(x = pi, y = NA)

options(tinytable_tt_digits = 2)
tt(x)

| x   | y   |
|-----|-----|
| 3.1 |     |

options(tinytable_format_replace = "zzz")
tt(x)

| x   | y   |
|-----|-----|
| 3.1 | zzz |

options(tinytable_format_replace = FALSE)
tt(x)

| x   | y   |
|-----|-----|
| 3.1 | NA  |

avehtari commented 7 months ago

I'm fine with this approach. I think the docs are clear. Thanks for making the changes so quickly!

vincentarelbundock / tinytable

format_tt mishandles Inf, NaN, and NA #218