hughjonesd / huxtable

An R package to create styled tables in multiple output formats, with a friendly, modern interface.
http://hughjonesd.github.io/huxtable
Other
321 stars 28 forks source link

Bugs related to calculation of string width in huxtable #213

Closed NikNakk closed 2 years ago

NikNakk commented 3 years ago

I've run into two annoying bugs that relate to the way that the huxtable package measures widths of strings when outputting to the screen. ncharw() is used which in turn calls stringi::str_width. This regards certain characters (e.g. the interpunct, ·, \U00B7) as being width 2. The first bug happens when the width calculated by ncharw() exceeds the value of eff_width but the number of characters does not. This leads to an infinite loop in this section of code in character_matrix(), starting at line 297 in the current version on GitHub:

while (any(ncharw(x) > eff_width)) {
    lx <- length(x)
    last <- x[lx]
    last <- c(substring(last, 1, eff_width), substring(last, eff_width + 1))
    x[lx:(lx + 1)] <- last
}

Reprex of first bug

options(width = 200) # change to 100 to reproduce the bug
my_data <- as.data.frame(rep(list("8·0 (3·2 - 16·0)"), 5))
colnames(my_data) <- paste0("V", 1:ncol(my_data))
huxtable::as_hux(my_data)
                                   V1                    V2                    V3                    V4                    V5                   
                                   8·0 (3·2 - 16·0)      8·0 (3·2 - 16·0)      8·0 (3·2 - 16·0)      8·0 (3·2 - 16·0)      8·0 (3·2 - 16·0)     

Column names: V1, V2, V3, V4, V5

Created on 2021-08-04 by the reprex package (v2.0.0)

Second bug

I've spent less time on the second related bug, but it seems that the padding width in to_screen.huxtable can sometimes end up negative if the first bug isn't triggered, as demonstrated below:

Reprex of second bug

options(width = 100)
my_data <- as.data.frame(rep(list("0·105 - 0·206)"), 10))
colnames(my_data) <- paste0("V", 1:ncol(my_data))
huxtable::as_hux(my_data)
#> Error in strrep(" ", pad_width): invalid 'times' value

Created on 2021-08-04 by the reprex package (v2.0.0)

All of this is in huxtable 0.5.4 on R 4.1.0 under RStudio 1.4.1717.

hughjonesd commented 3 years ago

Thank you for the careful bug report. Will take a look.

hughjonesd commented 2 years ago

I don't get an infinite loop with your first reprex (and width = 100), using github master. I also don't reliably get the strrep error from your second reprex. (I saw it once! Dammit.) Can you check after remotes::install_github("hughjonesd/huxtable")? Thanks.

hughjonesd commented 2 years ago

Note that stringi 1.6.2 gives

> stringi::stri_width("\U00B7")
[1] 1

So maybe the fix is just update stringi?

kforner commented 2 years ago

I've been hit by the second bug too. Maybe for safety replace https://github.com/hughjonesd/huxtable/blob/f60b75c6397b38f1ac36f623f6d5fb76da9f9ae1/R/screen.R#L159 result <- paste0(strrep(" ", pad_width), result) by result <- paste0(strrep(" ", max(0, pad_width)), result)

hughjonesd commented 2 years ago

Have you got a reliable reprex? Can you tell me your stringi version?

kforner commented 2 years ago

Unfortunately I haven't. It currently happens on very big tables, full of custom attributes. I'll try to minimize one if I can. The rationale for my proposal is that, in any case we do not want a negative pad_width, being because of a bug, of a wrong version, or a combination of exotic parameters. So better to be safe. Maybe even better: test its negativity, and send a warning so that it does not go unnoticed.

My stringi version: 1.7.6

kforner commented 2 years ago
    row_char_widths <- rowSums(width_mat)
    pad_width <- min(max_width, getOption("width", 80))
    pad_width <- pad_width - max(row_char_widths)

let's say:

And that's consistent with what I observed: If I unset max_width= with my examples if did not crash any longer.

kforner commented 2 years ago

So here's the reprex:

library(huxtable)
df <- data.frame(col = strrep('A', 200))
ht <- as_hux(df)
to_screen(ht)
##--> OK

# but
to_screen(ht, max_width=200)
Error in strrep(" ", pad_width) : invalid 'times' value

# fixing the `width` option prevents the bug:
withr::with_options(c(width = 200), to_screen(ht, max_width=200))
hughjonesd commented 2 years ago

Thanks! Should be fixed in master.

kforner commented 2 years ago

Thanks, that actually fixed my problems. Tables are just fine now.

hughjonesd commented 2 years ago

Closing for now

NikNakk commented 2 years ago

Sorry for delayed response @hughjonesd. Unfortunately, the first bug remains even in the latest GitHub version. You just have to substitute the interpunct for any other wide character. Here's a reprex that randomly picks wide characters:

library(stringi)
library(huxtable)
options(width = 400) # replace with 100 to reproduce bug
options(huxtable.knitr_output_format = "html")
chars <- sapply(32:65536, intToUtf8)
wide_chars <- chars[sapply(chars, stri_width) == 2]
set.seed(1741)
random_wide_strings <- replicate(5, paste(sample(wide_chars, 20), collapse = ""))
df <- as.data.frame(as.list(random_wide_strings), col.names = paste0("V", 1:5))
huxtable::as_hux(df)
V1 V2 V3 V4 V5
艨㩤NA즨殖믰㜿⼄숓馚䝆⼶弫붶NA鋸堻焪왰㦞 姎㠍僴䪦㬨䄞ꅒ玏䂎븬묌넰顕䟳퉄詈ꑔ텩웼첁 NA༜す顣壡덨懱뤽婾NA슩뽵칧鰻頰勶呝纍턒그 㯇윬蒈驺뼩密隲뷃ク쮵尘첈䒪捕䮉┴佭㪣术NA ꇆ눿㠳讟媠见NA든⚼쩁쿮澩벵갾袢喡NA㮹NA쒎

Created on 2022-02-10 by the reprex package (v2.0.1)

hughjonesd commented 2 years ago

Thanks. OK, I think it is reasonable just to rely on fansi to fix this. I've added a test and the current version seems to pass, can you confirm?