haozhu233 / kableExtra

Construct Complex Table with knitr::kable() + pipe.
https://haozhu233.github.io/kableExtra/

column_spec is time-consuming for large data frames #817

Open FrankYang1995 opened 7 months ago

FrankYang1995 commented 7 months ago

Hi, loving this package. I found that the column_spec function takes a very long time to execute when the kable is built from a large data frame. I have a data.frame with 10k rows that I want to convert to PDF. When I use column_spec to change the column width, it seems to iterate over all rows? Changing the column width with 100 rows runs very fast, but with 10k rows it takes hours. The reason I change the column width is that I have 10+ columns; without adjusting the width, one page is not enough to contain all the information.

Did I do something wrong? Or is kableExtra not meant for large data sets?

dmurdoch commented 7 months ago

Could you show us some code that generates a table like the one you're working with, and then runs your formatting instructions on it, i.e. a reproducible example? Certainly 10k rows in a table is unusual (your PDF will have hundreds of pages), but perhaps if we have a working example we can spot where things could be improved.

haozhu233 commented 7 months ago

Yeah, some of the code there is probably not efficient enough, and sometimes that is determined by the mechanism of this package. Here is an example. I will take a deeper look at it later.

library(kableExtra)
library(dplyr)

big_mtcars = list()
for (i in 1:100) {
  big_mtcars[[i]] = mtcars
}
big_mtcars = bind_rows(big_mtcars)

aaa = kbl(big_mtcars, 'html') %>%
  column_spec(1, width='2in')
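
To see how the cost grows with the number of rows, the call can be timed at a few table sizes. This is only a rough sketch: the helper name `time_column_spec` and the chosen sizes are made up for illustration, base `rbind` is used instead of `bind_rows` to avoid the dplyr dependency, and the actual timings will depend on the machine and the kableExtra version.

```r
library(kableExtra)

# Hypothetical helper: stack n_copies of mtcars, build an HTML kable,
# and return the elapsed seconds spent in column_spec() alone.
time_column_spec <- function(n_copies) {
  big <- do.call(rbind, replicate(n_copies, mtcars, simplify = FALSE))
  tbl <- kbl(big, "html")
  system.time(column_spec(tbl, 1, width = "2in"))[["elapsed"]]
}

sizes <- c(1, 5, 25)  # arbitrary multiples of nrow(mtcars)
timings <- vapply(sizes, time_column_spec, numeric(1))

# One row per table size: number of table rows and seconds taken
data.frame(rows = sizes * nrow(mtcars), seconds = timings)
```

If the seconds column grows much faster than the rows column, that would support the suspicion that the per-row work in column_spec is what dominates.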

Anyway, my honest opinion is that when you have that many rows, you should think about a different way to present the data (e.g. plotting, or providing some kind of summary or reduction). A table, as one of the final forms of presentation, should only contain distilled information. Going through a 100-page table is just not much fun.

FrankYang1995 commented 6 months ago

Hi, sorry for the delay. Here is the code I used:

library(knitr)
library(kableExtra)

kable <- kable(x = data, format = "latex", align = "c", caption = "", escape = FALSE, booktabs = TRUE, longtable = TRUE, linesep = "")

kable <- column_spec(kable, 1, width = "10") # This line costs a lot of time

I know there are too many rows. But for some reason, I really need to list all the data this way. Do you have any other suggestions for doing this? I need to list thousands of lines in a PDF/Word document.

Thanks for helping

haozhu233 commented 6 months ago

out <- kable(
  x = data,
  format = "latex",
  align = c('>{\\raggedright\\arraybackslash}p{10cm}', 'c', 'c'), # depending on how many columns you have and whether you have row names
  caption = "",
  escape = FALSE,
  booktabs = TRUE,
  longtable = TRUE,
  linesep = "")
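
The snippet above puts the column width directly into the LaTeX column specification through `align`, so `column_spec` is never called. A self-contained version might look like the sketch below. The data frame is a made-up stand-in for the real 10k-row data, the 10cm width is just the value used above, and the `>{...}` prefix assumes the LaTeX `array` package is loaded in the document preamble.

```r
library(knitr)

# Hypothetical stand-in for the real data: one wide text column, two numeric ones
data <- data.frame(
  description = rep("some long text that needs wrapping", 10000),
  a = seq_len(10000),
  b = seq_len(10000)
)

out <- kable(
  x = data,
  format = "latex",
  # One entry per column: a fixed-width, left-aligned paragraph column,
  # then two centered columns
  align = c(">{\\raggedright\\arraybackslash}p{10cm}", "c", "c"),
  row.names = FALSE,
  booktabs = TRUE,
  longtable = TRUE,
  linesep = ""
)

# Show the first part of the generated LaTeX, including the column specification
cat(substr(out, 1, 200))
```

The `align` vector needs exactly one entry per output column, so add one more entry at the front if row names are kept.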