rstudio / htmltools

Tools for HTML generation and output
https://rstudio.github.io/htmltools/

Improve performance and memory for `tagWrite()` #413

Open mgirlich opened 9 months ago

mgirlich commented 9 months ago

Printing a large gt table is very slow and consumes a lot of memory. Part of this needs to be addressed in {gt}, but another big part is in the `tagWrite()` function. It would be great to improve its performance.

Example

```r
library(magrittr)  # provides the %>% pipe

profvis::profvis(
  ggplot2::diamonds %>%
    gt::gt() %>%
    print()
)
```


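It also seems that memory consumption grows faster than linearly with the number of rows: each doubling of the row count multiplies the memory allocated in `renderTags()` by roughly 3.4 to 3.6 times.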

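```r
# Assumed helper (its definition is not shown in the issue):
# print a gt table built from the first n rows of `diamonds`.
print_diamonds <- function(n) print(gt::gt(head(ggplot2::diamonds, n)))

profvis::profvis({print_diamonds(4e3)})
#  28.6 MB in `renderTags()`
profvis::profvis({print_diamonds(8e3)})
#  103 MB in `renderTags()`
profvis::profvis({print_diamonds(16e3)})
#  346.9 MB in `renderTags()`
```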

There are multiple things that need to be improved.

wch commented 9 months ago

If you want to attempt to optimize the writing, look at the `WSTextWriter` "class" in this repository. It is already somewhat optimized, but it's possible that it can be improved further.

IIRC, the WSTextWriter works by appending text to a buffer string that keeps growing. Take a look at this repository where I did some experiments with different ways of accumulating strings: https://github.com/wch/string_builder. (Note that the published test results are with old versions of R, and it would be good to test with new ones.)
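To illustrate the append-to-a-growing-buffer pattern, here is a minimal sketch in R. It is not the actual `WSTextWriter` code: `paste_writer()` and its `write()`/`get()` methods are hypothetical names, loosely modeled on the idea behind `string_builder_paste`.

```r
# Hypothetical append-to-buffer writer (illustration only, not htmltools code).
# Every write() builds a brand-new string containing all previous text plus
# the new piece.
paste_writer <- function() {
  buffer <- ""
  list(
    write = function(text) {
      # paste0() allocates a new string of nchar(buffer) + nchar(text)
      # characters, so each call costs more as the buffer grows.
      buffer <<- paste0(buffer, text)
      invisible(NULL)
    },
    get = function() buffer
  )
}

w <- paste_writer()
for (i in 1:3) w$write(paste0("<td>", i, "</td>"))
w$get()
# "<td>1</td><td>2</td><td>3</td>"
```

Because each call copies everything written so far, writing n equal-sized pieces allocates on the order of n² characters in total. That is one plausible contributor to superlinear time and memory growth, but only profiling `renderTags()` can confirm how much it matters for gt tables.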

In that repo, the fastest method was `string_builder_bracket`. However, I think the current `WSTextWriter` implementation uses a slower method, similar to `string_builder_paste`.

The `string_builder_bracket` implementation basically collects each string that's passed into it by adding it to a character vector, and only at the end, when the user calls `$get()`, does it `paste()` all the strings in the vector together to produce a single string. Note that for `WSTextWriter`, the implementation will need to be a bit more complex because of the whitespace handling.
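A minimal sketch of the collect-then-paste approach, with one possible way to handle removable whitespace. Assumptions: `vector_writer()` and its `write()`, `writeWS()`, `eatWS()` and `readAll()` methods are hypothetical, and the rule that `eatWS()` discards only whitespace written since the last `write()` is a simplification, not a description of how htmltools actually handles whitespace.

```r
# Hypothetical collect-then-paste writer (illustration only, not htmltools'
# WSTextWriter API). Pieces go into a character vector; they are joined with
# a single paste() call only when readAll() is called.
vector_writer <- function(capacity = 1024L) {
  parts <- character(capacity)
  n <- 0L          # number of pieces currently kept
  last_text <- 0L  # index of the last piece added via write()

  push <- function(text) {
    if (n == length(parts)) {
      # Out of room: double the capacity. New slots are NA but never read,
      # because readAll() only uses the first n entries.
      length(parts) <<- max(1L, 2L * length(parts))
    }
    n <<- n + 1L
    parts[n] <<- text
  }

  list(
    # Ordinary text: always kept, and marks the point eatWS() rolls back to.
    write = function(text) {
      push(text)
      last_text <<- n
      invisible(NULL)
    },
    # Whitespace that a later eatWS() call may discard.
    writeWS = function(text) {
      push(text)
      invisible(NULL)
    },
    # Drop any writeWS() pieces added since the last write().
    eatWS = function() {
      n <<- last_text
      invisible(NULL)
    },
    # Concatenate everything once, at the end.
    readAll = function() paste(parts[seq_len(n)], collapse = "")
  )
}

w <- vector_writer()
w$write("<div>")
w$writeWS("\n  ")
w$eatWS()  # e.g. a child tag that should hug its parent
w$write("<span>x</span>")
w$writeWS("\n")
w$write("</div>")
w$readAll()
# "<div><span>x</span>\n</div>"
```

Since the vector doubles when full, the number of resizes grows only logarithmically with the number of writes, and the text is concatenated once in `readAll()`. Whether R updates `parts` in place through `<<-` depends on its copy-on-modify rules, so this should be benchmarked against the current implementation before relying on it.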

mgirlich commented 9 months ago

I looked a little bit into this and there are two big issues: