Open mgirlich opened 9 months ago
If you want to make an attempt at optimizing writing, look at the WSTextWriter "class" in this repository. It is already somewhat optimized, but it's possible that it can be improved further.
IIRC, the WSTextWriter works by appending text to a buffer string that keeps growing. Take a look at this repository where I did some experiments with different ways of accumulating strings: https://github.com/wch/string_builder. (Note that the published test results are with old versions of R, and it would be good to test with new ones.)
In that repo, the fastest method was string_builder_bracket. However, in the current WSTextWriter implementation, I think we're using a slower method, which is similar to string_builder_paste.
The string_builder_bracket implementation basically collects each string that's passed into it, adding it to a string vector, and then only at the end, when the user calls $get(), does it paste() all the strings in the vector to produce a single string. Note that for WSTextWriter, the implementation will need to be a bit more complex because of the whitespace handling.
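To make the difference between the two accumulation strategies concrete, here is a minimal sketch in base R. The names sb_paste_sketch and sb_bracket_sketch are hypothetical; they only illustrate the two ideas described above and are not the code from the string_builder repo or from WSTextWriter. Whitespace handling is left out entirely.

```r
# Hypothetical "paste"-style builder: the buffer is one string that is
# re-created (and fully copied) on every append.
sb_paste_sketch <- function() {
  buf <- ""
  list(
    add = function(x) {
      buf <<- paste0(buf, x)
      invisible(NULL)
    },
    get = function() buf
  )
}

# Hypothetical "bracket"-style builder: each piece is stored as one element
# of a character vector, and the pieces are joined only once, in get().
sb_bracket_sketch <- function() {
  parts <- character(0)
  n <- 0L
  list(
    add = function(x) {
      # Assumes x is a single string.
      n <<- n + 1L
      parts[n] <<- x
      invisible(NULL)
    },
    get = function() paste(parts, collapse = "")
  )
}

# Usage: both builders produce the same final string.
sb <- sb_bracket_sketch()
for (s in c("<div>", "hello", "</div>")) sb$add(s)
sb$get()
```

Both builders return the same result; they differ only in when the concatenation work happens. Whether the bracket style is still the fastest on current R versions would need to be re-measured, as noted above.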
I looked a little bit into this and there are two big issues:
doRenderTags() uses a fixed buffer size of 1024 entries. For output with many tags and attributes this is very small. E.g. I increased the buffer size to 16 * 1024, which decreased the time from 24s to 5s and the memory from ~3700 MB to ~300 MB.
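One possible explanation for why the buffer size matters so much can be sketched with a toy writer. This is not the htmltools code. It assumes that a fixed-capacity buffer is collapsed into a single string every time it fills up, and the names fixed_buffer_writer and count_collapses are hypothetical.

```r
# Hypothetical fixed-capacity writer (NOT the htmltools implementation).
# Assumption: when the buffer is full, everything written so far is pasted
# into slot 1 and writing continues from slot 2.
fixed_buffer_writer <- function(buffer_size = 1024L) {
  stopifnot(buffer_size >= 2L)
  buffer <- character(buffer_size)
  position <- 0L
  collapses <- 0L
  collapse_buffer <- function() {
    # Each collapse copies the whole accumulated output again.
    buffer[1] <<- paste(buffer[seq_len(position)], collapse = "")
    position <<- 1L
    collapses <<- collapses + 1L
  }
  list(
    write = function(text) {
      if (position == length(buffer)) collapse_buffer()
      position <<- position + 1L
      buffer[position] <<- text
      invisible(NULL)
    },
    get = function() paste(buffer[seq_len(position)], collapse = ""),
    collapses = function() collapses
  )
}

# Count how many collapses a given number of writes triggers.
count_collapses <- function(n_writes, buffer_size) {
  w <- fixed_buffer_writer(buffer_size)
  for (i in seq_len(n_writes)) w$write("<td>x</td>")
  w$collapses()
}

count_collapses(100000, 1024L)       # many collapses
count_collapses(100000, 16L * 1024L) # far fewer collapses
```

Under this assumption, every collapse re-copies an ever longer string, so the total work depends on both the output size and the number of collapses. A larger buffer, or one that grows as needed, reduces the number of collapses.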
Printing a bigger gt table is very slow and consumes a lot of memory. Part of this needs to be addressed in {gt}, but another big part is the tagWrite() function. It would be great to improve the performance.
Example
It also seems that the memory consumption grows more than linearly.
There are multiple things that need to be improved:
takeSingleton()
takeHeads()
findDependencies()
tagify()