pharmaverse / tidytlg

The goal of tidytlg is to generate tables, listings, and graphs (TLG) using Tidyverse.
https://pharmaverse.github.io/tidytlg/

resulting rtf is (much) larger after general header-border refactor #38

Closed gmbecker closed 4 months ago

gmbecker commented 4 months ago

@kpagacz

Given a particular row in an AE listing, prior to the change we would get the following markup:

\trowd
\trqc \clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx1235 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx2382 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx3528 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx4675 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx5822 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx6968 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx8115 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx9261 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx10408 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx11555 
\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx12701 \pard\intbl\ql\fs16 Apalutamide \cell
\pard\intbl\qc\fs16 56021927PCR3002-syn-10004903 \cell
\pard\intbl\qc\fs16 79 / M / White \cell
\pard\intbl\qc\fs16 10 mg / 37 \cell
\pard\intbl\qc\fs16 Lip swelling / ************ \cell
\pard\intbl\qc\fs16 16JUN2017 (65) \cell
\pard\intbl\qc\fs16 20JUN2017 (69) \cell
\pard\intbl\qc\fs16 5 \cell
\pard\intbl\qc\fs16 Drug Withdrawn \cell
\pard\intbl\qc\fs16 Not Related \cell
\pard\intbl\qc\fs16 Recovered / Resolved / 2 / No \cell
\row

After the change, we get:

\trowd
\trqc \clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx1187 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx2493 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx3680 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx4630 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx6054 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx7360 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx8666 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx9496 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx10565 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx11633 
\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 \cellx12701 \pard\intbl\ql\fs16 Apalutamide \cell
\pard\intbl\qc\fs16 56021927PCR3002-syn-10004903 \cell
\pard\intbl\qc\fs16 79 / M / White \cell
\pard\intbl\qc\fs16 10 mg / 37 \cell
\pard\intbl\qc\fs16 Lip swelling / ************ \cell
\pard\intbl\qc\fs16 16JUN2017 (65) \cell
\pard\intbl\qc\fs16 20JUN2017 (69) \cell
\pard\intbl\qc\fs16 5 \cell
\pard\intbl\qc\fs16 Drug Withdrawn \cell
\pard\intbl\qc\fs16 Not Related \cell
\pard\intbl\qc\fs16 Recovered/Resolved / 2 / No \cell
\row

Note that these render the same (up to possible minor differences in column width), but the markup is nearly twice as long in characters. Because listings have tons of cells, and this change affects the markup of every individual cell, it dominates everything else, resulting in a file nearly twice as large as it was (~10000 kb -> ~19000 kb).
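To make the size difference concrete, here is a minimal Python sketch that compares the length of one cell definition before and after the change. The two strings are copied from the first cell of each row quoted above; the helper name `growth` is made up for illustration and is not part of tidytlg. The cell definition alone more than doubles, while the cell text is unchanged, which is why the row as a whole comes out at nearly twice the size.

```python
# Cell definitions copied from the first cell of each row quoted above.
before = r"\clbrdrt\clbrdrl\clbrdrb\clbrdrr\clvertalt\cellx1235 "
after = (
    r"\clbrdrt\brdrs\brdrw18\clbrdrl\clbrdrb\clbrdrr\clvertalt"
    r"\clpadfl3\clpadl0 \clpadft3\clpadt0 \clpadfb3\clpadb0 \clpadfr3\clpadr0 "
    r"\cellx1187 "
)


def growth(old: str, new: str) -> tuple[int, float]:
    """Return (extra characters, new/old length ratio). Hypothetical helper."""
    return len(new) - len(old), len(new) / len(old)


extra_chars, ratio = growth(before, after)
print(f"{len(before)} -> {len(after)} characters (+{extra_chars}, x{ratio:.2f})")
```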

Can we get the changes to the (body) cell markup reverted while retaining the new behavior for the header?

kpagacz commented 4 months ago

I can remove the no-ops, but I will need to retain the non-zero padding.

Btw - why is the file size an issue?

gmbecker commented 4 months ago

For listings we're talking tens or hundreds of thousands of rows, each of which will have around a dozen cells.

For lsfae01 on our test data, the listing generated was 20 MB, which is up in the range where MS Word starts to choke, depending on the system resources of the user's machine.
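The scaling argument can be put as a back-of-the-envelope calculation: the added size is roughly rows × cells per row × extra characters per cell. The Python sketch below uses the 11 cells of the row quoted above and the 86 extra characters measured on its first cell definition; the row counts are hypothetical, and one byte per character is assumed.

```python
def extra_bytes(rows: int, cells_per_row: int, extra_per_cell: int) -> int:
    """Rough added file size, assuming one byte per markup character."""
    return rows * cells_per_row * extra_per_cell


CELLS_PER_ROW = 11   # cells in the row quoted above
EXTRA_PER_CELL = 86  # extra characters in the first cell definition quoted above

# Hypothetical row counts, not taken from the actual listing.
for rows in (10_000, 100_000):
    added = extra_bytes(rows, CELLS_PER_ROW, EXTRA_PER_CELL)
    print(f"{rows:>7} rows: about {added / 1_000_000:.1f} MB of extra markup")
```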