rstudio / pins-r

Pin, Discover and Share Resources
https://pins.rstudio.com

Nested data frames create problematically large HTML previews when published to Connect #816

Open toph-allen opened 6 months ago

When publishing a data frame containing a column of nested data frames to Connect, the index.html seems to contain all of the data from the nested columns, resulting in problematically large files. On one actively used Connect instance, we found an index.html file that's over 1 GB.

I've included some sample code below that produces two pins: one with a very tall data frame, and one with a much smaller nested data frame. I published both to Connect and inspected the bundles to compare the size of the data in the .rds file with the size of the preview index.html file.

It seems like any non-atomic column (for example, a list column of data frames) should be reduced to short preview strings when serialized into the HTML preview, rather than being written out in full.
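A minimal sketch of what that could look like, using base R only. The helpers `summarise_cell()` and `preview_frame()` are hypothetical names made up for illustration and are not part of pins. The sketch only handles list columns and leaves every other column untouched.

```r
# Hypothetical helpers, not part of the pins API.

# Build a short one-line summary for a single cell of a list column.
summarise_cell <- function(x) {
  if (is.data.frame(x)) {
    sprintf("<data.frame [%d x %d]>", nrow(x), ncol(x))
  } else {
    sprintf("<%s [%d]>", class(x)[1], length(x))
  }
}

# Replace every list column with a character column of summaries,
# so the preview never carries the nested data itself.
preview_frame <- function(df) {
  for (col in names(df)) {
    value <- df[[col]]
    if (is.list(value) && !is.data.frame(value)) {
      df[[col]] <- vapply(value, summarise_cell, character(1), USE.NAMES = FALSE)
    }
  }
  df
}

# Small nested example built the same way as the sample code.
nested <- data.frame(n = 1:3)
nested$beavers <- rep(list(datasets::beaver1), 3)

preview <- preview_frame(nested)
preview$beavers[1]
# "<data.frame [114 x 4]>"
```

With something along these lines, the size of the preview would depend on the number of rows rather than on the contents of the nested columns.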

Sample code

library(datasets)
library(dplyr)
library(magrittr)

board <- pins::board_connect(auth = "envvar")

# Tall data frame: beaver1 (114 rows x 4 columns) stacked 50,000 times

beaver_list <- beaver1 %>%
  list %>%
  rep(50000)

beavers <- dplyr::bind_rows(beaver_list)

pins::pin_write(board, beavers, name = "beavers_tall", description = "Beavers Tall")

# Nested data frame

# Bind 10 copies of beaver1 side by side so each nested data frame is a little larger
wide_beav <- dplyr::bind_cols(beaver_list[1:10])

wide_beaver_list <- wide_beav %>%
  list %>%
  rep(250)

# 250 rows, each holding one copy of the wide data frame in a list column
beavers_within_beavers <- data.frame(n = 1:250)
beavers_within_beavers$beavers <- wide_beaver_list

pins::pin_write(board, beavers_within_beavers, name = "beavers_nested", description = "Beavers Nested")