ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
220 stars 73 forks source link

openxlsx: bad interactions between writing data tables and images #463

Open Arthfael opened 5 months ago

Arthfael commented 5 months ago

Background: I am slowly starting to use openxlsx2 instead of openxlsx, because openxlsx does not support some features I need (specifically, multiple text colors in a single cell). For practical reasons, and because styling is so easy in openxlsx (but so %@~#!$£?!!! in openxlsx2), I was hoping that I could perform some steps in openxlsx, save the workbook then re-open in openxlsx2. This has caused some issues with data tables and embedded images, which I had written into the wb using openxlsx. As I have investigated this, I have realized that these are issues with how openxlsx writes data tables and images: in some cases, an image will not be written even though it should be allowed at the location by Excel, if a data table is in the same workbook. This does not happen for all cells, the issue tends to only affect cells to the right (and below? not sure...) of the table. Also, adding the offending table after, not before the image, removes the image. See reprex below (openxlsx2 parts included because the errors and warnings might reveal some info about the issues):

remotes::install_github("ycphs/openxlsx")
remotes::install_github("JanMarvin/openxlsx2")
packageVersion("openxlsx")
packageVersion("openxlsx2")

# openxlsx phase
require(openxlsx)
df <- data.frame(A = 1:10, B = paste0("B", 1:10))
wb <- createWorkbook()
addWorksheet(wb, "Test")
writeDataTable(wb, "Test", df)
fl <- openxlsx::temp_xlsx(name = "temp_xlsx")
saveWorkbook(wb, fl, overwrite = TRUE)
openXL(fl)
#
# openxlsx2 phase
require(openxlsx2)
img <- system.file("extdata", "einstein.jpg", package = "openxlsx2")
wb2 <- wb_load(fl)
fl2 <- openxlsx2::temp_xlsx(name = "temp_xlsx", macros = FALSE)
wb_save(wb2, fl2)
# First issue:
# Warning message:
#  In wb$clone()$save(file = file, overwrite = overwrite, path = path) :
#  [worksheets] file expected to be in output is missing: ../drawings/vmlDrawing1.vml
#
dms <- wb_dims(1, 1)
wb2 <- wb_add_image(wb2, "Test", dims = dms, img)
# This throws error "Error in self$drawings[[sheet_drawing]] : subscript out of bounds"
dms <- wb_dims(15, 15)
wb2 <- wb_add_image(wb2, "Test", dims = dms, img)
# This too - this is not an issue with overlap with the table
wb2 <- wb_add_worksheet(wb2, "Test2")
dms <- wb_dims(1, 1)
wb2 <- wb_add_image(wb2, "Test2", dims = dms, img)
# Now the above works!
dms <- wb_dims(2, 2)
wb2 <- wb_add_image(wb2, "Test2", dims = dms, img)
# (and this too)
wb_save(wb2, fl2)
xl_open(fl2)
#
# First I thought openxlsx2 had a problem, but actually it seems it is openxlsx-related:
insertImage(wb, "Test", img) # No error, but...
saveWorkbook(wb, fl, overwrite = TRUE)
openXL(fl) # ... no image!
insertImage(wb, "Test", img, startRow = 15)
saveWorkbook(wb, fl, overwrite = TRUE)
openXL(fl) # ... no image!
insertImage(wb, "Test", img, startCol = 15)
saveWorkbook(wb, fl, overwrite = TRUE)
openXL(fl) # ... no image!
# In the past with openxlsx only I did not notice the problem, but I was embedding images first, then data tables:
removeWorksheet(wb, "Test") # Just remove the tabs, no need to start from a fresh workbook
addWorksheet(wb, "Test")
writeDataTable(wb, "Test", df, 10)
writeDataTable(wb, "Test", df, 15)
insertImage(wb, "Test", img)
insertImage(wb, "Test", img, startRow = 15)
insertImage(wb, "Test", img, startCol = 25)
saveWorkbook(wb, fl, overwrite = TRUE)
openXL(fl) # My first 2 images are here, not the 3rd
#
# I cannot show every combination, but at this stage it seems that it's not the order in which I embed
# images/data tables which matters, but whether images are in a space allowed.
# As long as there are tables in the workflow, any images overlapping with them or to their right (how far?)
# seem to be disallowed or removed.
#
# This limitation is not present in Excel.
#
# What about trying the same thing with openxlsx2?
wb2 <- wb_workbook()$
  add_worksheet()$
  add_data_table(x = df, start_col = 10)$
  add_data_table(x = df, start_col = 15)$
  add_image(file = img)$
  add_image(file = img, start_row = 15)$
  add_image(file = img, start_col = 25)$
  save(fl2)
xl_open(fl2)
# This works better than openxlsx but not perfectly:
# The 3 images are added, but all 3 are stacked in A1!
wb2 <- wb_workbook()$
  add_worksheet()$
  add_data_table(x = df, start_col = 10)$
  add_data_table(x = df, start_col = 15)$
  add_image(file = img)$
  add_image(file = img, dims = "A15")$
  add_image(file = img, dims = "Y1")$
  save(fl2)
xl_open(fl2)
# Now my images are where I want them.
# Changing the order works too:
wb2 <- wb_workbook()$
  add_worksheet()$
  add_image(file = img)$
  add_image(file = img, dims = "A15")$
  add_image(file = img, dims = "Y1")$
  add_data_table(x = df, start_col = 10)$
  add_data_table(x = df, start_col = 15)$
  save(fl2)
xl_open(fl2)
# Conclusions:
# The main issue is with openxlsx, though there are some issues with start_row/start_col being ignored in openxlsx2.
# The way I read this, openxlsx seems to introduce small issues in the excel sheet, which are ignored by Excel,
# but cause issues when the worksheet is loaded into openxlsx2.

Details about my session:

> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server 2022 x64 (build 20348)

Matrix products: default

locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Vienna
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] openxlsx2_1.4       openxlsx_4.2.5.9001

loaded via a namespace (and not attached):
 [1] desc_1.4.3        R6_2.5.1          magrittr_2.0.3    remotes_2.4.2.1   ps_1.7.5          cli_3.6.2        
 [7] zip_2.3.1         processx_3.8.3    callr_3.7.3       compiler_4.3.2    rstudioapi_0.15.0 tools_4.3.2      
[13] pkgbuild_1.4.3    curl_5.2.0        Rcpp_1.0.12       stringi_1.8.3