ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
226 stars 75 forks source link

loadWorkbook does not retain Headers and Footers that have "escape code" formatting #334

Closed michael-m52 closed 1 year ago

michael-m52 commented 2 years ago

Describe the bug 1) There is an open feature request on another branch to be able to format Headers and Footers: https://github.com/awalker89/openxlsx/issues/369 2) A user has suggested a workaround using "escape codes": https://github.com/awalker89/openxlsx/issues/369#issuecomment-486487602 3) The Microsoft documentation for using "escape code" formatting of headers and footers: Formatting and VBA codes for headers and footers 4) These "escape codes" are not compatible with loadWorkbook, which will import the workbook with blank headers and footers.

To Reproduce

library('openxlsx')
# Plain text headers and footers
header = c('ODD HEAD LEFT', 'ODD HEAD CENTER', 'ODD HEAD RIGHT')
footer = c('ODD FOOT RIGHT', 'ODD FOOT CENTER', 'ODD FOOT RIGHT')
evenHeader = c('EVEN HEAD LEFT', 'EVEN HEAD CENTER', 'EVEN HEAD RIGHT')
evenFooter = c('EVEN FOOT RIGHT', 'EVEN FOOT CENTER', 'EVEN FOOT RIGHT')
firstHeader = c('TOP', 'OF FIRST', 'PAGE')
firstFooter = c('BOTTOM', 'OF FIRST', 'PAGE')

# Add Sheet 1
wb=createWorkbook()
addWorksheet(wb, 'Sheet 1',
             header = header,
             footer = footer,
             evenHeader = evenHeader,
             evenFooter = evenFooter,
             firstHeader = firstHeader,
             firstFooter = firstFooter)

# Modified headers and footers to make them Arial 8
header = paste0('&"Arial"&8',header)
footer = paste0('&"Arial"&8',footer)
evenHeader = paste0('&"Arial"&8',evenHeader)
evenFooter = paste0('&"Arial"&8',evenFooter)
firstHeader = paste0('&"Arial"&8',firstHeader)
firstFooter = paste0('&"Arial"&8',firstFooter)

# Add Sheet 2
addWorksheet(wb, 'Sheet 2',
             header = header,
             footer = footer,
             evenHeader = evenHeader,
             evenFooter = evenFooter,
             firstHeader = firstHeader,
             firstFooter = firstFooter)
writeData(wb, sheet = 1, 1:400)
writeData(wb, sheet = 2, 1:400)

# Save workbook
saveWorkbook(wb,'test_1st_save.xlsx',overwrite = T)

# Load workbook and save again
wb_loadWorkbook=loadWorkbook("test_1st_save.xlsx")
saveWorkbook(wb_loadWorkbook,"test_2nd_save.xlsx",overwrite = T)

Expected behavior Headers and footers 1) test_1st_save.xlsx and test_2nd_save.xlsx should be identical 2) text - identical in both Sheet 1 & Sheet 2 3) formatting - Sheet 1 has Calibri 11 (or whatever the default font is), Sheet 2 has Arial 8.

Actual behavior The workbook test_1st_save.xlsx has the correct headers and footers. The workbook test_2nd_save.xlsx has missing headers and footers from Sheet 2.

Screenshots test_1st_save.xlsx - Sheet 2 image test_2nd_save.xlsx - Sheet 2 image

Example files test_1st_save.xlsx test_2nd_save.xlsx

Additional context To format headers and footers, we either need the original feature request to be implemented or we need a fix to loadWorkbook.

JanMarvin commented 2 years ago

Hi @michael-m52 , thanks for the report. I assume that this will be tricky if this is caused by our XML parser. I'll try to look into this

michael-m52 commented 2 years ago

Thankyou @JanMarvin I appreciate it. I would like to contribute to fixing some of these issues, but I haven't worked out how to debug the openxlsx package from the downloaded source code using RStudio. Is there a reference for me to get started quickly?

JanMarvin commented 2 years ago

Unfortunately there isn't, it is mostly digging into the source code and extracting the xlsx files, carefully inspecting them and consulting the openxml documentation if needed.

I assume that the issue is this function: https://github.com/ycphs/openxlsx/blob/9223f05791cc139e27d158e50ea80677e9ee0d9b/R/helperFunctions.R#L906-L941

It is called here: https://github.com/ycphs/openxlsx/blob/9223f05791cc139e27d158e50ea80677e9ee0d9b/R/loadWorkbook.R#L494-L501

And should create wb$worksheets[[sheet]]$headerFooter (where sheet is a numeric 1 or 2 in your case) when loading the workbook. The split function seems broken. So my approach would be to try to fix/replace this one. It should import the following XML (these are written by openxlsx, but are not imported correctly). And all the "&" are "&" in XML ofc.

<headerFooter differentOddEven="1" differentFirst="1" scaleWithDoc="0" alignWithMargins="0">
  <oddHeader>&LODD HEAD LEFT&CODD HEAD CENTER&RODD HEAD RIGHT</oddHeader>
  <oddFooter>&LODD FOOT RIGHT&CODD FOOT CENTER&RODD FOOT RIGHT</oddFooter>
  <evenHeader>&LEVEN HEAD LEFT&CEVEN HEAD CENTER&REVEN HEAD RIGHT</evenHeader>
  <evenFooter>&LEVEN FOOT RIGHT&CEVEN FOOT CENTER&REVEN FOOT RIGHT</evenFooter>
  <firstHeader>&LTOP&COF FIRST&RPAGE</firstHeader>
  <firstFooter>&LBOTTOM&COF FIRST&RPAGE</firstFooter>
</headerFooter>
<headerFooter differentOddEven="1" differentFirst="1" scaleWithDoc="0" alignWithMargins="0">
  <oddHeader>&L&"Arial"&8ODD HEAD LEFT&C&"Arial"&8ODD HEAD CENTER&R&"Arial"&8ODD HEAD RIGHT</oddHeader>
  <oddFooter>&L&"Arial"&8ODD FOOT RIGHT&C&"Arial"&8ODD FOOT CENTER&R&"Arial"&8ODD FOOT RIGHT</oddFooter>
  <evenHeader>&L&"Arial"&8EVEN HEAD LEFT&C&"Arial"&8EVEN HEAD CENTER&R&"Arial"&8EVEN HEAD RIGHT</evenHeader>
  <evenFooter>&L&"Arial"&8EVEN FOOT RIGHT&C&"Arial"&8EVEN FOOT CENTER&R&"Arial"&8EVEN FOOT RIGHT</evenFooter>
  <firstHeader>&L&"Arial"&8TOP&C&"Arial"&8OF FIRST&R&"Arial"&8PAGE</firstHeader>
  <firstFooter>&L&"Arial"&8BOTTOM&C&"Arial"&8OF FIRST&R&"Arial"&8PAGE</firstFooter>
</headerFooter>
JanMarvin commented 2 years ago

FYI I have a fix pending for an upcoming new release, but there is no ETA yet other than 2022h1. Maybe it can be backported like this (untested, just typed it into the github reply field):

      # as replacement for L494 in loadWorkbook.R
        amp_split <- function(x) {
          z <- stri_split_regex(x, "&amp;[LCR]")
          z <- unlist(z)
          z[-1]
        }

        head_foot <- c("oddHeader", "oddFooter",
                       "evenHeader", "evenFooter",
                       "firstHeader", "firstFooter")

        headerFooter <- vector("list", length = length(head_foot))
        names(headerFooter) <- head_foot

        for (hf in head_foot) {
          headerFooter[[hf]] <- amp_split(getChildlessNode(wb$worksheets[[i]]$headerFooter, hf))
        }

        wb$worksheets[[i]]$headerFooter <- headerFooter
JanMarvin commented 2 years ago

I've pushed my idea to the development branch. Please have a look if this solves your issue. Headers and Footers are of minor interest to me, I can not remember ever using them or the last time I printed something from Excel, therefore testing this would be very welcome :)

michael-m52 commented 2 years ago

Thanks again @JanMarvin, I will test the development branch to see if this works now.

malte610 commented 2 years ago

First of all, thank you for your efforts here! I highly appreciate everyones work on this package.

When I tested it, it unfortunately didn't work for me.

When I put text or a logo in the header I receive on saving the following error:

"Error in x$oddHeader[[2]] : subscript out of bounds"

When I have an empty header and only text in the footer, the text in the footer is lost on saving.

This is the template I am testing with (don't judge, not my work): template_test.xlsx

This was my test:

remotes::install_github("ycphs/openxlsx", ref = "development")
library(openxlsx)

wb= loadWorkbook(xlsxFile = "path")
writeData(wb, sheet = 1, "testing", startCol = "A", startRow = 7)
saveWorkbook(wb, "path", overwrite = TRUE)

openXL("path")

I hope this helps to pinpoint the issue.

JanMarvin commented 2 years ago

I guess that I can speak for everyone on the whole internet: this is an invitation to judge your work, @malte610 😄

The issue is, that my fix was not sufficient. You only have center and right. Unfortunately I did not check this and assumed that Excel will always provide three fields.

wb <- loadWorkbook("path.xlsx")

odd <- wb$worksheets[[1]]$headerFooter$oddHeader

odd[[3]] <- odd[[2]]
odd[[2]] <- odd[[1]]
odd[[1]] <- ""

wb$worksheets[[1]]$headerFooter$oddHeader <- odd

saveWorkbook(wb, "test.xlsx", TRUE)
openXL("test.xlsx")

No clue why the image starts floating, might require further debugging. Though first the split function should be fixed ...

malte610 commented 2 years ago

damn. I didn't think this through.. Go ahead, judge me all you want, it only hurts a tiny bit 😥😅

I wish I could help you with this, but my knowledge of the insides of xlsx files is very limited. Let me know if I can contribute here.

JanMarvin commented 2 years ago

I've fixed the amp_split function in development. Now looking into the image part. I assume it's caused by us picking up an image and simply assigning it to the sheet

michael-m52 commented 2 years ago

Hi @JanMarvin

Once again, I really appreciate your efforts. If I could get the hang of GitHub I'd contribute myself so I'm so happy that somebody else is doing this.

Your latest change fixes the original bug here, 100%. It does however create another issue. In my test case above, please substitute the first 8 lines with this:

library('openxlsx')
# Plain text headers and footers
header = c('ODD HEAD LEFT', 'ODD HEAD CENTER', 'ODD HEAD RIGHT')
footer = c('ODD FOOT RIGHT', 'ODD FOOT CENTER', 'ODD FOOT RIGHT')
evenHeader = c('EVEN HEAD LEFT', 'EVEN HEAD CENTER', 'EVEN HEAD RIGHT')
evenFooter = c('EVEN FOOT RIGHT', 'EVEN FOOT CENTER', 'EVEN FOOT RIGHT')
firstHeader = c('', 'OF FIRST', '')
firstFooter = c('BOTTOM', '', 'PAGE')

The zero-length string in the firstHeader and firstFooter creates this error:

Error in names(z) <- as.character(nam) : 
  'names' attribute [3] must be the same length as the vector [1]

If I use NA instead of zero-length string,

library('openxlsx')
# Plain text headers and footers
header = c('ODD HEAD LEFT', 'ODD HEAD CENTER', 'ODD HEAD RIGHT')
footer = c('ODD FOOT RIGHT', 'ODD FOOT CENTER', 'ODD FOOT RIGHT')
evenHeader = c('EVEN HEAD LEFT', 'EVEN HEAD CENTER', 'EVEN HEAD RIGHT')
evenFooter = c('EVEN FOOT RIGHT', 'EVEN FOOT CENTER', 'EVEN FOOT RIGHT')
firstHeader = c(NA, 'OF FIRST', NA)
firstFooter = c('BOTTOM', NA, 'PAGE')

I don't get an error, but I get the text "NA" where there shouldn't be anything at all. This is a separate issue #329

I'll see if I can find a way to code around this or identify the issue in #329

JanMarvin commented 2 years ago

hmpf, there's always another case you don't plan for. Thanks for reporting!

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 7 days with no activity.