ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
226 stars 75 forks source link

writeData followed by saveWorkbook creates corrupted Excel file #277

Closed RosalynLP closed 3 years ago

RosalynLP commented 3 years ago

When I'm using openxlsx to save a dataframe, the resultant file ends up being damaged.

library(openxlsx)

# Making dummy dataframe
Name <- c("Jon", "Bill", "Maria", "Ben", "Tina")
Age <- c(23, 41, 32, 58, 26)

df <- data.frame(Name, Age)

# Creating output file 
output_path <- "check_corruption.xlsx"
wb <- createWorkbook(output_path) 
addWorksheet(wb, "Sheet1")

# Write unformatted data
writeData(wb, sheet=1, df)

# Save fully formatted output
saveWorkbook(wb, output_path, overwrite=TRUE)

When I try to open check_corruption.xlsx I get the error

"We found a problem with some content in 'check_corruption.xlsx'. Do you want us to try to recover as much as we can?".

If I repair the file then the output seems fine but I don't understand what's causing the problem in the first place.

JanMarvin commented 3 years ago

Hi @RosalynLP , could you please check if the issue remains with the current master branch?

RosalynLP commented 3 years ago

@JanMarvin yes I reinstalled the development version (as specified in the main README), and I am having the same issue.

JanMarvin commented 3 years ago

Hi @RosalynLP , I cannot reproduce the issue with MS 365 (Excel in Webbrowser) nor with the Excel Android App. Which Excel version do you use and can you please double check that your openxlsx package version look the same?

> packageVersion("openxlsx")
[1] ‘4.2.4.9000’
jmbarbone commented 3 years ago

The devtools alternative will show a bit more detail and the commit (if installed from github). 1d2d0b5 is currently the head.

devtools::package_info("openxlsx")
#>  package  * version    date       lib source                         
#>  openxlsx   4.2.4.9000 2021-10-18 [1] Github (ycphs/openxlsx@1d2d0b5)
#>  Rcpp       1.0.7      2021-07-07 [1] CRAN (R 4.1.0)                 
#>  stringi    1.7.5      2021-10-04 [1] CRAN (R 4.1.1)                 
#>  zip        2.2.0      2021-05-31 [1] CRAN (R 4.1.0)                 
#> 
#> [1] C:/Users/jmbar/Documents/R/win-library/4.1
#> [2] C:/Program Files/R/R-4.1.0/library

Created on 2021-10-18 by the reprex package (v2.0.1)

RosalynLP commented 3 years ago

Hi sorry for the late reply, my excel version is 16.0.4266.1001 32-bit. I checked my openxlsx version is also the current one.

I also tried opening in Excel 365 and I see: image

JanMarvin commented 3 years ago

Hi, with current you mean the current head from git master? There are known issues with the 'current' CRAN release, but these should be fixed in the development version and running your example, I wasn't able to reproduce the issue. Could you show your sys session info?

RosalynLP commented 3 years ago

@JanMarvin I mean the current head from git master which I installed following the README instructions in this repo. My session info is

R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Red Hat Enterprise Linux

Matrix products: default
BLAS: /opt/R/3.5.1/lib64/R/lib/libRblas.so
LAPACK: /opt/R/3.5.1/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_GB.UTF-8       
 [4] LC_COLLATE=en_GB.UTF-8     LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] glue_1.4.2          openxlsx_4.2.4.9000

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1    yaml_2.2.0     Rcpp_1.0.7     stringi_1.7.5  zip_2.2.0 
JanMarvin commented 3 years ago

Thanks. R 3.5.1 is quite outdated, but I know the suffering of those having to use old R installations quite well. Might be something no longer triggered with more recent R installations (4 is oldrel and 4.1 is stable). I'll see if I can have a look, but I'm unlikely to invest much time this. Even though Red Hat might provide support until the end of time, our resources are a bit limited.

RosalynLP commented 3 years ago

OK thank you @JanMarvin let me try updating R first and then get back to you on whether it's still an issue :)

RosalynLP commented 3 years ago

I updated to 3.6.1 and still no luck, but that's as far as I can go within my organisation. Thanks for your help but seeing as you can't reproduce the issue we can probably close it and chalk it up to probably old version of R.

JanMarvin commented 3 years ago

Sure, I'll have a look some rainy day ... 😄

JanMarvin commented 3 years ago

If you can, please upload the broken xlsx file here. It might give us a hint, what we should look for

RosalynLP commented 3 years ago

@JanMarvin here is the broken file, thanks check_corruption.xlsx

JanMarvin commented 3 years ago

Okay, there is an issue and it has nothing to with R. It's simply an encoding issue. You have an & in your path. We copy the path into core.xml and have to encode the & as &amp;. I'll check if I can create a simple pull request.

JanMarvin commented 3 years ago

I've pushed a potential fix to #278 and will merge it with master, once the CI is completed. At least that fixes the core.xml file.

JanMarvin commented 3 years ago

merged now, hopefully it's fixed once you rebuild master

RosalynLP commented 3 years ago

@JanMarvin amazing thanks so much for this! I'll rebuild and check it's fixed :)

RosalynLP commented 3 years ago

I forgot to say, it's working for me perfectly now, thank you :)

JanMarvin commented 3 years ago

Thanks for the heads-up!