ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
220 stars 74 forks source link

NAs get added to xlsx file from just opening and closing #72

Closed ajdamico closed 3 weeks ago

ajdamico commented 4 years ago

Describe the bug

hi, thanks for the great software. any idea why these NAs get added to the attached test2 using 4_1_5_1.xlsx file? if the behavior is expected, a workaround would be great.. thank you!!

To Reproduce

# behaves as expected with 4.0.17
# remotes::install_version("openxlsx", version = "4.0.17", repos = "http://cran.us.r-project.org") 

# adds NA cells
# remotes::install_github("ycphs/openxlsx")

library(openxlsx)
fn <- "C:/Users/AnthonyD/Desktop/test.xlsx"
fn2 <- "C:/Users/AnthonyD/Desktop/test2.xlsx"
this_wb <- loadWorkbook( fn )
saveWorkbook( this_wb , fn2 )

Expected behavior loading & immediately saving an xlsx file results in zero changes to xlsx file

test.xlsx is the input old version of openxlsx gives the expected result (4_0_17) new version of openxlsx changes the file (4_1_5_1)

test.xlsx test using 4_1_5_1.xlsx test using 4_0_17.xlsx

session info

R version 4.0.0 (2020-04-24)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)
JanMarvin commented 3 years ago

my branch contains a possible fix for this issue. The NA is indeed added by openxlsx upon reading. Though since it is added intentionally, this is a possible wontfix. Also the internal handling of the NA strings might call for a general overhaul. I'm thinking along the lines of na.string = "#N/A" will replace this with NA and replace.na = FALSE would disable adding NA entirely.

JMPivette commented 3 years ago

I had exactly the same kind of issue. But it happens only on some columns but not on others. I am trying to figure out why some blank cells are converted to "NA" and other not but so far I can't find a pattern.. Maybe it's linked to the use of sharedStrings.

@JanMarvin Do you want me to do some testing on the modification in your branch? Also it looks like the original code had been written 4 years ago to fix this issue: https://github.com/awalker89/openxlsx/issues/296

JanMarvin commented 3 years ago

Hi, I never thought of creating a pull request, since it looks to me as if it's intentional behavior. If you want to rebase it on master and create a pull request please go ahead. I'm currently busy with other things. Thanks for finding the original issue, I'll have a look!

GordonAn commented 3 years ago

I recently encountered thee same issue. I noticed that all cell that is a results of a formula that returns an empty string "" is turned into NA, by simply loadWorkbook() and directly saveWorkbook(). I tried JanMarvin's suggested fixing. It works for me. However, https://github.com/JanMarvin/openxlsx/commit/c47948fdbfe04753180a2c044dbb7a32bc5eb86b#comments

Instead of the changes as suggested by JanMarvin, just comment out line 248 in R/loadWorkbook.R, fix this bug, without needing touching other places. line248

line 248

vals[vals == ""] <- "NA"

I hope someone who has enough knowledge about this package can put this fix into the current development version for test. my working around is download the source package and modifying this file. and use install.packages(pkgs="openxlsx", repos = NULL, type="source") to install the fix on my PC.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

ajdamico commented 1 year ago

i don't think i have the permissions to remove the Stale label of this issue, but it does look like it's fixed in the development branch! thanks @deschen1 !!

# remotes::install_github("ycphs/openxlsx",ref='development')
library(openxlsx)
fn <- "C:/Users/AnthonyD/Desktop/test.xlsx"
fn2 <- "C:/Users/AnthonyD/Desktop/test2.xlsx"
this_wb <- loadWorkbook( fn , na.convert = FALSE )
saveWorkbook( this_wb , fn2 )
JanMarvin commented 1 year ago

Just for reference, this is one of the issues solved in openxlsx2.

library(openxlsx2)
wb <- wb_load("https://github.com/ycphs/openxlsx/files/4603444/test.xlsx")
if(interactive()) wb$open()
ajdamico commented 1 year ago

thanks @JanMarvin !! i'm reading the use of upgrade in your docs to imply that openxlsx isn't really going to be maintained going forward. is that true? just curious if i should devote the time to making the switch, or if openxlsx will continue to progress? appreciate it :-)

JanMarvin commented 1 year ago

I'm not implying anything and luckily the future is unwritten. openxlsx is not going away and if you're uncomfortable or do not have the time/need to switch then don't. I was just providing a possible solution for a stale issue.

Catherine-Celice commented 7 months ago

Just to comment so people know this issue is not really stale. I am encountering the same problem where empty cells are being converted to NA simply by loading and saving the workbook. I need to have them remain empty. I can force them to be empty if I read the data into a data frame, then rewrite it to the workbook with writeData(), but I am trying to avoid that since some of my files only need 1 of several tabs updated by the script.