ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
221 stars 74 forks source link

Fixes #72 by adding an optional argument na.convert #360

Closed deschen1 closed 2 years ago

deschen1 commented 2 years ago

Re the test - if you are referring to testthat, that's also something I'm not really familiar with. However, the main problem is that it can't be automated through testthat.

It appears that the behaviour of changing blanks to NAs only happens after the Excel file has been manually opened, saved and closed in Excel. As long as I just save and load an Excel file with openxlsx and don't interact with it (in the sense of not saving it in Excel), blanks are not converted. ONce I save in Excel, they are converted. However, I created a small code chunk that at least does the file creation and mentions when you need to save the file manually. Hope this helps.

#Assuming we are in the package programming environment
devtools::load_all()

test_data <- data.frame(x = c("1", "", "b"),
                        y = c(1, 2, 3))

wb <- createWorkbook()
addWorksheet(wb, "load test")
writeData(wb, "load test", test_data)
saveWorkbook(wb, "test.xlsx", overwrite = TRUE)

#
# Now go to the file, open in Excel, save it and close it, then go back to R
# 

wb_new <- loadWorkbook("test.xlsx", na.convert = TRUE)
saveWorkbook(wb_new, "test2.xlsx", overwrite = TRUE) # test2.xlsx now has NAs

wb_new <- loadWorkbook("test.xlsx", na.convert = FALSE)
saveWorkbook(wb_new, "test3.xlsx", overwrite = TRUE) # test3.xlsx doesn't have NAs
deschen1 commented 2 years ago

Re the change you did to the read_workbook.cpp change. I didn't touch this, because:

a) I wanted to see if it already works without the fix (which it does) b) I'm not familiar with C++, so didn't want to mess around there. Especially since I'm not sure if the respective code part should also get an optional argument for changing the behaviour there.

If you say this can just be changed as suggested by you, I'm happy to add it to the pull request. Otherwise I'd feel more comfortable if someone else would do the change.

JanMarvin commented 2 years ago

Re the test - if you are referring to testthat, that's also something I'm not really familiar with. However, the main problem is that it can't be automated through testthat.

It appears that the behaviour of changing blanks to NAs only happens after the Excel file has been manually opened, saved and closed in Excel. As long as I just save and load an Excel file with openxlsx and don't interact with it (in the sense of not saving it in Excel), blanks are not converted. ONce I save in Excel, they are converted. However, I created a small code chunk that at least does the file creation and mentions when you need to save the file manually. Hope this helps.

#Assuming we are in the package programming environment
devtools::load_all()

test_data <- data.frame(x = c("1", "", "b"),
                        y = c(1, 2, 3))

wb <- createWorkbook()
addWorksheet(wb, "load test")
writeData(wb, "load test", test_data)
saveWorkbook(wb, "test.xlsx", overwrite = TRUE)

#
# Now go to the file, open in Excel, save it and close it, then go back to R
# 

wb_new <- loadWorkbook("test.xlsx", na.convert = TRUE)
saveWorkbook(wb_new, "test2.xlsx", overwrite = TRUE) # test2.xlsx now has NAs

wb_new <- loadWorkbook("test.xlsx", na.convert = FALSE)
saveWorkbook(wb_new, "test3.xlsx", overwrite = TRUE) # test3.xlsx doesn't have NAs

Alright. You could simply use the files from #72. Your change should create any measurable effect. Either a certain value is missing or NA. I can add one, but for that you have to wait until next week.

JanMarvin commented 2 years ago

Re the change you did to the read_workbook.cpp change. I didn't touch this, because:

a) I wanted to see if it already works without the fix (which it does) b) I'm not familiar with C++, so didn't want to mess around there. Especially since I'm not sure if the respective code part should also get an optional argument for changing the behaviour there.

If you say this can just be changed as suggested by you, I'm happy to add it to the pull request. Otherwise I'd feel more comfortable if someone else would do the change.

No worries, we've all started with a small change. We can skip the c++ part in this PR, I do not recall what I was looking for at the time. It's been a long time since that initial draft.

deschen1 commented 2 years ago

@JanMarvin Sorry for nudging you, but wanted to check if you had a chance to look into the commit and are fine with updating the package?

JanMarvin commented 2 years ago

High, thanks for the reminder. I'd still like to see a test for this.

deschen1 commented 2 years ago

Fair enough. I just remember from the discussion above that you mentioned you want to add one. Just want to prevent misunderstanding (in case you are waiting for me to add one), because I though you'd add the test. 😬 😅

JanMarvin commented 2 years ago

Ah well, if you have the time it would be a massive help

deschen1 commented 2 years ago

OK, you still see me confused, because my initial statement stays true, i.e. I can't really add a reproducible example in a sense that it can be run automatically through test_that (beyond the fact that I'm really not sure how to set this up).

The problem is that the NAs only appear after manually (!) opening and saving the Excel file. As long as I don't do anything with the Excel file (after creating it with openxlsx) I don't get any NAs added. Once I open and save the Excel and then do loadWorkbook/saveWorkbook, I do get NAs.

That's why I think the only helpful contribution I can make is this one (where a human interaction for opening/saving is required): https://github.com/ycphs/openxlsx/pull/360#issuecomment-1159208885

JanMarvin commented 2 years ago

okay, I'll look at it. we can make tests with some external xlsx file. writing tests is not exactly hard, but still takes time and time is short :)

deschen1 commented 2 years ago

Thanks. Much appreciated!

JanMarvin commented 2 years ago

I've added the test. Most likely the other part of my fix was for the output to R.

JanMarvin commented 2 years ago

Thanks @deschen1 , I've merged this PR with the development branch! It should be in the next release, whenever that will be :)

deschen1 commented 2 years ago

Thanks @JanMarvin . Just curious what's your release policy, e.g. do you have a minimum number of important changes you want to see in a new version or would such a standalone fix also qualify for a new release? You are probably guessing right the intention of this question that I of course would be very happy to see an updated version soon. 😬

JanMarvin commented 2 years ago

There is no release policy. Someone should test the development branch (e.g., #362) and poke @ycphs to craft a release.

deschen1 commented 2 years ago

@JanMarvin I had a look at the development branch the the RMD check returns with an error:

Error: 'openxlsxFontSizeLookupTable' is not an exported object from 'namespace:openxlsx'. So that might be something you want to look at before updating the master.

Also noticing that the abovementioned issue 362 is not mentioned in the news file (which is what I used to check what's new in the dev branch).

JanMarvin commented 2 years ago

Thanks @deschen1 , every bit of testing helps. We all have a lot to do and user feedback is somewhat scarce for pre-release sources.

JanMarvin commented 2 years ago

FYI: fixed the openxlsxFontSizeLookupTable namespace error. In #363 .If CI gives us a greenlight, I'll merge it to development. Would be nice if you could have another look (e.g. test dates) afterwards.

deschen1 commented 2 years ago

Sure, can do once it's merged.

AbhaySThakur commented 2 years ago

Hi guys, any idea when this one gets released, I am facing the exact issue.