ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
220 stars 74 forks source link

openxlsx won't parse Excel files with capital letter *.XLSX suffix in filename #397

Closed asapb closed 8 months ago

asapb commented 1 year ago

Same issue as the readxl R package had back in 2015 ("Won't read non-lowercase .XLSX or .xlsm files #19").

Since Windows is case-insensitive sometimes software will produce Excel files with the .XLSX filename suffix, and the R programmer cannot rename the file to a filename with lower case .xlsx suffix.

The openxlsx R package seems to be case-sensitive, and will identify filenames with the lower case .xlsx suffix, but not with the upper case .XLSX suffix as Excel files.

However, since platforms Excel is typically run on is often case-insensitive (like Windows) it would be helpful if the read.xlsx() function in openxlsx could be case-insensitive too for detecting Excel files.

Hoping that you can update the openxlsx R package to include this feature in the near future. Ta!

JanMarvin commented 1 year ago

Hi @asapb , as a workaround you could use:

library(openxlsx)
fl <- "test.XLSX"
write.xlsx(x = mtcars, file = fl, overwrite = TRUE)
df <- read.xlsx(loadWorkbook(fl))
head(df)
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
#> 6 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
asapb commented 1 year ago

Hi @JanMarvin and thank you for your quick reply.

OK, this works:

library(openxlsx)
fn = file.path(data_dir, "Data folder", "My_file.XLSX")
df <- read.xlsx(loadWorkbook(fn))
head(df)

Thank you!

JanMarvin commented 1 year ago

Yeah, sorry about the confusion. The write part in my code snippet above simply creates an example file.

asapb commented 1 year ago

Ah yes, the write.xlsx() line in your example did confuse me for a while, but then it all made sense. :) For brevity (and to keep this entry clean) I have deleted my initial follow-up questions and only kept the solution.

Usually I work with Excel files containing multiple worksheets per file, so I would specify the sheet name in question like this:

df <- read.xlsx(loadWorkbook(fn), sheet = "Sheet1")

Thank you very much for your help!

enricoschumann commented 1 year ago

The problem arises because of this test:
https://github.com/ycphs/openxlsx/blob/e7b699eb8acd949616e15557e373e8aea130cb44/R/readWorkbook.R#L123:L125

The test could (and probably should) be made case-insensitive (and a $-sign should be added):

  grepl("\\.xlsx|\\.xlsm$", "nice.xlsxfile.txt")  ## current test
  ## [1] TRUE   ## but should be FALSE

  grepl("\\.xlsx|\\.xlsm$", "xlsxfile.XLSX")  ## current test
  ## [1] FALSE  ## but should be TRUE

  grepl("\\.xlsx$|\\.xlsm$", "nice.xlsxfile.txt", ignore.case = TRUE)
  ## [1] FALSE

  grepl("\\.xlsx$|\\.xlsm$", "xlsxfile.XLSX", ignore.case = TRUE)
  ## [1] TRUE
github-actions[bot] commented 8 months ago

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] commented 8 months ago

This issue was closed because it has been stalled for 7 days with no activity.