ycphs / openxlsx

openxlsx - a fast way to read and write complex xslx files
https://ycphs.github.io/openxlsx/
Other
226 stars 75 forks source link

Version 4.2.7 detectDates not working #498

Closed tdp-datsci closed 2 months ago

tdp-datsci commented 2 months ago

Reading XLSX file where the dates are date formatted (dd-mmm-yy) hangs. Versions 4.2.6 and 4.2.5.2 read the file correctly and the the dates formatted as dd-mmm-yy. Setting the detectDates flag to FALSE imports the file with dates as numbers as expected. Dates formatted as "custom" also import as numbers.

library(openxlsx) test <- read.xlsx("test.xlsx", detectDates = TRUE)

test.xlsx

JanMarvin commented 2 months ago

Hi @tdp-datsci , could you please provide more details as to what is hanging? And provide the output of Sys.info?

I see the following on a Mac (hid a few columns and rows so that the output doesn't pollute everything). This output looks fine, including the possible typo for MLK Day in 2022.

library(openxlsx)
packageVersion("openxlsx")
#> [1] '4.2.7'

url <- "https://github.com/user-attachments/files/16938544/test.xlsx"

read.xlsx(url, detectDates = FALSE)[1:5, 1:5]
#>                  holidays  2022  2023  2024  2025
#> 1          New Year's Day 44562 44927 45292 45658
#> 2 New Year's Day observed    NA 44928    NA    NA
#> 3                 MLK Day 44578 44577 45306 45677
#> 4         President's Day 44613 44977 45341 45705
#> 5               Arbor Day 44671 45044 45408 45772
read.xlsx(url, detectDates = TRUE)[1:5, 1:5]
#>                  holidays       2022       2023       2024       2025
#> 1          New Year's Day 2022-01-01 2023-01-01 2024-01-01 2025-01-01
#> 2 New Year's Day observed       <NA> 2023-01-02       <NA>       <NA>
#> 3                 MLK Day 2022-01-17 2022-01-16 2024-01-15 2025-01-20
#> 4         President's Day 2022-02-21 2023-02-20 2024-02-19 2025-02-17
#> 5               Arbor Day 2022-04-20 2023-04-28 2024-04-26 2025-04-25

wb <- loadWorkbook(url)
read.xlsx(wb, detectDates = FALSE)[1:5, 1:5]
#>                  holidays  2022  2023  2024  2025
#> 1          New Year's Day 44562 44927 45292 45658
#> 2 New Year's Day observed    NA 44928    NA    NA
#> 3                 MLK Day 44578 44577 45306 45677
#> 4         President's Day 44613 44977 45341 45705
#> 5               Arbor Day 44671 45044 45408 45772
read.xlsx(wb, detectDates = TRUE)[1:5, 1:5]
#>                  holidays       2022       2023       2024       2025
#> 1          New Year's Day 2022-01-01 2023-01-01 2024-01-01 2025-01-01
#> 2 New Year's Day observed       <NA> 2023-01-02       <NA>       <NA>
#> 3                 MLK Day 2022-01-17 2022-01-16 2024-01-15 2025-01-20
#> 4         President's Day 2022-02-21 2023-02-20 2024-02-19 2025-02-17
#> 5               Arbor Day 2022-04-20 2023-04-28 2024-04-26 2025-04-25

[Edit] Hm, now that I think about it, it might be due to some strangeness in std::regex.

JanMarvin commented 2 months ago

Could you please have a look if this pull request (#500) solves your issue, @tdp-datsci ? Maybe this is also related to your issue, @deschen1

remotes::install_github("ycphs/openxlsx#500")

The regex was added to fix #288

tdp-datsci commented 2 months ago

That worked. Thank you for your assistance.

tdp-datsci commented 2 months ago

Screenshot 2024-09-10 061641

JanMarvin commented 2 months ago

Thanks for testing and the report! Are you using Windows?

tdp-datsci commented 2 months ago

Yes. Windows 10.

On Tuesday, September 10th, 2024 at 6:43 AM, Jan Marvin Garbuszus @.***> wrote:

Thanks for testing and the report! Are you using Windows?

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you modified the open/close state.Message ID: @.***>

JanMarvin commented 2 months ago

Thanks