tidyverse / readxl

Read excel files (.xls and .xlsx) into R 🖇
https://readxl.tidyverse.org
Other
727 stars 195 forks source link

Vignette with background and practical strategies re: datetimes and friends #637

Open jennybc opened 3 years ago

jennybc commented 3 years ago

Many things in Excel are stored as serial dates but, due to their format, look like something else to a user, leading the user to believe we can actually import whatever they're eyeballs are seeing in the Excel interface.

Recent example:

https://twitter.com/TomSaundersNZ/status/1336081299354730497

Quick #rstats question: I'm importing data with readxl, it contains elapsed time stored in excel as HH:MM:SS, but R displays as YYYY-MM-DD HH:MM:SS (due to excel). How do I tell R these are durations?

It's natural that many suggested workaround route through character, but there are better workarounds.

Here's one suggested by @Hadley for the specific challenge above:

as_duration <- function(x) {
  s <- difftime(x, as.POSIXct("1899-12-31"), type = "seconds")
  hms::hms(s)
}

It would be helpful to have a vignette explaining this problem to users, i.e. that what they see in Excel is, in general, NOT what's stored on disk and is often NOT literally available to readxl. It would also be a good place to collect recommended workarounds for specific situations.

Related to #118 and all of its many friends.

mitchbur commented 2 years ago

Along the same line, the excel number format parsing could recognize the "[hh]" or "[mm]" formats which are analogous to difftime. The column data could be formatted as difftime data.