Many things in Excel are stored as serial dates but, due to their format, look like something else to a user. This leads the user to believe we can import whatever their eyeballs are seeing in the Excel interface.
Quick #rstats question: I'm importing data with readxl, it contains elapsed time stored in excel as HH:MM:SS, but R displays as YYYY-MM-DD HH:MM:SS (due to excel). How do I tell R these are durations?
It's natural that many suggested workarounds route through character, but there are better workarounds.
Here's one suggested by @Hadley for the specific challenge above:
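as_duration <- function(x) {
  # Time-only cells arrive as UTC datetimes on 1899-12-31, so use that as the origin
  origin <- as.POSIXct("1899-12-31", tz = "UTC")
  # difftime() takes `units`, not `type`
  s <- difftime(x, origin, units = "secs")
  hms::as_hms(s)
}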
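As a usage sketch, the same idea can be tried without a spreadsheet. The example below assumes readxl has returned the elapsed-time cells as UTC datetimes anchored at 1899-12-31. The sample values are invented for illustration, and only base R is used, so the result is a difftime rather than an hms object.

```r
# Invented values standing in for a column imported by readxl:
# elapsed times of 00:01:30 and 25:00:00, which R shows as datetimes.
x <- as.POSIXct(c("1899-12-31 00:01:30", "1900-01-01 01:00:00"), tz = "UTC")

# Assumption: the datetimes are in UTC and anchored at 1899-12-31.
origin <- as.POSIXct("1899-12-31", tz = "UTC")

# Elapsed seconds since the origin, as a difftime.
elapsed <- difftime(x, origin, units = "secs")

as.numeric(elapsed)  # 90 90000
```

Fixing the time zone on both sides keeps the subtraction independent of the local time zone of the R session.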
It would be helpful to have a vignette explaining this problem to users, i.e. that what they see in Excel is, in general, NOT what's stored on disk and is often NOT literally available to readxl. It would also be a good place to collect recommended workarounds for specific situations.
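A minimal sketch of the gap between what is stored and what is shown, assuming the usual Excel convention that a time or duration is stored as a fraction of a day. The serial value below is made up for illustration.

```r
# What is stored on disk: a plain number, here 1/16 of a day.
serial <- 0.0625

# Convert the fraction of a day to seconds.
seconds <- serial * 86400

# What a user sees in Excel when the cell is formatted as hh:mm:ss.
displayed <- sprintf(
  "%02d:%02d:%02d",
  seconds %/% 3600,
  (seconds %% 3600) %/% 60,
  seconds %% 60
)

displayed  # "01:30:00"
```

The string "01:30:00" never exists in the file; only the number and the cell's number format do.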
Along the same lines, the Excel number format parsing could recognize the "[hh]" or "[mm]" formats, which are analogous to difftime. Columns with such a format could then be imported as difftime data.
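One way such detection might look, as a rough sketch only. `is_elapsed_format()` is a hypothetical helper, not part of readxl, and it assumes the number format string is already available as a character vector.

```r
# Hypothetical helper, not part of readxl: flags Excel number formats
# that contain an elapsed-time token such as [h], [hh], [mm] or [ss].
is_elapsed_format <- function(fmt) {
  grepl("\\[(h+|m+|s+)\\]", fmt, ignore.case = TRUE)
}

formats <- c("[hh]:mm:ss", "[mm]:ss", "hh:mm:ss", "yyyy-mm-dd")
is_elapsed_format(formats)  # TRUE TRUE FALSE FALSE
```

Other bracketed tokens, such as colour names like [Red], do not match because the pattern only accepts runs of h, m, or s inside the brackets.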
Recent example:
https://twitter.com/TomSaundersNZ/status/1336081299354730497
Related to #118 and all of its many friends.