tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

Add `convert_utc` argument to writer functions #714

Closed gorcha closed 1 year ago

gorcha commented 1 year ago

Close #702.

@jmobrien do you have any feedback on the docs? I'm not entirely sure that this makes sense :sweat_smile:

If TRUE (the default) date times are converted to the equivalent UTC value and timezone is ignored, so they will appear the same in R and Stata/SPSS/SAS. If FALSE, date time variables are written as the corresponding UTC value.

jmobrien commented 1 year ago

@gorcha, I like your most recent version much better, especially as it says more about the underlying issue. Below is a version I was working on last night, slightly amended based on the work above:

#' If `TRUE` (the default), all \link[base]{date-time} variables to be exported
#' are first adjusted as if the *displayed* time were originally recorded in
#' UTC: for example, "2023-01-01 01:00:00 EST" becomes "2023-01-01 01:00:00 UTC"
#' (analogous to *lubridate*'s \link[lubridate]{force_tz} with `tzone = "UTC"`).
#' This conforms to standard practice in Stata/SPSS/SAS, which lack notions of
#' time zones or daylight saving time. It does, however, change the underlying
#' (numeric) time data; use caution if preserving between-time-point differences
#' is critical, and/or if exported data may later be re-imported into R.
#'
#' If `FALSE`, date-time variable data is exported directly. Times may thus
#' display differently in external software (e.g. "2023-01-01 01:00:00 EST" will
#' appear as "2023-01-01 06:00:00" [UTC], analogous to
#' \link[lubridate]{with_tz} with `tzone = "UTC"`). Date-times re-imported
#' into R will continue to display in UTC ("06:00:00 UTC"), but will remain
#' otherwise consistent across export/import cycles.

Probably not a net improvement vs. your current version, esp. given the length. But posting it b/c I think it does touch on a few important things that will be non-obvious to typical users:

(I included the links to lubridate since it's also tidyverse, and I mistakenly remembered that this package made use of its tools. I thought it would be useful to lean on the help pages for those functions for clarity, but the links should maybe be removed.)

Anyway, feel free to use/adapt any of it (or none of it) as seems useful.

No opinion on the argument name. Seems to me the core ambiguity is that, no matter what, something's getting adjusted/converted/changed/forced--either the underlying time data, or the displayed time. I'm guessing users will likely naturally interpret any argument title from whichever aspect they arrive focused on, which doesn't help. So, perhaps there is no clearly "right" choice?
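
To make the "underlying time data vs. displayed time" distinction concrete, here is a minimal base-R sketch of the two behaviours described in the draft docs above. It does not call haven or lubridate. The variable names are illustrative, and `"America/New_York"` is assumed as a stand-in for the "EST" example, so treat it as an approximation of what the writer functions would do rather than their actual implementation.

```r
# A date-time recorded in US Eastern time (EST in January, i.e. UTC-5).
x <- as.POSIXct("2023-01-01 01:00:00", tz = "America/New_York")

# TRUE-like behaviour: keep the displayed clock time and relabel it as UTC.
# This changes the underlying numeric value (similar in spirit to force_tz).
forced <- as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = "UTC")

# FALSE-like behaviour: keep the underlying instant and only change how it is
# displayed (similar in spirit to with_tz).
shown <- x
attr(shown, "tzone") <- "UTC"

format(forced, "%Y-%m-%d %H:%M:%S")  # "2023-01-01 01:00:00"
format(shown, "%Y-%m-%d %H:%M:%S")   # "2023-01-01 06:00:00"

# The forced value is shifted by the UTC offset (5 hours = 18000 seconds);
# the re-displayed value is numerically identical to the original.
as.numeric(x) - as.numeric(forced)   # 18000
as.numeric(x) - as.numeric(shown)    # 0
```

Either way something changes: `forced` preserves what the user sees but moves the instant, while `shown` preserves the instant but alters what the user sees.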

gorcha commented 1 year ago

Thanks @jmobrien, much appreciated! I've made a few tweaks based on your feedback.