tidyverse / readr

Read flat files (csv, tsv, fwf) into R
https://readr.tidyverse.org

Writing `POSIXlt` columns loses timezones #1474

Open tohka opened 1 year ago

tohka commented 1 year ago

write_csv() drops the time zone from POSIXlt values in a tibble: it writes the local clock time unchanged and appends Z to the output, as if the value were already in UTC.

> version
               _                                
platform       x86_64-w64-mingw32               
arch           x86_64                           
os             mingw32                          
crt            ucrt                             
system         x86_64, mingw32                  
status                                          
major          4                                
minor          2.2                              
year           2022                             
month          10                               
day            31                               
svn rev        83211                            
language       R                                
version.string R version 4.2.2 (2022-10-31 ucrt)
nickname       Innocent and Trusting            

> library(readr)
> packageVersion("readr")
[1] ‘2.1.4’
> library(tibble)
> packageVersion("tibble")
[1] ‘3.1.8’

> Sys.timezone()
[1] "Asia/Tokyo"
> dt <- "2000/01/01 09:00:00"
> dt.ct <- as.POSIXct(dt, tz=Sys.timezone())
> dt.ct
[1] "2000-01-01 09:00:00 JST"
> dt.lt <- as.POSIXlt(dt, tz=Sys.timezone())
> dt.lt
[1] "2000-01-01 09:00:00 JST"
> df <- data.frame(ct=dt.ct, lt=dt.lt)
> df
                   ct                  lt
1 2000-01-01 09:00:00 2000-01-01 09:00:00
> tbl <- tibble(ct=dt.ct, lt=dt.lt)
> tbl
# A tibble: 1 × 2
  ct                  lt                 
  <dttm>              <dttm>             
1 2000-01-01 09:00:00 2000-01-01 09:00:00

> write_csv(df, "write_csv_df.csv")
> readLines("write_csv_df.csv")
[1] "ct,lt"                                    
[2] "2000-01-01T00:00:00Z,2000-01-01T00:00:00Z"
> write_csv(tbl, "write_csv_tbl.csv")
> readLines("write_csv_tbl.csv")
[1] "ct,lt"                                    
[2] "2000-01-01T00:00:00Z,2000-01-01T09:00:00Z"

"2000-01-01 09:00:00 JST" equals "2000-01-01 00:00:00Z".

However, when using a tibble, the POSIXlt value "2000-01-01 09:00:00 JST" is written as "2000-01-01T09:00:00Z": the local clock time is kept and only the Z suffix is added, so the output is off by nine hours.
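The arithmetic above can be checked in base R without readr. This is only a sketch of the expected conversion, not a claim about how readr formats values internally; the variable names `utc` and `local_z` are made up for illustration.

```r
# Base R only; readr is not involved here.
ct <- as.POSIXct("2000-01-01 09:00:00", tz = "Asia/Tokyo")
lt <- as.POSIXlt("2000-01-01 09:00:00", tz = "Asia/Tokyo")

# Both objects describe the same instant (seconds since the epoch).
same_instant <- as.numeric(ct) == as.numeric(as.POSIXct(lt))
same_instant
#> [1] TRUE

# Rendering that instant in UTC gives the string write_csv() should produce.
utc <- format(ct, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
utc
#> [1] "2000-01-01T00:00:00Z"

# Formatting the POSIXlt clock fields as-is and appending Z reproduces
# the incorrect string seen in the tibble output.
local_z <- format(lt, "%Y-%m-%dT%H:%M:%SZ")
local_z
#> [1] "2000-01-01T09:00:00Z"
```

JST is UTC+9, so the two strings differ by exactly nine hours.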

hadley commented 1 year ago

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you! If you've never heard of a reprex before, start by reading about the reprex package, including the advice further down the page. Please make sure your reprex is created with the reprex package as it gives nicely formatted output and avoids a number of common pitfalls.

tohka commented 1 year ago

Hi @hadley

The reprex code is presented below.

version
#>                _                                
#> platform       x86_64-w64-mingw32               
#> arch           x86_64                           
#> os             mingw32                          
#> crt            ucrt                             
#> system         x86_64, mingw32                  
#> status                                          
#> major          4                                
#> minor          3.1                              
#> year           2023                             
#> month          06                               
#> day            16                               
#> svn rev        84548                            
#> language       R                                
#> version.string R version 4.3.1 (2023-06-16 ucrt)
#> nickname       Beagle Scouts

library(readr)
packageVersion("readr")
#> [1] '2.1.4'
library(tibble)
packageVersion("tibble")
#> [1] '3.2.1'

dt <- "2000/01/01 09:00:00"
tz <- "Asia/Tokyo"

(dt.ct <- as.POSIXct(dt, tz=tz))
#> [1] "2000-01-01 09:00:00 JST"
(dt.lt <- as.POSIXlt(dt, tz=tz))
#> [1] "2000-01-01 09:00:00 JST"

(df <- data.frame(ct=dt.ct, lt=dt.lt))
#>                    ct                  lt
#> 1 2000-01-01 09:00:00 2000-01-01 09:00:00
(tbl <- tibble(ct=dt.ct, lt=dt.lt))
#> # A tibble: 1 × 2
#>   ct                  lt                 
#>   <dttm>              <dttm>             
#> 1 2000-01-01 09:00:00 2000-01-01 09:00:00

write_csv(df, "write_csv_df.csv")
readLines("write_csv_df.csv")
#> [1] "ct,lt"                                    
#> [2] "2000-01-01T00:00:00Z,2000-01-01T00:00:00Z"

write_csv(tbl, "write_csv_tbl.csv")
readLines("write_csv_tbl.csv")
#> [1] "ct,lt"                                    
#> [2] "2000-01-01T00:00:00Z,2000-01-01T09:00:00Z"

Please let me know if there is any other information I am missing.

hadley commented 1 year ago

Here's a somewhat more minimal reprex:

library(readr)
lt <- as.POSIXlt("2000/01/01 09:00:00", tz = "Asia/Tokyo")

df <- data.frame(lt = lt)
str(df)
#> 'data.frame':    1 obs. of  1 variable:
#>  $ lt: POSIXct, format: "2000-01-01 09:00:00"

tbl <- tibble::tibble(lt = lt)
str(tbl)
#> tibble [1 × 1] (S3: tbl_df/tbl/data.frame)
#>  $ lt: POSIXlt[1:1], format: "2000-01-01 09:00:00"
cat(format_csv(tbl))
#> lt
#> 2000-01-01T09:00:00Z

Created on 2023-08-01 with reprex v2.0.2

There are two things going on here. First, data.frame() automatically converts POSIXlt to POSIXct, so there's no POSIXlt column in the data frame example. Second, write_csv()/format_csv() always appears to lose the time zone of POSIXlt variables.
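Until this is fixed, one possible workaround is to convert POSIXlt columns to POSIXct before writing, which is what data.frame() does implicitly. The sketch below assumes that is acceptable for your data; `lt_to_ct()` is a hypothetical helper, not part of readr or tibble.

```r
library(readr)

# Hypothetical helper: convert every POSIXlt column to POSIXct.
# as.POSIXct() keeps the time zone stored on the POSIXlt value.
lt_to_ct <- function(df) {
  for (nm in names(df)) {
    if (inherits(df[[nm]], "POSIXlt")) {
      df[[nm]] <- as.POSIXct(df[[nm]])
    }
  }
  df
}

lt <- as.POSIXlt("2000/01/01 09:00:00", tz = "Asia/Tokyo")
tbl <- tibble::tibble(lt = lt)

# The column is now POSIXct, the same class as in the data frame example.
fixed <- lt_to_ct(tbl)
class(fixed$lt)
#> [1] "POSIXct" "POSIXt"

cat(format_csv(fixed))
```

With the column stored as POSIXct, the output should match the data frame case reported above (2000-01-01T00:00:00Z).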