tidyverse / lubridate

Make working with dates in R just that little bit easier
https://lubridate.tidyverse.org
GNU General Public License v3.0

Inconsistent Parsing of Midnight-Time Component in lubridate::ymd_hms()? #1124

Open SESjo opened 1 year ago

SESjo commented 1 year ago

Dear maintainers,

I'm not sure this is a proper issue, since it happens when parsing a POSIXct (I know that's not what the function is for), but the way ymd_hms() handles a 00:00:00 time seems inconsistent:

lubridate::ymd_hms(as.POSIXct("2018-11-21"))
#> [1] NA
#> Warning message:
#> All formats failed to parse. No formats found.
lubridate::ymd_hms(as.POSIXct("2018-11-21 00:00:00"))
#> [1] NA
#> Warning message:
#> All formats failed to parse. No formats found.
lubridate::ymd_hms(as.POSIXct("2018-11-21 00:00:01"))
#> [1] "2018-11-21 00:00:01 UTC"
lubridate::ymd_hms("2018-11-21 00:00:01")
#> [1] "2018-11-21 00:00:01 UTC"
lubridate::ymd_hms("2018-11-21 00:00:00")
#> [1] "2018-11-21 UTC"
lubridate::ymd_hms("2018-11-21")
#> [1] NA
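For comparison, the NA can be mimicked in base R alone. This rests on an assumption not checked against lubridate's source: that a POSIXct passed to ymd_hms() is effectively turned into text before parsing. Base R's default format() omits the time when it is exactly midnight, so a parser that requires a time component has nothing to match. parse_with_time() and roundtrip() are illustrative names only, not lubridate functions.

```r
# Illustrative stand-in for a parser that requires a time component.
# It only mimics the failure mode; it is not how lubridate is implemented.
parse_with_time <- function(txt) {
  as.POSIXct(txt, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

# Render with the default format(), then parse the text again.
roundtrip <- function(x) parse_with_time(format(x))

midnight <- as.POSIXct("2018-11-21 00:00:00", tz = "UTC")
one_sec  <- midnight + 1

format(midnight)                # "2018-11-21": the midnight time is dropped
is.na(roundtrip(midnight))      # TRUE: nothing left to match %H:%M:%S
roundtrip(one_sec) == one_sec   # TRUE: non-midnight values survive
```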

The workaround I found is to convert the POSIXct to a character string, being explicit about the time:

lubridate::ymd_hms(format(as.POSIXct("2018-11-21 00:00:00"), format = "%Y-%m-%d %T %Z"))
#> [1] "2018-11-21 UTC"
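One caveat with this workaround, going by the session information below (time zone America/Los_Angeles): as.POSIXct("2018-11-21 00:00:00") there is local midnight (PST), yet the result prints as midnight UTC, so the zone appears to be lost in the text round trip and the instant shifts by eight hours. A minimal sketch of a variant, assuming it is acceptable to normalize everything to UTC: render the instant in UTC with an explicit time, so no zone abbreviation has to be read back. posixct_to_utc_text() is a hypothetical helper, not part of lubridate.

```r
# Hypothetical helper: text that always includes the time, expressed in UTC.
posixct_to_utc_text <- function(x) {
  format(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

midnight_utc <- as.POSIXct("2018-11-21 00:00:00", tz = "UTC")
posixct_to_utc_text(midnight_utc)   # "2018-11-21 00:00:00"

midnight_la <- as.POSIXct("2018-11-21 00:00:00", tz = "America/Los_Angeles")
posixct_to_utc_text(midnight_la)    # "2018-11-21 08:00:00" (PST is UTC-8)

# Passing this text to lubridate::ymd_hms(), whose default tz is "UTC",
# should then recover the original instant (not run here):
# lubridate::ymd_hms(posixct_to_utc_text(midnight_la))
```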

sessionInfo()

R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.4

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/Los_Angeles
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base    

loaded via a namespace (and not attached):
 [1] tidyr_1.3.0         rgeos_0.6-3         utf8_1.2.3          generics_0.1.3      renv_0.17.3        
 [6] class_7.3-22        xml2_1.3.4          KernSmooth_2.23-21  track2KBA_1.0.5     lattice_0.21-8    
[11] magrittr_2.0.3      grid_4.3.0          iterators_1.0.14    fastmap_1.1.1       maps_3.4.1        
[16] foreach_1.5.2       e1071_1.7-13        DBI_1.1.3           httr_1.4.6          rgdal_1.6-7        
[21] purrr_1.0.1         fansi_1.0.4         scales_1.2.1        move_4.1.12         codetools_0.2-18  
[26] CircStats_0.2-6     ade4_1.7-22         cli_3.6.1           rlang_1.1.1         units_0.8-2        
[31] munsell_0.5.0       yaml_2.3.7          cachem_1.0.8        tools_4.3.0         raster_3.6-20      
[36] parallel_4.3.0      geosphere_1.5-18    memoise_2.0.1       dplyr_1.1.2         colorspace_2.1-0  
[41] ggplot2_3.4.2       boot_1.3-28.1       adehabitatHR_0.4.21 vctrs_0.6.2         R6_2.5.1          
[46] proxy_0.4-27        lifecycle_1.0.3     classInt_0.4-9      adehabitatMA_0.3.16 MASS_7.3-60        
[51] adehabitatLT_0.3.27 pkgconfig_2.0.3     terra_1.7-29        pillar_1.9.0        gtable_0.3.3      
[56] glue_1.6.2          Rcpp_1.0.10         sf_1.0-13           tibble_3.2.1        tidyselect_1.2.0  
[61] rstudioapi_0.14     Matching_4.10-8     compiler_4.3.0      sp_1.6-1   

cjl8zf commented 12 months ago

I have run into this issue too and came here to ask the same question. Because the HH:MM:SS component is dropped for midnight timestamps, ymd_hms() fails to be idempotent: ymd_hms(ymd_hms(x)) == ymd_hms(x) does not hold for all x. This would be a nice property to have, and it does hold at all non-midnight times.

I have tracked down the source of this behaviour, and it turns out to come from a call to the base R function .POSIXct() here:

https://github.com/tidyverse/lubridate/blob/11d1c9aeefeaa568edcefda09bc477116ffd300e/R/parse.r#L745

For example:

> x <- 1640995200
> base::.POSIXct(x,tz="UTC")
[1] "2022-01-01 UTC"
> base::.POSIXct(x+1,tz="UTC")
[1] "2022-01-01 00:00:01 UTC"

I think this demonstrates that lubridate is being consistent with base R. It is simply how the POSIXct class behaves, so the issue is upstream of lubridate, although I personally wish ymd_hms("2022-01-01 00:00:00 UTC") were displayed as "2022-01-01 00:00:00".
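The .POSIXct() example can be checked further in base R: the object it returns stores the full instant, and what drops the 00:00:00 is the default text rendering used by print() and format(). R's strptime documentation states that the default format includes a time only if at least one element is not at midnight, and that choice is made for the whole vector. A base-R-only sketch:

```r
x <- 1640995200  # seconds since 1970-01-01, i.e. 2022-01-01 00:00:00 UTC
midnight <- .POSIXct(x, tz = "UTC")

# The stored instant is exact; only its text rendering is abbreviated.
as.numeric(midnight) == x                  # TRUE

# An explicit format string always shows the time component.
format(midnight, "%Y-%m-%d %H:%M:%S")      # "2022-01-01 00:00:00"

# The default format is picked for the whole vector: if any element is not
# at midnight, every element is rendered with a time.
format(midnight + 0:1)
# "2022-01-01 00:00:00" "2022-01-01 00:00:01"
```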

milesalanmoore commented 2 months ago

I just wanted to ping this issue to say that the problem persists (perhaps unsurprisingly, since it stems from how base R treats this class). It may be possible to implement a solution similar to the one @SESjo provides in the functions that parse time components (ymd_hms, ymd_hm, ...). I am happy to give this a go:

lubridate::ymd_hms(format(as.POSIXct(x), format = "%Y-%m-%d %T %Z"))

Thank you all for your hard work
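Another option, sketched as an alternative to formatting and re-parsing: return date-time inputs unchanged and only parse text. ymd_hms_passthrough() is a hypothetical wrapper, not part of lubridate; it assumes lubridate is installed for the character path and treats only POSIXct specially (Date or POSIXlt inputs would still go through the parser). Because a POSIXct never passes through text, a midnight value cannot lose its time, and applying the wrapper twice gives the same result, which is the idempotence property raised earlier in the thread.

```r
# Hypothetical wrapper (not part of lubridate): only parse when the input is
# not already a POSIXct, so applying it twice gives the same result.
ymd_hms_passthrough <- function(x, tz = "UTC", ...) {
  if (inherits(x, "POSIXct")) {
    # Already a date-time: keep the stored instant untouched and only set
    # the time zone used for display, mirroring ymd_hms()'s tz argument.
    attr(x, "tzone") <- tz
    return(x)
  }
  # Anything else (e.g. character) goes through the normal parser.
  lubridate::ymd_hms(x, tz = tz, ...)
}

midnight <- as.POSIXct("2018-11-21 00:00:00", tz = "UTC")
ymd_hms_passthrough(midnight)                       # no text round trip, no NA
identical(ymd_hms_passthrough(ymd_hms_passthrough(midnight)),
          ymd_hms_passthrough(midnight))            # TRUE
```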