schochastics / timeless

A general purpose date(time) parser for R
https://schochastics.github.io/timeless/

Outputting `POSIXct` #12

Closed chainsawriot closed 6 months ago

chainsawriot commented 6 months ago

How about introducing fasttime as Suggests (#7) instead, and adding a new option "out_format = 'posixct'"?

It is still not an apples-to-apples comparison (the problem with fasttime, #7), but I think it is at least comparing an apple to a quince.

```r
require(chronos)
#> Loading required package: chronos
require(fasttime)
#> Loading required package: fasttime

chronos2 <- function(x) {
    fasttime::fastPOSIXct(chronos(x))
}

microbenchmark::microbenchmark(
    chronos2(bench_date),
    anytime::anytime(bench_date)
)
#> Unit: milliseconds
#>                          expr      min        lq     mean    median        uq
#>          chronos2(bench_date)  1.04123  1.068429  1.23200  1.088958  1.139058
#>  anytime::anytime(bench_date) 19.56998 19.626886 20.25774 19.679201 19.765583
#>        max neval
#>   9.330848   100
#>  72.019507   100
```

Created on 2024-02-28 with reprex v2.1.0

chainsawriot commented 6 months ago

This is more like a red-apple-to-green-apple comparison:

```r
require(chronos)
#> Loading required package: chronos
require(fasttime)
#> Loading required package: fasttime

chronos2 <- function(x) {
    fasttime::fastPOSIXct(chronos(x))
}

chronos3 <- function(x) {
    anytime::anytime(chronos(x))
}

microbenchmark::microbenchmark(
    chronos2(bench_date),
    chronos3(bench_date),
    anytime::anytime(bench_date)
)
#> Unit: microseconds
#>                          expr       min         lq      mean    median
#>          chronos2(bench_date)  1040.021  1095.8235  1225.749  1138.477
#>          chronos3(bench_date)   881.175   912.4345  2288.702   938.518
#>  anytime::anytime(bench_date) 19977.558 20207.3760 20584.265 20428.050
#>         uq        max neval
#>   1186.745   7391.934   100
#>   1005.477 122754.717   100
#>  20748.179  22797.577   100
```

Created on 2024-02-28 with reprex v2.1.0

chainsawriot commented 6 months ago

GC-filtered, and also benchmarking the memory footprint:

```r
require(chronos)
#> Loading required package: chronos
require(fasttime)
#> Loading required package: fasttime

chronos2 <- function(x) {
    fasttime::fastPOSIXct(chronos(x))
}

chronos3 <- function(x) {
    anytime::anytime(chronos(x))
}

bench::mark(
    chronos2(bench_date),
    chronos3(bench_date),
    anytime::anytime(bench_date),
    check = FALSE
)
#> # A tibble: 3 × 6
#>   expression                        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 chronos2(bench_date)           1.01ms   1.05ms     951.   978.87KB     4.17
#> 2 chronos3(bench_date)         832.77µs 853.53µs    1165.     5.59MB     4.09
#> 3 anytime::anytime(bench_date)  19.93ms  19.99ms      49.4   17.99KB     0
```

Created on 2024-02-28 with reprex v2.1.0

schochastics commented 6 months ago

Either I misunderstand you, or you missed the latest updates of chronos: it now returns "POSIXct" by default, same as anytime.

```r
library(chronos)

chronos1 <- function(x) {
    chronos(x, out_format = "datetime")
}

class(chronos1(bench_date))
#> [1] "POSIXct" "POSIXt"

chronos2 <- function(x) {
    fasttime::fastPOSIXct(chronos(x, out_format = "character"))
}

chronos3 <- function(x) {
    anytime::anytime(chronos(x))
}

chronos4 <- function(x) {
    anytime::anytime(chronos(x, out_format = "character"))
}

chronos5 <- function(x) {
    as.POSIXct(chronos(x, out_format = "datetime"))
}

bench::mark(
    chronos1(bench_date),
    chronos2(bench_date),
    chronos3(bench_date),
    chronos4(bench_date),
    chronos5(bench_date),
    anytime::anytime(bench_date),
    check = FALSE
)
#> # A tibble: 6 × 6
#>   expression                        min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>                   <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 chronos1(bench_date)           1.72ms   2.08ms     474.     68.9KB     0
#> 2 chronos2(bench_date)         300.95µs 352.37µs    2655.    19.77KB     2.03
#> 3 chronos3(bench_date)           1.72ms   1.76ms     503.     5.59MB     0
#> 4 chronos4(bench_date)         576.44µs 620.47µs    1508.    23.32KB     2.02
#> 5 chronos5(bench_date)           1.71ms   1.72ms     531.    50.61KB     2.03
#> 6 anytime::anytime(bench_date)  30.34ms  31.53ms      29.6    3.27KB     0
```

Created on 2024-02-28 with reprex v2.1.0

chronos1 has the same input and output format as anytime::anytime, so that should be apples-to-apples. chronos2 is the speedy version with fastPOSIXct, using the character output from chronos. I wanted to add fasttime as a Suggests dependency but decided against it, to avoid the issue with dates out of range (#7). I think a user can be expected to assemble that function themselves.

chronos3 has no overhead compared to chronos1, since anytime() also handles input that is already POSIXct (https://github.com/eddelbuettel/anytime/blob/master/R/anytime.R#L198-L202). I guess as.POSIXct applied to an object that is already of class POSIXct doesn't do anything, which is further evidenced by chronos5.
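The no-op guess can be checked in base R alone. This is a minimal sketch, assuming a reasonably recent R version and that no `tz` argument is passed to `as.POSIXct()`; the timestamp is made up for illustration and is not taken from `bench_date`.

```r
# A POSIXct value built by hand (illustrative timestamp only).
x <- as.POSIXct("2024-02-28 12:00:00", tz = "UTC")

# Converting an object that is already POSIXct, without a tz argument.
y <- as.POSIXct(x)

# The result should be the very same object: same class, same attributes,
# same underlying number of seconds.
stopifnot(identical(x, y))
stopifnot(inherits(y, "POSIXct"))
```

If that holds, the timing difference between chronos1 and chronos5 should be little more than the cost of one extra function call.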

chronos4 surprises me a little, and I need to investigate why anytime goes from a homogeneous character vector to POSIXct faster than chronos does.

What chronos does faster than anytime is finding the right format for each element of a character vector with heterogeneous datetime formats.
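To make "finding the right format" concrete, here is a rough base-R sketch of the general idea: try candidate formats in order and keep the first one that parses each element. `parse_mixed()` and its format list are hypothetical names chosen for illustration; this is not how chronos or anytime are actually implemented.

```r
# Hypothetical helper, NOT part of chronos or anytime: parse a character
# vector whose elements may use different datetime formats.
parse_mixed <- function(x,
                        formats = c("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y"),
                        tz = "UTC") {
  # Start with an all-NA POSIXct vector of the right length.
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)          # elements not yet parsed
    if (!any(todo)) break       # everything parsed: stop early
    # Elements that do not match fmt come back as NA and stay in `todo`.
    out[todo] <- as.POSIXct(x[todo], format = fmt, tz = tz)
  }
  out
}

mixed <- c("2024-02-28 12:30:00", "2024-02-28", "28/02/2024", "not a date")
parse_mixed(mixed)
```

The order of `formats` matters: `strptime()` ignores trailing input, so the more specific format has to be tried before the shorter one. When every element already matches the first format, the loop finishes after a single pass, whereas mixed input costs one extra pass per additional format.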

chainsawriot commented 6 months ago

@schochastics Sorry that I was not paying attention to the class. I missed the update.

I think anytime::anytime() is also fast with a fixed format (your chronos4) because the timestamp format that chronos emits is actually the "happiest path": anytime doesn't need to try different formats.

https://github.com/eddelbuettel/anytime/blob/5de050ef183028d477f97ff43ae57af4a18e39b7/src/anytime.cpp#L42

https://github.com/eddelbuettel/anytime/blob/5de050ef183028d477f97ff43ae57af4a18e39b7/src/anytime.cpp#L218-L223

And that is faster than doing this in R:

https://github.com/schochastics/chronos/blob/5d650e01e20ef705420d2974b241b9a1027240cd/R/utils.R#L2