schochastics / timeless

A general purpose date(time) parser for R
https://schochastics.github.io/timeless/

Comparison to anytime #23

Open eddelbuettel opened 2 months ago

eddelbuettel commented 2 months ago

Thanks for the package, and for putting a comparison section in. A few things struck me as oddball, though:

> chronos( times )
 [1] "2004-03-21 12:45:33 CST" "2004-03-21 18:45:33 CST" NA                       
 [4] "2004-03-21 18:45:33 CST" NA                        "2004-03-21 22:30:35 CST"
 [7] "1970-08-20 22:45:21 CDT" "2004-03-21 22:30:35 CST" "2004-03-21 00:00:00 CST"
[10] "1970-08-20 14:21:41 CDT"

Similarly, when we use this for benchmarks things are an order of magnitude closer:

> microbenchmark::microbenchmark( chronos(times), anytime(times) )
Unit: microseconds
           expr     min      lq    mean  median      uq     max neval cld
 chronos(times) 126.341 132.531 162.407 146.507 169.491 487.361   100  a 
 anytime(times) 292.882 302.040 330.497 310.236 337.036 651.311   100   b
> 

Lastly, anytime also parses subseconds; chronos does not. (Can you make it?) So it's a wee bit apples versus oranges.

But choice is good so thanks for putting this one together.

schochastics commented 2 months ago

Thanks for the comments!

> The dplyr::coalesce() call forces the Date output to be shown as POSIXct

I only used the coalesce call for its ability to replace missing values to show what anytime parses and what not. I didn't pay attention to the output format. But I am gonna change that.
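
For reference, a minimal sketch of the effect described in the quoted point. It is an illustration only, not the code from the comparison section: the two input vectors are made up, and it assumes a recent dplyr in which `coalesce()` casts its inputs to a common type.

```r
# Illustration only (made-up inputs, not the README code): a vector has a
# single class, so filling gaps in a date-time vector with Date values is
# expected to yield POSIXct for every element.
datetimes <- as.POSIXct(c("2004-03-21 12:45:33", NA), tz = "UTC")
dates     <- as.Date(c(NA, "2004-03-21"))

# Guarded so the snippet still runs when dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  combined <- dplyr::coalesce(datetimes, dates)
  print(class(combined))
}
```

Keeping the two results in separate columns, or formatting them as character before combining, would preserve the distinction between values parsed as a date and values parsed as a date-time.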

> The benchmark comparison 'smells wrong'.

I also do not trust the benchmark entirely. The date(time)s I use for the benchmark (see the data-raw folder; it is not bench_dates) were randomly generated by ChatGPT. I could not spot any bias in the dataset so far.

> The 'one cannot parse all forms of the other' goes both ways

This one is weird. I am sure that those formats were working. In any case, they should now.

> Similarly, when we use this for benchmarks things are an order of magnitude closer:

Need to figure out what happens here. I guess I need better benchmark datasets.
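
One possible way to get a benchmark dataset with a known composition is sketched below. This is only an assumption about what "better" could look like, not the package's actual benchmark: `make_bench_dates()` is a hypothetical helper, and the four formats are picked purely for illustration. It renders random instants in a fixed set of formats, so the mix of formats is explicit and reproducible.

```r
# Hypothetical helper (not part of timeless or anytime): build a reproducible
# character vector of date(time)s by rendering random instants in a few
# known, numeric-only formats.
make_bench_dates <- function(n, seed = 1L) {
  set.seed(seed)
  # Random instants between 1970-01-01 and 2030-01-01 (UTC).
  upper <- as.numeric(as.POSIXct("2030-01-01", tz = "UTC"))
  instants <- as.POSIXct(runif(n, 0, upper), origin = "1970-01-01", tz = "UTC")
  # Formats chosen for illustration only.
  fmts <- c("%Y-%m-%d %H:%M:%S", "%Y/%m/%d", "%d.%m.%Y", "%Y%m%d")
  which_fmt <- sample(seq_along(fmts), n, replace = TRUE)
  out <- character(n)
  for (i in seq_along(fmts)) {
    idx <- which_fmt == i
    out[idx] <- format(instants[idx], fmts[i], tz = "UTC")
  }
  out
}

times <- make_bench_dates(1000)

# Run the comparison only if all three packages are installed.
if (requireNamespace("microbenchmark", quietly = TRUE) &&
    requireNamespace("timeless", quietly = TRUE) &&
    requireNamespace("anytime", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    timeless::chronos(times),
    anytime::anytime(times)
  ))
}
```

Because the generator is seeded, both parsers see exactly the same input on every run, and changing `fmts` shows whether the timing gap depends on which formats dominate the data.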

Thanks again for the comments. I think even if timeless might be a bit faster than anytime (for whatever reason), it is more robust and better suited for "production".