reconhub / linelist

An R package to import, clean, and store case data
https://www.repidemicsconsortium.org/linelist

Speedup guess dates #74

Closed: zkamvar closed this 5 years ago

zkamvar commented 5 years ago

This refactors and speeds up guess_dates. For 5K messy date entries, it now takes 1.25 seconds instead of 2.5 seconds. It's still not phenomenally fast, but it's faster and that's what matters.

This also changes the excel argument from a logical to a character value, but the default behaves the same as before.

The old version was slow:

remotes::install_github("reconhub/linelist@0e6ac439963714ee0ca17a99237f554f6c183b6d")
library(linelist)
md <- messy_data(10000)$"messy/dates"
system.time(guess_dates(md, err = 1))
#>    user  system elapsed 
#>   3.530   0.076   3.605

By contrast, the new version roughly doubles the speed on the same 10,000 entries (about 3.6 seconds elapsed down to about 1.8 seconds):

remotes::install_github("reconhub/linelist@00f0e3db9d4264aea4bdccc2bccb85c9b4e72227")
library(linelist)
md <- messy_data(10000)$"messy/dates"
system.time(guess_dates(md, err = 1))
#>    user  system elapsed 
#>   1.714   0.092   1.806

Created on 2019-05-15 by the reprex package (v0.2.1)
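
For reference, the speedup implied by the two elapsed times above can be worked out directly. This is just arithmetic on the numbers reported in the reprex output, not a new measurement; the variable names are made up for illustration.

```r
# Elapsed times (seconds) copied from the two system.time() outputs above
old_elapsed <- 3.605
new_elapsed <- 1.806

# Speedup factor: how many times faster the new version ran
speedup <- old_elapsed / new_elapsed

# Share of the original run time that was saved, in percent
time_saved_pct <- 100 * (1 - new_elapsed / old_elapsed)

print(round(speedup, 2))
print(round(time_saved_pct, 1))
```

A single system.time() call is noisy, so these figures are best read as "about twice as fast" rather than as a precise ratio.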

It now also has arguments for Excel dates that include old Windows (which addresses #73).
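
As background on why Excel dates need an origin at all, here is a minimal base-R sketch of converting Excel serial day numbers under the two Excel date systems (1900 and 1904). It is not the code in this PR: excel_to_date and its system argument are hypothetical names, and which system corresponds to which value of the new guess_dates arguments is not stated here.

```r
# Hypothetical helper, not part of linelist: convert Excel serial day
# numbers to Date using one of the two Excel date systems.
excel_to_date <- function(serial, system = c("1900", "1904")) {
  system <- match.arg(system)
  # The 1900 system is conventionally mapped with origin 1899-12-30, which
  # absorbs Excel's fictitious 1900-02-29 for serials after that day.
  # The 1904 system counts days from 1904-01-01.
  origin <- switch(system,
    "1900" = "1899-12-30",
    "1904" = "1904-01-01"
  )
  as.Date(as.numeric(serial), origin = origin)
}

# The same calendar day expressed in each system (serials differ by 1462)
d1900 <- excel_to_date(43600, "1900")
d1904 <- excel_to_date(42138, "1904")
print(d1900)
print(d1904)
```

The point of the sketch is that the same serial number means a different day depending on the origin, so a date guesser has to be told (or assume) which system the spreadsheet used.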