ropensci-archive / scrubr

:warning: ARCHIVED :warning: Clean species occurrence records

Speed up dedup() #26

Closed sckott closed 4 years ago

sckott commented 7 years ago

`dedup()` is quite slow right now. Profiling with the waxwing data; more soon.

sckott commented 7 years ago

Got some speedups by combining data.table and fastmatch:

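library(microbenchmark)

# Compare the previous implementation (dedup_old) with the new one on the same input x
microbenchmark(
  old = dedup_old(x),
  new = dedup(x),
  times = 10
)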
#> Unit: milliseconds
#>  expr      min       lq     mean   median       uq      max neval
#>   old 617.9352 674.4674 709.8393 687.4735 766.6044 832.4115    10
#>   new 365.2439 369.0146 395.8602 375.9335 398.7851 524.3056    10

That's roughly a 1.8x speedup at the median (687 ms down to 376 ms). Will look for more spots that can be sped up.
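
For anyone following along, here is a minimal sketch of how data.table and fastmatch can be combined to flag duplicate rows. It is not the actual `dedup()` code: it only handles exact duplicates, the `occ` data frame is a made-up toy example rather than the waxwing data, and the variable names are assumptions for illustration.

```r
library(data.table)
library(fastmatch)

# Hypothetical toy occurrence records (not the waxwing data from this thread)
occ <- data.frame(
  name = c("Bombycilla garrulus", "Bombycilla garrulus",
           "Bombycilla cedrorum", "Bombycilla garrulus"),
  longitude = c(10.1, 10.1, -93.2, 24.9),
  latitude = c(59.9, 59.9, 44.9, 60.2),
  stringsAsFactors = FALSE
)

# Collapse each row into a single key string so rows can be compared as scalars
keys <- do.call(paste, c(occ, sep = "\r"))

# fmatch() is a hash-based drop-in replacement for match(); matching the keys
# against themselves returns, for each row, the position of its first occurrence
first_idx <- fmatch(keys, keys)

# A row is a duplicate when its first occurrence is an earlier row
is_dup <- first_idx != seq_along(keys)

# Use data.table for the row subsetting
dt <- as.data.table(occ)
uniq <- dt[!is_dup]
dups <- dt[is_dup]
```

Here `uniq` keeps the first copy of each record and `dups` holds the repeats. The hash lookup avoids comparing every row against every other row, which is the kind of saving that matters on large occurrence tables.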

maelle commented 7 years ago

This is already an impressive speedup, nice job!

For a data.frame as big as the waxwings one, I'm still afraid that my computer might fail (not because of speed but because of memory?) but I'll try it. :smile_cat:

sckott commented 7 years ago

Yeah, it should still be much faster. I may try moving some of it to C++ soon (unless all the bottleneck parts are already in C++).