This new version of scrub(), written in base R only, is much faster than the previous one: roughly 7 to 8 times faster in the benchmark below, where the new version appears as scrub2(). Instead of matching a long list of special characters, the gsub() call matches anything other than digits, letters, '.', ',', ' ', and '-'.
I did not profile the parsing of a long string, so I don't know how much of the total parsing time scrub() accounts for, but this is still an improvement.
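For reviewers, here is a minimal sketch of the negated-character-class idea described above. The name `scrub_sketch` and the exact pattern are assumptions for illustration only; the function in this PR may differ in which characters it keeps or in other details. The replacement with an apostrophe is inferred from the updated test.

```r
# Hypothetical sketch, not the code in this PR.
# Replace every character that is NOT an ASCII digit, an ASCII letter,
# '.', ',', a space, or '-' with an apostrophe, in a single gsub() call.
scrub_sketch <- function(x) {
  # The trailing '-' inside the brackets is a literal hyphen, not a range.
  gsub("[^0-9A-Za-z., -]", "'", x)
}

# Input from the updated test, written with Unicode escapes
# (two backticks, U+00BA, U+2032, U+2033): five characters in.
res <- scrub_sketch("``\u00ba\u2032\u2033")
nchar(res)  # 5, one apostrophe per input character

# Characters in the allowed set pass through unchanged.
scrub_sketch("N 45.5, -122.6")
```

Because the pattern is a single character class, gsub() makes one pass over each string instead of testing for each special character separately, which is the likely source of the speed-up.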
Example
> microbenchmark::microbenchmark(scrub(rep(c(test_lats, test_lons),10)), scrub2(rep(c(test_lats, test_lons),10)), times = 1000)
Unit: microseconds
                                     expr    min      lq      mean  median      uq    max neval
  scrub(rep(c(test_lats, test_lons), 10)) 3144.3 3197.60 3399.5188 3320.95 3488.30 4466.2  1000
 scrub2(rep(c(test_lats, test_lons), 10))  397.0  413.55  450.9085  423.65  459.45  922.1  1000
I updated the tests for scrub(), which might not be best practice, but the change makes sense:
In expect_equal(scrub("``º′″"), "'''''") we should expect 5 characters, not 4, since the input has 5 characters, and the parsing functions handle this well.