hughjonesd closed this issue 6 years ago.
Speed is, of course, a relative term, as a more inclusive benchmark shows:
bar <- 'baz'
print(unit = "eps", order = "median", signif = 3,
microbenchmark::microbenchmark(
fstrings = fstrings::f("foo{bar}"),
glue = glue::glue("foo{bar}"),
gstring = R.utils::gstring("foo${bar}"),
paste0 = paste0("foo", bar),
pystr_format = pystr::pystr_format("foo{bar}", bar = bar),
sprintf = sprintf("foo%s", bar),
str_interp = stringr::str_interp("foo${bar}"),
rprintf = rprintf::rprintf("foo{bar}", bar = bar)
))
#> Unit: evaluations per second
#> expr min lq mean median uq max neval
#> gstring 6.51e+00 2480 2530.102 2540 2700 2880 100
#> str_interp 9.12e+01 3280 3357.290 3390 3540 3900 100
#> rprintf 3.75e+02 5680 5851.284 5910 6350 7040 100
#> glue 3.89e+01 5990 6141.215 6220 6450 7250 100
#> fstrings 2.73e+02 13900 14497.695 14600 15400 18000 100
#> pystr_format 5.26e+01 19000 19940.074 19900 21400 27400 100
#> paste0 2.13e+05 403000 460518.077 446000 490000 887000 100
#> sprintf 2.92e+05 409000 530518.483 466000 552000 1570000 100
While glue() is slower than paste0(), sprintf() and pystr_format(), it is roughly twice as fast as str_interp() and gstring(), and on par with rprintf().
It should also be noted that the initial commit of glue (then called fstrings()) was roughly 2 to 3 times as fast as it is today, because it did far less work.
paste0() and sprintf() don't do string interpolation and will likely always be significantly faster than glue; glue was never meant to be a direct replacement for them.
pystr_format() and rprintf() both do only variable interpolation, not arbitrary expressions, and supporting arbitrary expressions was one of the explicit goals of writing glue.
So glue is about twice as fast as the two functions that do have roughly equivalent functionality, str_interp() and gstring().
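To make the difference in scope concrete, here is a minimal sketch (assuming the glue package is installed; `bar` is the same toy variable as in the benchmark, and the other names are illustrative only). glue evaluates R expressions inside the braces, whereas sprintf() only substitutes values that were computed beforehand.

```r
library(glue)

bar <- "baz"

# glue evaluates arbitrary R expressions inside the braces
expr_glue <- glue("foo{toupper(bar)}, {nchar(bar)} chars")

# sprintf() only fills placeholders, so the computation happens in the arguments
expr_sprintf <- sprintf("foo%s, %d chars", toupper(bar), nchar(bar))

# glue returns an object of class "glue"; as.character() drops the class for comparison
identical(as.character(expr_glue), expr_sprintf)
```

Variable-only interpolators such as pystr_format() and rprintf() sit closer to the sprintf() side here: the expression has to be evaluated by the caller and passed in as a named argument.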
I should also note that these results are in evaluations per second, so even in this case glue still returns about 6000 strings per second. glue is also vectorized: supplying a vector input will be considerably faster than explicit looping like in this benchmark. If you are using glue within tight loops with enough iterations for the speed differences shown here to matter, there are likely larger issues in your code than this package.
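As a rough illustration of the vectorization point, the following sketch (assuming glue is installed; `bar_n`, `vectorized` and `looped` are hypothetical names) produces the same strings once with a single vectorized call and once with one glue() call per element, which is the pattern the benchmark above effectively measures.

```r
library(glue)

# Hypothetical input: 1,000 copies of the same string
bar_n <- rep("bar", 1000)

# One vectorized call: glue interpolates every element of bar_n at once
vectorized <- glue("foo{bar_n}")

# The per-element alternative: one glue() call (and one parse) per string
looped <- vapply(
  bar_n,
  function(b) as.character(glue("foo{b}")),
  character(1),
  USE.NAMES = FALSE
)

# Both approaches yield identical strings
identical(as.character(vectorized), looped)
```

Wrapping each approach in system.time() or microbenchmark::microbenchmark() will show the gap on your own machine; no timings are claimed here.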
We'll rename it to molasses
@jhester These points are fair. Still, a newcomer (e.g. me) might have optimistic expectations that glue would work as fast as base R solutions. Maybe the documentation and web pages could clarify a bit that this isn't going to compete with simple paste0() or sprintf(). That would have saved me a transition to glue, and a quick revert when I realised my code was now pretty slow...
@hadley heh.
David
I don't think that benchmark is terribly representative, because you shouldn't be making strings inside a loop. When you compare performance on a larger vector, much of the difference goes away.
bar_1 <- "bar"              # a single string
bar_n <- rep("bar", 1e5)    # 100,000 copies of the same string
options(digits = 3)
microbenchmark::microbenchmark(
glue_1 = glue::glue("foo{bar_1}"),
paste_1 = paste0("foo", bar_1),
sprintf_1 = sprintf("foo%s", bar_1),
glue_n = glue::glue("foo{bar_n}"),
paste_n = paste0("foo", bar_n),
sprintf_n = sprintf("foo%s", bar_n),
times = 100, unit = "eps"
)
#> Unit: evaluations per second
#> expr min lq mean median uq max neval cld
#> glue_1 37.5 4672.0 6235.3 5367.5 7424.9 1.24e+04 100 a
#> paste_1 33421.3 84615.8 271363.3 153076.3 466966.7 8.60e+05 100 b
#> sprintf_1 40241.4 164883.8 393788.0 218115.2 426439.2 1.63e+06 100 c
#> glue_n 16.3 41.8 43.6 44.8 47.2 5.35e+01 100 a
#> paste_n 55.9 64.7 67.7 68.6 71.0 7.53e+01 100 a
#> sprintf_n 18.4 52.0 54.0 54.5 57.3 6.21e+01 100 a
Or in milliseconds:
Unit: milliseconds
expr min lq mean median uq max neval cld
glue_1 2.17e-01 0.30327 0.36160 0.37702 0.40595 0.6421 100 a
paste_1 1.14e-03 0.00227 0.00773 0.00671 0.01157 0.0229 100 a
sprintf_1 5.66e-04 0.00206 0.00436 0.00405 0.00591 0.0118 100 a
glue_n 1.86e+01 21.20238 22.75813 22.79917 23.94331 27.7099 100 d
paste_n 1.28e+01 13.82781 14.94934 14.61160 16.13239 17.4216 100 b
sprintf_n 1.49e+01 16.52262 18.76669 17.16596 18.07751 151.6517 100 c
22 ms to generate 100,000 strings doesn't seem bad to me.
Well, yes, if I were a better coder I would've vectorised everything. But, being a lowly mortal…
It would be useful to see the example where the change to glue made a large difference in runtime.
It's a complex example, but see https://github.com/hughjonesd/huxtable/commit/36c00a17af279797c9bebf5f60bb1f32657472c9
The code is far from optimal for speed (or anything else!), but with paste0() it is tolerable.
David
On 12 October 2017 at 11:44, Jim Hester notifications@github.com wrote:
It would be useful to see the example where the change to glue made a large difference in runtime.
We have a vignette detailing the speed for various inputs and outputs (https://glue.tidyverse.org/articles/speed.html). The speed is largely determined by the speed of the R parser and paste0(), so I think this can be closed.