tidyverse / glue

Glue strings to data in R. Small, fast, dependency free interpreted string literals.
https://glue.tidyverse.org

glue isn't fast #57

Closed hughjonesd closed 6 years ago

hughjonesd commented 7 years ago
library(microbenchmark)
bar <- 'baz'
microbenchmark(glue::glue('foo{bar}'))
# Unit: microseconds
#                   expr     min      lq     mean median       uq     max neval
#  glue::glue("foo{bar}") 291.851 343.165 433.5618 401.78 555.0745 753.867   100
microbenchmark(paste0('foo', bar))
# Unit: microseconds
#              expr  min    lq    mean median     uq    max neval
# paste0("foo", bar) 2.45 2.523 3.05519 2.6085 2.8555 36.524   100
microbenchmark(sprintf('foo%s', bar))
# Unit: nanoseconds
#                   expr min    lq    mean median     uq   max neval
#  sprintf("foo%s", bar) 825 947.5 1273.85 1111.5 1184.5 19195   100

jimhester commented 7 years ago

Speed is, of course, a relative term, as a more inclusive benchmark shows.

bar <- 'baz'
print(unit = "eps", order = "median", signif = 3,
  microbenchmark::microbenchmark(
  fstrings = fstrings::f("foo{bar}"),
  glue = glue::glue("foo{bar}"),
  gstring = R.utils::gstring("foo${bar}"),
  paste0 = paste0("foo", bar),
  pystr_format = pystr::pystr_format("foo{bar}", bar = bar),
  sprintf = sprintf("foo%s", bar),
  str_interp = stringr::str_interp("foo${bar}"),
  rprintf = rprintf::rprintf("foo{bar}", bar = bar)
))
#> Unit: evaluations per second
#>          expr      min     lq       mean median     uq     max neval
#>       gstring 6.51e+00   2480   2530.102   2540   2700    2880   100
#>    str_interp 9.12e+01   3280   3357.290   3390   3540    3900   100
#>       rprintf 3.75e+02   5680   5851.284   5910   6350    7040   100
#>          glue 3.89e+01   5990   6141.215   6220   6450    7250   100
#>      fstrings 2.73e+02  13900  14497.695  14600  15400   18000   100
#>  pystr_format 5.26e+01  19000  19940.074  19900  21400   27400   100
#>        paste0 2.13e+05 403000 460518.077 446000 490000  887000   100
#>       sprintf 2.92e+05 409000 530518.483 466000 552000 1570000   100

While glue() is slower than paste0(), sprintf() and pystr_format(), it is roughly twice as fast as str_interp() and gstring(), and on par with rprintf(). It should also be noted that the initial commit of glue (then called fstrings()) was ~2-3x as fast as it is today (as it did far less work).

paste0() and sprintf() don't do string interpolation and will likely always be significantly faster than glue; glue was never meant to be a direct replacement for them.

pystr_format() and rprintf() both do only variable interpolation, not arbitrary expressions, which was one of the explicit points of writing glue.
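To make the distinction concrete, here is a minimal sketch of an arbitrary expression inside the braces, with the sprintf() equivalent alongside it. It assumes the glue package is installed; the variables x and y are made up for illustration.

```r
library(glue)  # assumes the glue package is installed

x <- 3
y <- 4

# glue() evaluates whatever R expression sits inside the braces,
# not just a bare variable name.
with_glue <- as.character(glue("hypotenuse: {sqrt(x^2 + y^2)}"))

# With sprintf() the expression has to be computed outside the template
# and passed in as an argument.
with_sprintf <- sprintf("hypotenuse: %s", sqrt(x^2 + y^2))

with_glue     # "hypotenuse: 5"
with_sprintf  # "hypotenuse: 5"
```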

So glue is ~2x as fast as the two functions (str_interp(), gstring()) that do have roughly equivalent functionality.

Also I should note these results are in evaluations per second, so even in this case glue is still returning over 6000 strings per second. Glue is also vectorized: supplying a vector input will provide a considerable speed improvement over explicit looping like this benchmark. If you are using glue within tight loops with enough iterations for the speed differences shown here to matter, there are likely larger issues in your code than this package.
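As a rough illustration of the vectorization point, the sketch below builds the same strings two ways: one glue() call per element, and a single call on the whole vector. It assumes the glue package is installed; the three-element input is made up, and no timings are claimed here.

```r
library(glue)  # assumes the glue package is installed

bar <- c("a", "b", "c")

# Looped: one glue() call per element, so the per-call overhead
# is paid once for every string.
looped <- vapply(
  bar,
  function(b) as.character(glue("foo{b}")),
  character(1),
  USE.NAMES = FALSE
)

# Vectorized: a single glue() call handles the whole vector.
vectorized <- as.character(glue("foo{bar}"))

identical(looped, vectorized)
```

Both forms give the same result; the difference is only in how many times the template is processed.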

hadley commented 7 years ago

We'll rename it to molasses

hughjonesd commented 7 years ago

@jhester These points are fair. Still, a newcomer (e.g. me) might have optimistic expectations that glue would work as fast as base R solutions. Maybe the documentation and web pages could clarify a bit that this isn't going to compete with simple paste0 or sprintf. That would have saved me a transition to glue, and a quick revert when I realised my code was now pretty slow....

@hadley heh.

David

hadley commented 7 years ago

I don't think that benchmark is terribly representative because you shouldn't be making strings inside a loop. When you compare performance on a larger vector, much of the difference goes away.

bar_1 <- "bar"
bar_n <- rep("bar", 1e5)

options(digits = 3)
microbenchmark::microbenchmark(
  glue_1 = glue::glue("foo{bar_1}"),
  paste_1 = paste0("foo", bar_1),
  sprintf_1 = sprintf("foo%s", bar_1),
  glue_n = glue::glue("foo{bar_n}"),
  paste_n = paste0("foo", bar_n),
  sprintf_n = sprintf("foo%s", bar_n),
  times = 100, unit = "eps"
)
#> Unit: evaluations per second
#>       expr     min       lq     mean   median       uq      max neval cld
#>     glue_1    37.5   4672.0   6235.3   5367.5   7424.9 1.24e+04   100 a  
#>    paste_1 33421.3  84615.8 271363.3 153076.3 466966.7 8.60e+05   100  b 
#>  sprintf_1 40241.4 164883.8 393788.0 218115.2 426439.2 1.63e+06   100   c
#>     glue_n    16.3     41.8     43.6     44.8     47.2 5.35e+01   100 a  
#>    paste_n    55.9     64.7     67.7     68.6     71.0 7.53e+01   100 a  
#>  sprintf_n    18.4     52.0     54.0     54.5     57.3 6.21e+01   100 a

hadley commented 7 years ago

Or in ms:

Unit: milliseconds
      expr      min       lq     mean   median       uq      max neval  cld
    glue_1 2.17e-01  0.30327  0.36160  0.37702  0.40595   0.6421   100 a   
   paste_1 1.14e-03  0.00227  0.00773  0.00671  0.01157   0.0229   100 a   
 sprintf_1 5.66e-04  0.00206  0.00436  0.00405  0.00591   0.0118   100 a   
    glue_n 1.86e+01 21.20238 22.75813 22.79917 23.94331  27.7099   100    d
   paste_n 1.28e+01 13.82781 14.94934 14.61160 16.13239  17.4216   100  b  
 sprintf_n 1.49e+01 16.52262 18.76669 17.16596 18.07751 151.6517   100   c 

22 ms to generate 100,000 strings doesn't seem bad to me.
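For a sense of scale, the per-string cost can be worked out from the median column of the table above. This is only arithmetic on the numbers already shown, not a new measurement, and the variable names are made up for illustration.

```r
# Median timings copied from the millisecond table above.
glue_1_ms <- 0.37702   # one glue() call producing one string
glue_n_ms <- 22.79917  # one glue() call producing 1e5 strings
n <- 1e5

# Cost per generated string, in microseconds.
per_string_looped_us     <- glue_1_ms * 1000
per_string_vectorized_us <- glue_n_ms * 1000 / n

per_string_looped_us      # about 377
per_string_vectorized_us  # about 0.23
```

So the fixed cost of a single call dominates when glue() is called once per string, and all but disappears when the call is vectorized.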

hughjonesd commented 7 years ago

Well, yes, if I were a better coder I would've vectorised everything. But, being a lowly mortal…

jimhester commented 7 years ago

It would be useful to see the example where the change to glue made a large difference in runtime.

hughjonesd commented 7 years ago

It's a complex example, but see https://github.com/hughjonesd/huxtable/commit/36c00a17af279797c9bebf5f60bb1f32657472c9. The code is far from optimal for speed (or anything else!), but with paste0 it is tolerable.

David

jimhester commented 6 years ago

We have a vignette detailing the speed at various inputs / outputs https://glue.tidyverse.org/articles/speed.html, and the speed is largely determined by the speed of the R parser and paste0(), so I think this can be closed.
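To show why the parser and paste0() would dominate, here is a toy interpolator written in base R. It is an illustrative sketch only, not glue's actual implementation: the name toy_glue is hypothetical, and it ignores nesting, escaping and everything else the real package handles.

```r
# toy_glue() is a hypothetical helper for illustration; it is NOT how glue
# is implemented, and it does not support nested or escaped braces.
toy_glue <- function(template, envir = parent.frame()) {
  force(envir)

  # Locate every {...} placeholder in the template.
  m <- gregexpr("\\{[^{}]*\\}", template)
  placeholders <- regmatches(template, m)[[1]]
  # Literal text around the placeholders (always one more piece than placeholders).
  literals <- regmatches(template, m, invert = TRUE)[[1]]

  # Parse and evaluate the code inside each placeholder: this is the
  # R parser's share of the work.
  values <- lapply(placeholders, function(p) {
    code <- substr(p, 2, nchar(p) - 1)
    as.character(eval(parse(text = code), envir))
  })

  # Interleave literals and values, then let paste0() do the joining
  # (and the vector recycling).
  pieces <- vector("list", length(literals) + length(values))
  pieces[seq_along(literals) * 2 - 1] <- as.list(literals)
  pieces[seq_along(values) * 2] <- values
  do.call(paste0, pieces)
}

bar <- "baz"
toy_glue("foo{bar}")       # "foobaz"
toy_glue("{1 + 1} items")  # "2 items"
```

Even in this stripped-down form, every call parses and evaluates the placeholder code once and then hands the pieces to paste0(), which is consistent with a fixed per-call cost that matters for single strings and amortizes over long vectors.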