tidyfun / tf

S3 classes and methods for tidy functional data
https://tidyfun.github.io/tf/
GNU Affero General Public License v3.0

perf: set use.names to FALSE in unlist or purrr::list_c #109

Closed. m-muecke closed this issue 2 months ago.

m-muecke commented 3 months ago

Using unlist(x, use.names = FALSE) is considerably faster than unlist(x). Depending on the use case, purrr::list_c() is also faster than unlist(x). See the benchmarks below; note that additionally setting recursive = FALSE appears to make a negligible difference:

In short, unlist(x, use.names = FALSE) is consistently the fastest, followed by purrr::list_c(). @fabian-s, do you have a preference? You were already using purrr::list_c(x) in a few places.

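set.seed(1312)
# Benchmark different ways of flattening tf_arg() for n random functions
bench::press(
  n = c(3, 10, 100, 1000),
  {
    # sparsify so x is irregular and tf_arg() returns a list of grids
    x <- tf::tf_rgp(n) |> tf::tf_sparsify()
    arg <- tf::tf_arg(x)
    bench::mark(
      unlist(arg),
      unlist(arg, use.names = FALSE),
      unlist(arg, recursive = FALSE, use.names = FALSE),
      purrr::list_c(arg),
      # results are not all identical (e.g. names differ), so skip the check
      check = FALSE
    )
  }
)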
#> Running with:
#>       n
#> 1     3
#> 2    10
#> 3   100
#> 4  1000
#> # A tibble: 16 × 7
#>    expression                   n      min   median `itr/sec` mem_alloc `gc/sec`
#>    <bch:expr>               <dbl> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#>  1 unlist(arg)                  3    5.9µs   6.11µs   155388.    1.25KB     0   
#>  2 unlist(arg, use.names =…     3 410.01ns 491.97ns  1653868.      640B     0   
#>  3 unlist(arg, recursive =…     3 410.01ns 492.09ns  1615105.      640B     0   
#>  4 purrr::list_c(arg)           3    4.8µs   5.29µs   181008.   12.42KB    36.2 
#>  5 unlist(arg)                 10  21.11µs  21.73µs    44080.    4.45KB     4.41
#>  6 unlist(arg, use.names =…    10    656ns 738.07ns  1118966.    2.23KB     0   
#>  7 unlist(arg, recursive =…    10    656ns 738.07ns  1159926.    2.23KB   116.  
#>  8 purrr::list_c(arg)          10   5.99µs   6.64µs   140872.    2.48KB    14.1 
#>  9 unlist(arg)                100 191.02µs 195.78µs     4982.   40.03KB     4.15
#> 10 unlist(arg, use.names =…   100   3.65µs   4.06µs   226843.   20.02KB   136.  
#> 11 unlist(arg, recursive =…   100   3.65µs   3.98µs   233055.   20.02KB   117.  
#> 12 purrr::list_c(arg)         100  21.24µs   22.8µs    42233.   20.71KB    25.4 
#> 13 unlist(arg)               1000   1.99ms   2.06ms      474.  401.72KB     4.09
#> 14 unlist(arg, use.names =…  1000  34.69µs  43.75µs    22199.  200.86KB   123.  
#> 15 unlist(arg, recursive =…  1000   34.6µs  42.07µs    23081.  200.86KB   121.  
#> 16 purrr::list_c(arg)        1000 167.32µs 181.84µs     5439.  205.07KB    34.3

Created on 2024-08-01 with reprex v2.1.1
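To illustrate what use.names actually changes, here is a small sketch using a hypothetical named list standing in for tf_arg() output (it assumes tf_arg() returns a named list for irregular data, which the memory figures above suggest). unlist() pastes the outer names onto every element, while use.names = FALSE skips that step; the extra time and memory in the benchmark presumably come from building that names vector. It also shows why recursive = FALSE makes no difference here: a list of atomic vectors is only one level deep.

```r
# Hypothetical stand-in for tf::tf_arg() on an irregular tf object:
# a named list holding one numeric evaluation grid per function.
arg <- list(a = c(0, 0.5, 1), b = c(0, 1))

with_names <- unlist(arg)                     # builds names "a1", "a2", ...
no_names   <- unlist(arg, use.names = FALSE)  # skips name construction

names(with_names)
#> [1] "a1" "a2" "a3" "b1" "b2"
names(no_names)
#> NULL

# The values are the same; only the names attribute differs.
identical(unname(with_names), no_names)
#> [1] TRUE

# For a list of atomic vectors, recursive = FALSE flattens to the same result.
identical(no_names, unlist(arg, recursive = FALSE, use.names = FALSE))
#> [1] TRUE
```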

fabian-s commented 3 months ago

thanks!

m-muecke commented 3 months ago, quoting fabian-s:
  • Let's use unlist() instead of purrr::list_c() in general. For tf, sticking to base R as much as possible makes sense, I think, even more so if the performance is better.
  • I'm not sure a general answer on whether to keep names is possible; it depends on the context, and sometimes we need to keep the names, I think. Obviously, if the names are not needed, set use.names = FALSE. I was not aware that this makes such a huge difference in memory and time. Given the sloppy way we're currently handling names, there are probably more optimizations like this possible...

thanks!

Then I will open a PR that sets use.names = FALSE where applicable and replaces purrr::list_c() with unlist(x, use.names = FALSE).
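For the cases mentioned above where names are still needed, one option is to flatten without names and record which element each value came from explicitly, using base R's rep() and lengths(). This is a sketch on a hypothetical named list, not necessarily the approach taken in tf; it keeps the fast unlist(x, use.names = FALSE) path and yields a plain id vector instead of pasted names like "a1", "a2".

```r
# Hypothetical named list of evaluation grids, one element per function.
arg <- list(a = c(0, 0.5, 1), b = c(0, 1))

# Flatten without names, then record which function each value came from.
values <- unlist(arg, use.names = FALSE)
id     <- rep(names(arg), lengths(arg))

id
#> [1] "a" "a" "a" "b" "b"

# Combine into a long-format table if both pieces are needed together.
long <- data.frame(id = id, arg = values)
```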