Switching to base R sorting provides a performance gain over arrange() and future proofs the code against a regression in dev dplyr that results in a ~30x slow down. This PR also replaces the sorting code in bed_random() withbed_sort() to avoid redundancy
Current master performance (CRAN dplyr)
library(valr)
library(microbenchmark)
n <- 1e6
seed_x <- 1010486
genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))
x <- bed_random(genome, n = n, seed = seed_x)
microbenchmark(bed_sort(x),
bed_random(genome, n = n, seed = seed_x),
times = 2,
unit = 's')
#> Unit: seconds
#> expr min lq mean
#> bed_sort(x) 0.03779551 0.03779551 0.03914011
#> bed_random(genome, n = n, seed = seed_x) 0.68720326 0.68720326 0.69683440
#> median uq max neval cld
#> 0.03914011 0.04048472 0.04048472 2 a
#> 0.69683440 0.70646555 0.70646555 2 b
With these changes that use base r radix sort (ported from data.table since r 3.3.0)
#> Unit: seconds
#> expr min lq mean
#> bed_sort(x) 0.02048263 0.02048263 0.02155842
#> bed_random(genome, n = n, seed = seed_x) 0.15447826 0.15447826 0.16032385
#> median uq max neval cld
#> 0.02155842 0.02263422 0.02263422 2 a
#> 0.16032385 0.16616943 0.16616943 2 b
Switching to base R sorting provides a performance gain over
arrange()
and future proofs the code against a regression in dev dplyr that results in a ~30x slow down. This PR also replaces the sorting code inbed_random()
withbed_sort()
to avoid redundancyCurrent master performance (CRAN dplyr)
With these changes that use base r radix sort (ported from data.table since r 3.3.0)
Created on 2020-03-22 by the reprex package (v0.3.0)