rnabioco / valr

Genome Interval Arithmetic in R
http://rnabioco.github.io/valr/

benchmarks #355

Closed jayhesselberth closed 4 years ago

jayhesselberth commented 4 years ago

I'm not sure what happened, but these benchmarks are significantly slower than before (most previously ran in <2 seconds). Can you confirm @kriemo?

https://valr.hesselberthlab.org/articles/benchmarks.html

kriemo commented 4 years ago

I'm seeing a slowdown as well, but not as slow as your benchmarks. This result was using dev dplyr which has a large performance regression in arrange. See below for benchmarks with CRAN dplyr.

Screen Shot 2020-03-21 at 8.40.18 AM.pdf
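One way to check whether `arrange` alone accounts for a slowdown is to time it in isolation on a table shaped roughly like a BED tibble. The following is only a minimal sketch: the data frame is synthetic (it is not valr output), the column names and sizes are assumptions, and the `dplyr` call is skipped if the package is not installed.

```r
# Synthetic, hypothetical intervals: not produced by valr
set.seed(1)
n_rows <- 1e5
df <- data.frame(
  chrom = sample(paste0("chr", 1:22), n_rows, replace = TRUE),
  start = sample.int(1e8, n_rows),
  stringsAsFactors = FALSE
)

sorted <- NULL
if (requireNamespace("dplyr", quietly = TRUE)) {
  # Time a single sort by chromosome and start coordinate
  timing <- system.time(sorted <- dplyr::arrange(df, chrom, start))
  cat(
    "dplyr", as.character(utils::packageVersion("dplyr")),
    "arrange elapsed (s):", timing[["elapsed"]], "\n"
  )
}
```

Running the same snippet under the CRAN and development versions of dplyr should show whether the difference comes from `arrange` itself rather than from valr.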

kriemo commented 4 years ago

I am not sure what caused this result. On my end the benchmarks look reasonable with the current master branch and dplyr 0.8.5. Perhaps the servers used by Travis had a slowdown? It might be worth rebuilding the docs to see whether the regression is reproducible.

library(valr)
library(dplyr)
library(ggplot2)
library(tibble)
library(scales)
library(GenomicRanges)
library(microbenchmark)

genome <- read_genome(valr_example('hg19.chrom.sizes.gz'))

# number of intervals
n <- 1e6
# number of timing reps
nrep <- 2

seed_x <- 1010486
x <- bed_random(genome, n = n, seed = seed_x)
seed_y <- 9283019
y <- bed_random(genome, n = n, seed = seed_y)

res <- microbenchmark(
  # randomizing functions
  bed_random(genome, n = n, seed = seed_x),
  bed_shuffle(x, genome, seed = seed_x),
  # single tbl functions
  bed_slop(x, genome, both = 1000),
  bed_flank(x, genome, both = 1000),
  bed_shift(x, genome),
  bed_merge(x),
  bed_partition(x),
  bed_cluster(x),
  bed_complement(x, genome),
  # multi tbl functions
  bed_closest(x, y),
  bed_intersect(x, y),
  bed_map(x, y, .n = n()),
  bed_subtract(x, y),
  bed_window(x, y, genome),
  # stats
  bed_absdist(x, y, genome),
  bed_reldist(x, y),
  bed_jaccard(x, y),
  bed_fisher(x, y, genome),
  bed_projection(x, y, genome),
  # utilities
  bed_makewindows(x, win_size = 100),
  times = nrep,
  unit = 's')

# convert nanoseconds to seconds
res <- res %>%
  as_tibble() %>%
  mutate(time = time / 1e9) %>%
  arrange(time)

# futz with the x-axis
maxs <- res %>%
  group_by(expr) %>%
  summarize(max.time = max(boxplot.stats(time)$stats))

# filter out outliers
res <- res %>%
  left_join(maxs) %>%
  filter(time <= max.time * 1.05)
#> Joining, by = "expr"

ggplot(res, aes(x=reorder(expr, time), y=time)) +
  geom_boxplot(fill = 'red', outlier.shape = NA, alpha = 0.5) +
  coord_flip() +
  theme_bw() +
  labs(
    y='execution time (seconds)',
    x='',
    title="valr benchmarks",
    subtitle=paste(comma(n), "random x/y intervals,", comma(nrep), "repetitions"))

Created on 2020-03-21 by the reprex package (v0.3.0)
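To compare a run against the "<2 seconds" expectation from the start of this thread without reading it off the plot, the timings can be summarized per expression. This is a rough sketch in base R: `res_example` and its values are made up for illustration and only mimic the `expr` and `time` (seconds) columns of `res` above.

```r
# Hypothetical timings in seconds, shaped like `res` after the unit conversion
res_example <- data.frame(
  expr = c("bed_merge(x)", "bed_merge(x)",
           "bed_closest(x, y)", "bed_closest(x, y)"),
  time = c(0.4, 0.6, 2.5, 3.1)
)

# Median time per benchmarked expression
med <- aggregate(time ~ expr, data = res_example, FUN = median)

# Keep only expressions whose median exceeds the 2 second mark
slow <- med[med$time > 2, ]
print(slow)
```

With the real `res` in place of `res_example`, an empty `slow` table would indicate that every function is back under the 2 second mark.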

kriemo commented 4 years ago

The benchmark vignette now shows normal timings. Perhaps there was an isolated issue during the previous Travis pkgdown build.
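If a slow build shows up again, recording the R and package versions next to the timings would make it easier to tell a dependency change from a slow build machine. A possible sketch, assuming the packages worth tracking are the ones named in this thread (the `pkgs` list is an assumption, and packages that are not installed are skipped):

```r
# Packages named in this thread; adjust as needed
pkgs <- c("valr", "dplyr", "microbenchmark")

# Only query packages that are actually installed
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

versions <- vapply(
  pkgs[is_installed],
  function(p) as.character(utils::packageVersion(p)),
  character(1)
)

print(R.version.string)
print(versions)
```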