ocbe-uio / BayesMallows

R-package for Bayesian preference learning with the Mallows rank model.
https://ocbe-uio.github.io/BayesMallows/
GNU General Public License v3.0

Make efficient implementation for sequential learning #386

Closed: osorensen closed this issue 6 months ago

osorensen commented 6 months ago

The update_mallows functions have a lot of overhead when called repeatedly. For example, consider the following code, which adds sushi rankings one at a time.

library(BayesMallows)
library(tidyverse)
n_items <- ncol(sushi_rankings)
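# Initialize the model with samples from the prior distribution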
mod <- sample_prior(
  n = 1e4, n_items = n_items, priors = set_priors(gamma = 9, lambda = 2))

n_iter <- 20
learning <- tibble(
  iteration = seq_len(n_iter),
  alpha_mean = numeric(n_iter),
  alpha_sd = numeric(n_iter)
)

# Add one sushi ranking at a time and record posterior summaries of alpha
for(i in 1:n_iter) {
  mod <- update_mallows(
    model = mod,
    new_data = setup_rank_data(rankings = sushi_rankings[i, ])
  )
  learning$alpha_mean[[i]] <- mean(mod$alpha_samples)
  learning$alpha_sd[[i]] <- sd(mod$alpha_samples)
}

Here is the profiling output: about 80 ms are spent tidying the output and 160 ms are spent calling run_smc. The 80 ms of tidying are wasted in every call except the last, since the tidied output is not used when the model is re-fitted. In addition, time is probably spent inside run_smc setting everything up.

(profiling output screenshot omitted)
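For reference, a profile like this can be generated with the profvis package (an assumption about the tooling used here; any R profiler would show the same split), wrapping the update loop from the example above:

library(profvis)
profvis({
  for (i in seq_len(n_iter)) {
    mod <- update_mallows(
      model = mod,
      new_data = setup_rank_data(rankings = sushi_rankings[i, ])
    )
  }
})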

For these cases, it would probably be useful to add back a C++ function which takes a time series of data and fits the model using SMC. The original implementation by Waldir and Anja did exactly this, but I removed it while working on the refactoring. It should be relatively straightforward to add it back now.
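To make the shape of such an interface concrete, here is a rough sketch of what a single user-facing call could look like; the function name update_mallows_sequentially and its list-of-timepoints argument are purely hypothetical and not part of the package:

# Hypothetical interface; names and arguments are illustrative only.
# All timepoints are passed at once, and a single C++ SMC run loops over them.
timepoint_data <- lapply(seq_len(n_iter), function(i) {
  setup_rank_data(rankings = sushi_rankings[i, ])
})
mod <- update_mallows_sequentially(
  model = sample_prior(
    n = 1e4, n_items = n_items,
    priors = set_priors(gamma = 9, lambda = 2)),
  new_data = timepoint_data
)

With a design along these lines, run_smc is entered once and the output is tidied once at the end, so the per-call setup and tidying overhead seen in the profile is paid a single time rather than in every iteration.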