ocbe-uio / BayesMallows

R-package for Bayesian preference learning with the Mallows rank model.
https://ocbe-uio.github.io/BayesMallows/
GNU General Public License v3.0
21 stars 9 forks source link

Merge handling of pairwise preferences and partial rankings #393

Closed osorensen closed 6 months ago

osorensen commented 6 months ago

Currently any_missing = false and augpair = true when the data arrive in terms of preferences. This is unnecessary redundance. These lines should merge into one.

https://github.com/ocbe-uio/BayesMallows/blob/8850531c2bb231bd032b5909d8e64cdd00c9d654/src/run_mcmc.cpp#L64-L65

This step is a prerequisite for #388.

osorensen commented 6 months ago

A relatively simple way of achieving this, which also reduces the complexity of the overall code, is to work directly with derived pairwise preferences whenever there are partial rankings. It is however important to monitor the computational speed of this.

osorensen commented 6 months ago

This captures it. Similar results with derived pairwise preferences as with partial rankings, but 4-5 times slower. May be a matter of looking at the implementation.

library(BayesMallows)
dat <- potato_visual
set.seed(3)
dat[runif(length(dat)) > .9] <- NA

prefs_list <- lapply(seq_len(nrow(dat)), function(user) {
  ranking <- dat[user, ]
  ranking <- ranking[!is.na(ranking)]
  items <- names(ranking)

  n <- length(ranking)
  n_prefs <- choose(n, 2)
  prefs <- data.frame(
    assessor = as.numeric(rep(user, n_prefs)),
    bottom_item = numeric(n_prefs),
    top_item = numeric(n_prefs)
  )
  k <- 1
  for(i in seq_along(ranking)) {
    if(n >= i + 1) {
      idx <- seq(from = i + 1, to = n, by = 1)
    } else {
      idx <- numeric()
    }

    for(j in idx) {
      if(ranking[[i]] < ranking[[j]]) {
        prefs$bottom_item[[k]] <- items[[j]]
        prefs$top_item[[k]] <- items[[i]]
      } else {
        prefs$bottom_item[[k]] <- items[[i]]
        prefs$top_item[[k]] <- items[[j]]
      }
      k <- k + 1
    }
    if(k > n_prefs) break
  }
  prefs
})

prefs <- do.call(rbind, prefs_list)

system.time({
  mod <- compute_mallows(
    data = setup_rank_data(preferences = prefs)
  )
})
#>    user  system elapsed 
#>   0.251   0.009   0.272

assess_convergence(mod)


system.time({
  mod2 <- compute_mallows(
    data = setup_rank_data(rankings = dat)
  )
})
#>    user  system elapsed 
#>   0.050   0.000   0.051

assess_convergence(mod2)

Created on 2024-02-27 with reprex v2.1.0

osorensen commented 6 months ago

Second thoughts. This is not the same. It's only for a full ranking that there is a one-to-one correspondence between the ranking and derived pairwise preferences.

Example, ranking is (1, NA, 3, NA). This means that A1 is fixed to be ranked first and A3 is fixed to be ranked third. A2 and A4 can take the values 2 or 4.

Constructing pairwise preferences from this we get A1<A2, A1<A3, A1<A4, but nothing else. Thus, a ranking like (1, 3, 4, 2) is consistent with the pairwise preferences, although it's not consistent with the partial ranking.

osorensen commented 6 months ago

This is also exactly what Liu et al. say in ARSIA paper:

image