ocbe-uio / BayesMallows

R-package for Bayesian preference learning with the Mallows rank model.
https://ocbe-uio.github.io/BayesMallows/
GNU General Public License v3.0
21 stars 9 forks source link

Wrong Ulam implementation #410

Closed osorensen closed 4 months ago

osorensen commented 5 months ago

Thanks to Marta for spotting this:

library(BayesMallows)
# Right-invariance
n_items <- 6
set.seed(5)
rho1 <- sample(n_items)
rho2 <- sample(n_items)
eta <- sample(n_items)

rho1;rho2
#> [1] 2 3 1 5 4 6
#> [1] 1 5 2 3 6 4
rho1[eta];rho2[eta]
#> [1] 1 3 6 2 4 5
#> [1] 2 5 4 1 6 3

for(metric in c("footrule", "spearman", "kendall", "cayley", "hamming", "ulam")) {
  r1 <- compute_rank_distance(rho1,rho2,metric=metric);
  r2 <- compute_rank_distance(rho1[eta],rho2[eta],metric=metric)
  print(paste(metric, r1, r2))
}
#> [1] "footrule 10 10"
#> [1] "spearman 18 18"
#> [1] "kendall 5 5"
#> [1] "cayley 3 3"
#> [1] "hamming 6 6"
#> [1] "ulam 3 4"

Created on 2024-03-24 with reprex v2.1.0

osorensen commented 5 months ago

It boils down to finding n_items minus the longest common subsequence, which should be straightforward to implement.

https://users.soe.ucsc.edu/~sesh/pubs/ulam-soda17-fixed.pdf

osorensen commented 5 months ago

The correct answer in this case seems to be 3. Here is Diaconis' definition:

image

Here is Ulam distance computed by manual inspection:

library(BayesMallows)
# Right-invariance
n_items <- 6
set.seed(5)
rho1 <- sample(n_items)
rho2 <- sample(n_items)

# Ulam distance "by hand":
rho1[create_ordering(rho2)] # Longest increasing subsequence is 156. Ulam distance is 6-3=3.
#> [1] 2 1 5 6 3 4
rho2[create_ordering(rho1)]
#> [1] 2 1 5 6 3 4

eta <- sample(n_items)

rho1[eta][create_ordering(rho2[eta])] # Longest increasing subsequence is 156, Ulam distance is 6-3=3.
#> [1] 2 1 5 6 3 4
rho2[eta][create_ordering(rho1[eta])] 
#> [1] 2 1 5 6 3 4

Created on 2024-03-25 with reprex v2.1.0

osorensen commented 5 months ago

Marta's original test now succeeds after this fix. The correct Ulam distance is also 3, as can be checked by hand.

library(BayesMallows)
# Right-invariance
n_items <- 6
set.seed(5)
rho1 <- sample(n_items)
rho2 <- sample(n_items)
eta <- sample(n_items)

rho1;rho2
#> [1] 2 3 1 5 4 6
#> [1] 1 5 2 3 6 4
rho1[eta];rho2[eta]
#> [1] 1 3 6 2 4 5
#> [1] 2 5 4 1 6 3

for(metric in c("footrule", "spearman", "kendall", "cayley", "hamming", "ulam")) {
  r1 <- compute_rank_distance(rho1,rho2,metric=metric);
  r2 <- compute_rank_distance(rho1[eta],rho2[eta],metric=metric)
  print(paste(metric, r1, r2))
}
#> [1] "footrule 10 10"
#> [1] "spearman 18 18"
#> [1] "kendall 5 5"
#> [1] "cayley 3 3"
#> [1] "hamming 6 6"
#> [1] "ulam 3 3"

Created on 2024-03-25 with reprex v2.1.0