Closed osorensen closed 4 months ago
Update, the original implementation was indeed correct. We must just be careful to keep in mind that the distance is defined for orderings, not rankings.
Here for the current CRAN version:
library(BayesMallows)
packageVersion("BayesMallows")
#> [1] '2.2.1'
sigma <- c(5, 1, 4, 3, 2)
pi <- c(3, 1, 2, 4, 5)
ranking1 <- create_ranking(sigma)
ranking2 <- create_ranking(pi)
compute_rank_distance(ranking1, ranking2, "ulam")
#> [1] 3
Created on 2024-07-02 with reprex v2.1.0
library(BayesMallows)
packageVersion("BayesMallows")
#> [1] '2.2.1'
sigma <- c(2, 1, 3, 6, 4, 5, 7)
pi <- c(1, 2, 3, 4, 5, 6, 7)
ranking1 <- create_ranking(sigma)
ranking2 <- create_ranking(pi)
compute_rank_distance(ranking1, ranking2, "ulam")
#> [1] 2
Created on 2024-07-02 with reprex v2.1.0
Thanks to Xavier Benavides for pointing out. Here is some reasoning related to the correction, which is in #418.
Definition of Ulam distance
For two rankings
r1
andr2
with lengthn
, the Ulam distance distance equalsn
minus the longest common subsequence of the orderings corresponding to the rankings.Details are in the following two screenshots from Irurozki, Calvo, and Lozano (2016).
First, note that sigma is an ordering.
Next follows the definition of Ulam distance in terms of sigma. Note that there is an error in the example, since the longest increasing subsequence in
sigma
is13457
, of size 5, so the Ulam distance i 7-5=2.Testing the implementation
The following two orderings were used when discovering the bug:
sigma = [51432]
andpi = [31245]
. Their longest common subsequence is 2, which is either32
or12
or14
. Hence, the Ulam distance should be 5 - 2 = 3.In the previous implementation we would get 2, but now we correctly get 3.
Created on 2024-07-02 with reprex v2.1.0
Corrected version of the example from Irurozki et al. (2016):
Created on 2024-07-02 with reprex v2.1.0