What's an optimal path?

As you say on Twitter, there might be different ways of thinking about an optimal path between points.

One low-hanging fruit that would be useful for me would be to, optionally, have the point lie along the exterior of a hypersphere instead of proceeding through the middle--when working with word vectors, I usually transform each point to unit length. Another way to put this request is: can cosine distance be a metric as well as L2 distance?

It might also be useful to partition the path so that intermediate points are close to each other. I could imagine, for example, the following algorithm:

Generate n (say, 6) points between p1 and p2.
Find the intermediate points closest to each of those using an ANN algorithm, subject to the constraint that they are between p1 and p2. (If using linear algebra, not ANN, you could simply find the closest points to the line itself without generating intermediate points).
Whatever point of these 6 is closest to the line between p1 and p2 becomes the pivot, p0.
Rerun the algorithm to find the intermediate points between p1 and p0, and the intermediate points between p2 and p0.

I've implemented a version of this using cosine distance here.

A couple notes:

The general strategy is to find the best point closer to either point than they are to each other. Each of the bolded terms is ambiguous--I use a fairly dumb heuristic for best (sum of cosine similarity), and cosine distance for closeness. I think the non-triangle-equality features of cosine distance may allow it to come up with more
This generates an arbitrary-length tree, but it would be algorithmically prunable. Backtracking is possible in the later recursions.
Some form of network pathfinding might work well too, although I suspect it would tend to rely on a few rather uninteresting rote bridges.
Higher dimensional spaces are weird. If you use straight l2 distance on the word vectors, it's almost never the case that there's an intermediate word that satisfies the closeness criterion.
I think the method you're using, though, may be jumping around very radically back and forth in the space. Some kind of subpartitioning is good because it ensures that in a chain A-B-C-D that B and C have some connection to each other, not just to A and D. As dimensional increases, I think this problem may get worse.

cpath = function(matrix, w1, w2, blacklist = c()) {
  p1 = matrix[[w1]]
  p2 = matrix[[w2]]

  base_similarity = cosineSimilarity(p1, p2)
  pairsims = matrix %*% t(rbind(p1, p2))
  # Words closer to *either* candidate than the other word. These might be used eventually; mostly to save on computation in later steps.
  decentsims = pairsims[apply(pairsims, 1, max) >= base_similarity[1],]
  # Words closer to *both* candidates than the other word. The best of these is the intermediary.
  greatsims =  pairsims[apply(pairsims, 1, min) >= base_similarity[1],]
  if (length(nrow(greatsims))==0 || nrow(greatsims) == 0) {
    return(unique(c(w1, w2)))
  }
  pivot = order(-rowSums(greatsims))[1:4]
  pivotwords = rownames(greatsims)[pivot]
  pivotword = pivotwords[!pivotwords %in% c(w1,w2, blacklist)][1]
  # highlight pivots.
  message(w1, "<-->", toupper(pivotword), "<--->", w2)
  if(is.null(pivotword) || is.na(pivotword)) {
    message(w1,w2)
    return(unique(c(w1, w2)))
  }
  pivotpoint = matrix[[pivotword]]

  # Really, this should *different* for the right and left side.
  mat = matrix[rownames(matrix) %in% rownames(decentsims),]

  left = cpath(mat, w1, pivotword, blacklist = c(w1, w2, blacklist))
  right = cpath(mat, pivotword, w2, blacklist = c(w1, w2, blacklist))
  return(unique(c(left, right)))
}

A function that actually draws a path.

w1 = "Seinfeld"
w2 = "Breaking_Bad"
drawpath = function(w1, w2, save=F,nnoise = 1) {
  pathed = cpath(model, w1, w2)

  just_this = model %>% extract_vectors(pathed)

  r = just_this %>% prcomp %>% `$`("rotation") %>% `[`(,c(1,2))
  noisewords = sample(rownames(model), nnoise)

  with_noise = model %>% extract_vectors(c(pathed, noisewords))
  lotsa = with_noise %*% r %>% as.data.frame %>% rownames_to_column("word") %>% 
    mutate(labelled = word %in% pathed) %>%
    mutate(word = ordered(word, levels = c(pathed, noisewords) )) %>% arrange(word)

  g = lotsa %>%
    ggplot() + aes(x=PC1,y = PC2, label = word) + geom_point(data = lotsa %>% filter(!labelled), size = .5, alpha = .33) + geom_path(data = lotsa %>% filter(labelled), alpha = .33) + geom_text(data = lotsa %>% filter(labelled)) + labs(title=paste("From", w1, "to", w2))
  if (save)
   {ggsave(g, width = 10, height = 8, filename = paste0("~/Pictures/", paste("From", w1, "to", w2), ".png"))}
  g
}

mdlincoln / pathway

What's an optimal path? #7