peterwittek / somoclu

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters
https://peterwittek.github.io/somoclu/
MIT License

[R] mapping to kohonen package #97

Closed schochastics closed 6 years ago

schochastics commented 6 years ago

I have found strange behavior when invoking Rsomoclu.kohonen(): the input data appears to be mapped only to the first 48 SOM nodes. I dug into the code and traced the problem to the line mapping <- map(som(result$codebook), newdata = input_data).

kohonen::som calls kohonen::somgrid() and initializes an 8x6 grid (48 nodes), so the input data is mapped only to this grid even if the grid constructed in Rsomoclu.train() is larger. For my purposes I wrote a very crude workaround that appears to work (only for the distance function "sumofsquares", and slow for large input data!):

to_kohonen <- function (input_data, result, n.hood = NULL, toroidal = FALSE) 
{
  # Squared Euclidean distance from every input row to every codebook row
  # (rows of dists: SOM nodes, columns of dists: input rows)
  dists <- apply(input_data, 1, function(x) rowSums((t(t(result$codebook) - x))^2))
  min_dists <- apply(dists, 2, min)
  classif <- apply(dists, 2, which.min)
  # mapping <- map(som(result$codebook), newdata = input_data)  # <- ERROR appears here (default 8x6 grid)
  # Rebuild the grid with the dimensions that were used for training
  nSomX <- nrow(result$uMatrix)
  nSomY <- ncol(result$uMatrix)
  grid <- somgrid(nSomX, nSomY)
  # is.null() also covers an explicit n.hood = NULL, unlike missing()
  if (is.null(n.hood)) {
    n.hood <- switch(grid$topo, hexagonal = "circular", 
                     rectangular = "square")
  }
  else {
    n.hood <- match.arg(n.hood, c("circular", "square"))
  }
  grid$n.hood <- n.hood
  sommap <- structure(list(data = list(input_data), grid = grid, 
                           codes = list(result$codebook), changes = NULL, unit.classif = classif, 
                           distances = min_dists, toroidal = toroidal, 
                           user.weights = 1, distance.weights = 1, whatmap = 1, 
                           maxNA.fraction = 0L, method = "som", dist.fcts = "sumofsquares"), 
                      class = "kohonen")
  sommap
}

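
The nearest-node step of the workaround above could also be written without the per-row apply(), which may help with the slowness on large inputs. This is only a sketch in base R: nearest_units is a hypothetical helper name, not part of Rsomoclu or kohonen, and it assumes numeric matrices with one observation (or one SOM node) per row and the squared Euclidean ("sumofsquares") distance.

```r
# Hypothetical helper (not part of Rsomoclu or kohonen): for every input row,
# find the closest codebook row by squared Euclidean distance.
nearest_units <- function(input_data, codebook) {
  input_data <- as.matrix(input_data)
  codebook <- as.matrix(codebook)
  # ||x - c||^2 = ||x||^2 + ||c||^2 - 2 * x.c, evaluated for all pairs at once
  cross <- input_data %*% t(codebook)  # n_obs x n_units
  sq <- outer(rowSums(input_data^2), rowSums(codebook^2), "+") - 2 * cross
  sq <- pmax(sq, 0)  # guard against tiny negative values from rounding
  unit <- apply(sq, 1, which.min)  # index of the closest node per input row
  list(unit.classif = unit,
       distances = sq[cbind(seq_len(nrow(sq)), unit)])
}

# Toy data: three codebook vectors and three observations
codebook <- rbind(c(0, 0), c(1, 1), c(5, 5))
input <- rbind(c(0.1, 0), c(4, 5), c(1.2, 0.9))
res <- nearest_units(input, codebook)
res$unit.classif  # 1, 3, 2 for this toy data
```

The pairwise distance matrix has one entry per (observation, node) pair, so for very large inputs it may still need to be computed in chunks.
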
peterwittek commented 6 years ago

@xgdgsc, any thoughts?

xgdgsc commented 6 years ago

Would changing the first few lines to:

  nSomX = nrow(result$uMatrix)
  nSomY = ncol(result$uMatrix)
  grid = somgrid(nSomX, nSomY)
  mapping <- map(som(result$codebook, grid=grid), newdata = input_data)

do?

schochastics commented 6 years ago

Will give it a try and report back. Thanks.

Edit: seems to work.
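
One way to check a fix like this is to compare the highest node index that receives data against the size of the trained grid. The sketch below uses base R only. check_unit_range and its arguments are hypothetical names for illustration, and the unit_classif vector is made-up toy data standing in for the unit.classif component of a mapping.

```r
# Hypothetical check (not part of Rsomoclu or kohonen): summarize which
# nodes of an n_som_x by n_som_y grid receive at least one observation.
check_unit_range <- function(unit_classif, n_som_x, n_som_y) {
  n_units <- n_som_x * n_som_y
  # Every index must refer to an existing node of the grid
  stopifnot(all(unit_classif >= 1), all(unit_classif <= n_units))
  hits <- tabulate(unit_classif, nbins = n_units)  # observations per node
  list(n_units = n_units,
       max_unit_hit = max(unit_classif),
       units_hit = sum(hits > 0))
}

# Toy example: a 10x10 grid where some observations land beyond node 48
unit_classif <- c(1, 48, 49, 100, 100)
summary_10x10 <- check_unit_range(unit_classif, 10, 10)
summary_10x10$max_unit_hit
```

If max_unit_hit never exceeds 48 on a grid that is larger than 8x6, the mapping is probably still using the default grid.
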