Cluster: specification of parameters to plot cluster models

jaroslav-kuchar commented 8 years ago

The cluster function should return a list of the parameters needed to plot each specific model. Could you please give an example of the missing/required parameters? An example covering the whole flow from data to plot would also help me understand it fully. Requirements from the visualization point of view might also help to design the list of needed attributes properly. Is the idea similar to the "Plotting cluster package" section of https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html ?

kleanthisk10 commented 8 years ago

When plotting clusters, e.g. from the k-means algorithm, all the points of one cluster are enclosed by a shape such as an ellipsoid (see the "Plotting cluster package" section of the link you sent me).
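To make the ellipse idea concrete, here is a minimal base-R sketch of how the outline of one cluster's ellipse could be computed. The helper name ellipse_points, the choice of two petal columns, the 0.95 level and the chi-square radius are all assumptions made for illustration; this is not what cluster.obeu or ggfortify do internally.

```r
# Sketch only: covariance-based ellipse around one k-means cluster (2D data).
set.seed(42)
xy <- scale(iris[, c("Petal.Length", "Petal.Width")])
km <- kmeans(xy, centers = 3, nstart = 10)

# Hypothetical helper: returns (segments + 1) points on the ellipse outline.
ellipse_points <- function(pts, level = 0.95, segments = 100) {
  centre <- colMeans(pts)
  shape  <- chol(cov(pts))              # upper-triangular factor of the covariance
  radius <- sqrt(qchisq(level, df = 2)) # assumed chi-square based radius
  angles <- seq(0, 2 * pi, length.out = segments + 1)
  circle <- cbind(cos(angles), sin(angles))
  # Stretch the unit circle by the covariance factor, then shift to the centre.
  sweep(radius * circle %*% shape, 2, centre, "+")
}

cluster1 <- xy[km$cluster == 1, , drop = FALSE]
ell <- ellipse_points(cluster1)

plot(xy, col = km$cluster)
lines(ell, col = 1)
```

Every point of ell lies at the same Mahalanobis distance from the cluster centre, so the two columns of ell are the kind of parameter a plotting front end would need for this shape.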

Another example: when we have multidimensional data to plot after clustering, we have to apply PCA first. To find out which parameters are missing (if any), you have to try to plot each model with the parameters we have already extracted. Your source seems good. You could also see: 1.) (factoextra library) http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization 2.) (cluster library) https://rstudio-pubs-static.s3.amazonaws.com/33876_1d7794d9a86647ca90c4f182df93f0e8.html

or other libraries, e.g. pvclust, fpc, clustersim, mclust, som etc.
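The PCA step mentioned above can be sketched with base R alone. This is only an illustration on the iris data with stats::kmeans and stats::prcomp; the variable names are assumptions, and the actual clustering function may expose its results differently.

```r
# Sketch only: cluster 4-dimensional data, then project it to 2D with PCA for plotting.
set.seed(123)
x  <- scale(iris[, 1:4])                   # standardise the four measurements
km <- kmeans(x, centers = 3, nstart = 10)  # cluster in the full 4D space

pca    <- prcomp(x)                        # data are already scaled above
coords <- pca$x[, 1:2]                     # first two principal components

# Share of variance carried by each component, useful for axis labels.
explained <- pca$sdev^2 / sum(pca$sdev^2)

plot(coords, col = km$cluster, pch = 19,
     xlab = sprintf("PC1 (%.1f%%)", 100 * explained[1]),
     ylab = sprintf("PC2 (%.1f%%)", 100 * explained[2]))
```

The projected coordinates and the explained-variance shares are examples of values that a plot needs but that the clustering result alone does not contain.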

PS: Some other parameters for comparisons are also extracted.

jaroslav-kuchar commented 8 years ago

Sorry for my inactivity. Let's start with an example: k-means on the standard iris dataset. I have prepared an initial version of an implementation that can be integrated into the cluster.obeu function. The "Ellipses" and "Convex hulls" sections prepare data points that can easily be used to draw such visualizations.

library(car)       # dataEllipse
library(jsonlite)  # fromJSON
data("iris")

# preprocessing
inputs.data <- iris[,1:4]
inputs.data <- scale(inputs.data)

# clustering and pca
inputs.clustering <- jsonlite::fromJSON(cluster.obeu(inputs.data,"kmeans",3))
inputs.pca <- prcomp(inputs.data, scale. = TRUE)

# ellipses
inputs.ellipses <- lapply(
  unique(inputs.clustering$clusters), 
  function(cl) car::dataEllipse(
    inputs.pca$x[which(inputs.clustering$clusters==cl),1],
    inputs.pca$x[which(inputs.clustering$clusters==cl),2], 
    draw=FALSE, 
    levels=c(0.99), 
    segments=100)
)

# convex hulls
inputs.convexHulls <- lapply(
  unique(inputs.clustering$clusters),
  function(clId){
    dat <- inputs.pca$x[which(inputs.clustering$clusters==clId),1:2]
    pts <- chull(dat)
    return(dat[c(pts, pts[1]), 1:2])
  }
)

# plot: all points in PCA space, then one ellipse, one hull and the coloured points per cluster
clusterIds <- unique(inputs.clustering$clusters)
plot(inputs.pca$x[,1:2])
for (i in seq_along(clusterIds)) {
  lines(inputs.ellipses[[i]], col=palette()[i])
  lines(inputs.convexHulls[[i]], col=palette()[i])
  points(inputs.pca$x[which(inputs.clustering$clusters==clusterIds[i]),1:2,drop=FALSE], col=palette()[i])
}
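As a possible next step, the prepared data points could be packaged into a single list and serialized, so that a front end can draw the plot without recomputing anything. The sketch below uses stats::kmeans instead of cluster.obeu to stay self-contained, and the field names (points, convex.hulls, cluster, x, y) are purely hypothetical, not an agreed output format.

```r
# Sketch only: package plot-ready data into one list and serialize it to JSON.
library(jsonlite)

set.seed(1)
x  <- scale(iris[, 1:4])
km <- kmeans(x, centers = 3, nstart = 10)
pc <- prcomp(x)$x[, 1:2]

# One closed convex hull per cluster (the first vertex is repeated at the end).
hulls <- lapply(sort(unique(km$cluster)), function(cl) {
  pts    <- pc[km$cluster == cl, , drop = FALSE]
  idx    <- chull(pts)
  closed <- pts[c(idx, idx[1]), , drop = FALSE]
  list(cluster = cl, x = unname(closed[, 1]), y = unname(closed[, 2]))
})

# Hypothetical structure of the returned plot parameters.
plot_params <- list(
  points       = data.frame(PC1 = pc[, 1], PC2 = pc[, 2], cluster = km$cluster),
  convex.hulls = hulls
)

json     <- toJSON(plot_params, auto_unbox = TRUE)
restored <- fromJSON(json)
```

Note that toJSON rounds numbers by default, so the restored coordinates are suitable for plotting but are not bit-identical to the originals.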
