okgreece / Cluster.OBeu

Cluster Analysis
https://okgreece.github.io/Cluster.OBeu/
GNU General Public License v2.0

Cluster: specification of parameters to plot cluster models #2

Open jaroslav-kuchar opened 8 years ago

jaroslav-kuchar commented 8 years ago

The cluster function should return a list of parameters that are important for plotting specific models. Could you please give an example of missing/required parameters? An example covering the whole flow from data to plot would also help to see the full picture. Requirements from the visualization side could help us design the list of needed attributes properly. Is the idea similar to the "Plotting cluster package" section of the ggfortify PCA vignette (https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html)?

kleanthisk10 commented 8 years ago

When plotting clusters, e.g. from the k-means algorithm, all the points of one cluster can be enclosed by, e.g., an ellipse (see the "Plotting cluster package" section of the link you sent me).

Another example: when multidimensional data are plotted after clustering, we first have to apply PCA to project them onto two dimensions. To find out which parameters are missing (if any), you have to try to plot each model with the parameters we have already extracted. Your source looks good. See also:
1) factoextra library: http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization
2) cluster library: https://rstudio-pubs-static.s3.amazonaws.com/33876_1d7794d9a86647ca90c4f182df93f0e8.html

or other libraries, e.g. pvclust, fpc, clusterSim, mclust, som, etc.

PS: Some other parameters for comparisons are also extracted.

jaroslav-kuchar commented 7 years ago

Sorry for my inactivity. Let's start with an example: k-means on the standard iris dataset. I have prepared an initial implementation that can be integrated into the cluster.obeu function. The "ellipses" and "convex hulls" sections prepare data points that can easily be used to plot these visualizations.

library(car)          # dataEllipse()
library(jsonlite)     # fromJSON()
library(Cluster.OBeu) # cluster.obeu()
data("iris")

# preprocessing: keep the four numeric columns and standardise them
inputs.data <- scale(iris[, 1:4])

# clustering (k-means, 3 clusters) and PCA for a 2-D projection
inputs.clustering <- jsonlite::fromJSON(cluster.obeu(inputs.data, "kmeans", 3))
inputs.pca <- prcomp(inputs.data, scale. = TRUE)
cluster.ids <- unique(inputs.clustering$clusters)

# ellipses: 99% data ellipse of each cluster on the first two components
inputs.ellipses <- lapply(cluster.ids, function(cl) {
  idx <- which(inputs.clustering$clusters == cl)
  car::dataEllipse(inputs.pca$x[idx, 1], inputs.pca$x[idx, 2],
                   draw = FALSE, levels = 0.99, segments = 100)
})

# convex hulls: closed polygon around each cluster's outermost points
inputs.convexHulls <- lapply(cluster.ids, function(cl) {
  dat <- inputs.pca$x[which(inputs.clustering$clusters == cl), 1:2]
  pts <- chull(dat)
  dat[c(pts, pts[1]), ]  # repeat the first vertex to close the polygon
})

# plot: empty frame, then ellipse, hull and points in one colour per cluster
plot(inputs.pca$x[, 1:2], type = "n")
for (i in seq_along(cluster.ids)) {
  col <- palette()[i]
  lines(inputs.ellipses[[i]], col = col)
  lines(inputs.convexHulls[[i]], col = col)
  points(inputs.pca$x[which(inputs.clustering$clusters == cluster.ids[i]), 1:2], col = col)
}

[Image: resulting plot]

kleanthisk10 commented 7 years ago

Hello Jaroslav, can you adapt these pieces of code and include them directly in the R script (https://github.com/okgreece/Cluster.OBeu/blob/master/R/cl.analysis.r)?

jaroslav-kuchar commented 7 years ago

The plotting functions and parameters for ellipses and convex hulls are now integrated into the k-means section of the "cl.analysis" function. Feel free to rename the parameters to match your conventions/preferences.
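The thread does not show what the returned list of plotting parameters looks like. As a minimal sketch only, the helper below gathers the cluster labels, the 2-D PCA coordinates and the closed convex hulls into one named list. The function name `cluster.plot.params` and the list element names are hypothetical, not part of Cluster.OBeu, and base R `kmeans()` stands in for `cluster.obeu()` so the example runs on its own.

```r
# Hypothetical helper (not part of Cluster.OBeu): bundles plotting
# parameters for one clustering model into a single named list.
cluster.plot.params <- function(data, clusters) {
  pca <- prcomp(data)
  coords <- pca$x[, 1:2, drop = FALSE]   # 2-D projection used for plotting
  ids <- sort(unique(clusters))
  hulls <- lapply(ids, function(cl) {
    dat <- coords[clusters == cl, , drop = FALSE]
    h <- chull(dat)                      # indices of the hull vertices
    dat[c(h, h[1]), , drop = FALSE]      # repeat first vertex: closed polygon
  })
  names(hulls) <- ids
  list(clusters = clusters, pca.coords = coords, convex.hulls = hulls)
}

# Usage with base R k-means on the scaled iris measurements
set.seed(42)
iris.scaled <- scale(iris[, 1:4])
params <- cluster.plot.params(iris.scaled, kmeans(iris.scaled, centers = 3)$cluster)
str(params$convex.hulls)
```

Since `cluster.obeu()` already returns JSON (it is parsed with `jsonlite::fromJSON()` in the example above), a list like this could be serialised with `jsonlite::toJSON()` for the visualization side.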

kleanthisk10 commented 7 years ago

Hello @jaroslav-kuchar, thanks for your great work. I've added your code chunks where necessary. The check returns warnings concerning your function "plot.clustering.model" (posted as a screenshot).

Can you please make the appropriate changes?

jaroslav-kuchar commented 7 years ago

I have updated the clustering plot functions and their imports. It should be fixed now.

kleanthisk10 commented 7 years ago

Thank you. Could you also update your description of the "cl.plot" function? It is needed for the manual.

jaroslav-kuchar commented 7 years ago

The description should be complete now.

kleanthisk10 commented 7 years ago

Thank you

kleanthisk10 commented 7 years ago

Hello @jaroslav-kuchar, I had a discussion with @larjohn about what is returned for the ellipse. There are different ways of drawing an ellipse (https://en.wikipedia.org/wiki/Ellipse). Currently you return 100 points to draw an ellipse. Could you instead return the 4 points marked in green?

[Image: ellipse-def0.svg with the requested points marked in green]

Thank you

jaroslav-kuchar commented 7 years ago

Hello @kleanthisk10, the current implementation is built on top of the R package car and its dataEllipse function, which is designed to return points for drawing. You can try setting the segments parameter to 4 to see whether it returns exactly the points you would like to have.
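A possible alternative, sketched under assumptions: if the green points are the two vertices and two co-vertices, they can be computed directly from an eigendecomposition of the cluster's covariance matrix. As far as we can tell from car's source, `dataEllipse()` maps an evenly spaced unit circle through a Cholesky factor of the covariance, so `segments = 4` would likely give five points (the last repeating the first) on conjugate diameters, which in general are not the principal axes unless the covariance matrix is diagonal. The helper `ellipse.axis.points` is hypothetical; it assumes `dataEllipse()`'s defaults (sample mean, sample covariance, radius `sqrt(2 * qf(level, 2, n - 1))`), and base R `kmeans()` is used instead of `cluster.obeu()` so the example runs on its own.

```r
# Hypothetical helper: the four axis endpoints (two vertices, two
# co-vertices) of one cluster's data ellipse. Assumes car::dataEllipse()
# defaults: centre = sample mean, shape = sample covariance,
# radius = sqrt(2 * qf(level, 2, n - 1)).
ellipse.axis.points <- function(x, y, level = 0.99) {
  xy <- cbind(x, y)
  center <- colMeans(xy)
  radius <- sqrt(2 * qf(level, 2, nrow(xy) - 1))
  # eigen() sorts eigenvalues in decreasing order, so eigenvector 1 is the
  # major-axis direction and eigenvector 2 the minor-axis direction
  e <- eigen(cov(xy), symmetric = TRUE)
  semi <- radius * sqrt(e$values)        # semi-axis lengths
  major <- semi[1] * e$vectors[, 1]
  minor <- semi[2] * e$vectors[, 2]
  pts <- rbind(center + major, center - major,
               center + minor, center - minor)
  dimnames(pts) <- list(c("vertex1", "vertex2", "covertex1", "covertex2"),
                        c("x", "y"))
  pts
}

# Usage: one 4 x 2 matrix per k-means cluster on the first two principal
# components of the scaled iris data
set.seed(42)
iris.scaled <- scale(iris[, 1:4])
iris.pca <- prcomp(iris.scaled)
km <- kmeans(iris.scaled, centers = 3)
axis.points <- lapply(sort(unique(km$cluster)), function(cl) {
  idx <- km$cluster == cl
  ellipse.axis.points(iris.pca$x[idx, 1], iris.pca$x[idx, 2])
})
axis.points[[1]]
```

Each returned point p satisfies (p - c)' S^-1 (p - c) = radius^2, i.e. it lies on the same ellipse that `dataEllipse()` draws, so a client could reconstruct the full ellipse from the centre and these four points.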