Open jaroslav-kuchar opened 8 years ago
When plotting clusters, e.g. from the k-means algorithm, all the points of one cluster lie inside a shape such as an ellipsoid (see the "Plotting cluster package" section of the link you sent me).
Another example: when we have multidimensional data to plot after clustering, we have to apply PCA first. To understand which parameters are still missing (if any), you have to try to plot each model with the parameters we have already extracted. Your source seems good. You could also see: 1.) (factoextra library) http://www.sthda.com/english/wiki/factoextra-r-package-easy-multivariate-data-analyses-and-elegant-visualization 2.) (cluster library) https://rstudio-pubs-static.s3.amazonaws.com/33876_1d7794d9a86647ca90c4f182df93f0e8.html
or other libraries, e.g. pvclust, fpc, clustersim, mclust, som, etc.
PS: Some other parameters for comparisons are also extracted.
Sorry for my inactivity. Let's start with an example: k-means on the standard iris dataset. I have prepared an initial implementation that can be integrated into the cluster.obeu function. The "ellipses" and "convex hulls" sections prepare data points that can be used directly to plot such visualizations.
library(car)       # dataEllipse()
library(jsonlite)  # fromJSON()
# cluster.obeu() is assumed to be available from the Cluster.OBeu package
data("iris")

# preprocessing
inputs.data <- iris[, 1:4]
inputs.data <- scale(inputs.data)

# clustering and pca
inputs.clustering <- jsonlite::fromJSON(cluster.obeu(inputs.data, "kmeans", 3))
inputs.pca <- prcomp(inputs.data, scale. = TRUE)
cluster.ids <- unique(inputs.clustering$clusters)

# ellipses: 99% data ellipse per cluster in the PC1/PC2 plane
inputs.ellipses <- lapply(
  cluster.ids,
  function(cl) car::dataEllipse(
    inputs.pca$x[inputs.clustering$clusters == cl, 1],
    inputs.pca$x[inputs.clustering$clusters == cl, 2],
    draw = FALSE,
    levels = c(0.99),
    segments = 100)
)

# convex hulls: closed polygon per cluster in the PC1/PC2 plane
inputs.convexHulls <- lapply(
  cluster.ids,
  function(clId) {
    dat <- inputs.pca$x[inputs.clustering$clusters == clId, 1:2]
    pts <- chull(dat)
    # repeat the first hull point so that lines() closes the polygon
    dat[c(pts, pts[1]), 1:2]
  }
)

# plot: one colour per cluster for ellipse, hull and points
plot(inputs.pca$x[, 1:2])
for (i in seq_along(cluster.ids)) {
  lines(inputs.ellipses[[i]], col = palette()[i])
  lines(inputs.convexHulls[[i]], col = palette()[i])
  points(inputs.pca$x[inputs.clustering$clusters == cluster.ids[i], 1:2], col = palette()[i])
}
Hello jaroslav, can you adjust these pieces of code and include them directly in the R script (https://github.com/okgreece/Cluster.OBeu/blob/master/R/cl.analysis.r)?
The plotting functions and parameters for ellipses and convex hulls are integrated into the kmeans section of the "cl.analysis" function. Names of parameters should be adjusted to your conventions/preferences.
Hello @jaroslav-kuchar, thanks for your great work. I've added your code chunks where necessary. The check returns the following warnings (concerning your function "plot.clustering.model"):
Can you please make the appropriate changes?
I have updated the clustering plot functions and their imports. It should be fixed now.
Thank you. Could you also update your description of the "cl.plot" function? It is needed for the manual.
The description should be completed by now.
Thank you
Hello @jaroslav-kuchar, I had a discussion with @larjohn about what is returned for the ellipse. There are different ways of drawing an ellipse (https://en.wikipedia.org/wiki/Ellipse). Currently you return 100 points to draw it. Can you return these 4 points marked in green?
Thank you
Hello @kleanthisk10, the current implementation is built on top of the R package car and its dataEllipse function. This function is designed to return points for drawing. You can try setting the segments parameter to 4 and check whether it returns exactly the points you would like to have.
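If the four requested points are the endpoints of the major and minor axes, one option is to compute them directly from the cluster's covariance matrix instead of sampling the outline. The following is only a sketch in base R: `ellipse.vertices` is a hypothetical helper (it is not part of car or Cluster.OBeu), and the radius formula is an assumption about how a normal-theory data ellipse at a given level is usually scaled, so it should be checked against the output of dataEllipse before relying on it.

```r
# Hypothetical helper: endpoints of the two axes of a covariance ellipse.
# Assumes the ellipse is (p - center)' S^-1 (p - center) = radius^2,
# with S the sample covariance and radius^2 = 2 * qf(level, 2, n - 1).
ellipse.vertices <- function(x, y, level = 0.99) {
  xy <- cbind(x, y)
  center <- colMeans(xy)
  shape <- stats::var(xy)
  radius <- sqrt(2 * stats::qf(level, 2, nrow(xy) - 1))

  # Eigenvectors give the axis directions, eigenvalues the squared scale
  eig <- eigen(shape, symmetric = TRUE)
  half <- radius * sqrt(eig$values)      # semi-axis lengths (major, minor)
  major <- eig$vectors[, 1] * half[1]
  minor <- eig$vectors[, 2] * half[2]

  # Rows: both ends of the major axis, then both ends of the minor axis
  rbind(center + major, center - major, center + minor, center - minor)
}

# Usage on the first two principal components of the scaled iris data
pca <- stats::prcomp(scale(iris[, 1:4]))
vertices <- ellipse.vertices(pca$x[, 1], pca$x[, 2])
print(vertices)
```

Together with the center, these four points are enough for a client to reconstruct the full ellipse (semi-axis lengths and rotation), which keeps the returned payload much smaller than 100 outline points.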
The cluster function should return a list of parameters that are important for plotting specific models. Could you please give an example of missing/required parameters? An example covering the whole flow from data to plot would also help to understand it all. Requirements from the visualization point of view may help to properly design the list of needed attributes. Is the idea similar to "Plotting cluster package" from: https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html ?
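To make the question concrete, here is a minimal sketch of what such a list of plot parameters could look like for k-means on iris. It uses only base R; the field names (`clusters`, `centers`, `points.pca`, `convex.hulls`) are illustrative assumptions, not the actual return format of cluster.obeu.

```r
data("iris")
x <- scale(iris[, 1:4])

set.seed(1)                              # k-means starts from random centers
km <- stats::kmeans(x, centers = 3)
pca <- stats::prcomp(x)
scores <- pca$x[, 1:2]                   # 2-D coordinates used for plotting

# Hypothetical return structure; field names are illustrative only
model <- list(
  clusters = unname(km$cluster),         # cluster id per observation
  centers = km$centers,                  # cluster centers in the scaled space
  points.pca = unname(scores),           # projected points
  convex.hulls = lapply(sort(unique(km$cluster)), function(cl) {
    dat <- scores[km$cluster == cl, , drop = FALSE]
    hull <- grDevices::chull(dat)
    # repeat the first hull point so the polygon is closed
    unname(dat[c(hull, hull[1]), , drop = FALSE])
  })
)

str(model, max.level = 1)
```

A list of this shape can be serialized with jsonlite::toJSON, so a visualization client would receive the projected points, the cluster assignment, and one closed polygon per cluster without needing R itself.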