rwehrens / kohonen

Supervised and unsupervised self-organising maps

Codes Plots with codeRendering="Lines" vs Codes Plot #2

Closed ras44 closed 7 years ago

ras44 commented 7 years ago

Hello! Thanks for the great package.

I'm attaching an example below that illustrates an issue I'm having interpreting the codes plot for the vector components. If I create a graph using codeRendering="lines", it is easy to read the vector components from the lines in the grid cells. For example, the first codebook vector output by the head(NBA.SOM1$codes) command in the example is: V1 -0.69543702 -1.45426574 2.40033941

In the codeRendering="lines" plot, the first grid cell shows a line that is negative (-0.69543702), then more negative (-1.45426574), and then positive (2.40033941). That seems correct.

However, if we use the default codes plot, I'm unsure how to interpret the vector components from the grid. For the first vector, we see a very large third component, but the two other components are very small. This seems to be incorrect. Perhaps I'm misunderstanding how to interpret the codes plot vector components.

library(kohonen)  # som(), somgrid() and the plot method
library(RCurl)    # getURL()

NBA <- read.csv(text = getURL("https://raw.githubusercontent.com/clarkdatalabs/soms/master/NBA_2016_player_stats_cleaned.csv"), sep = ",", header = TRUE, check.names = FALSE)

NBA.measures1 <- c("FTA", "2PA", "3PA")

# fit the SOM on the scaled measures
NBA.SOM1 <- som(scale(NBA[NBA.measures1]), grid = somgrid(6, 4, "rectangular"))

# output the first codebook vectors
head(NBA.SOM1$codes)
# create a codeRendering="lines" plot; the vector components can be read from the lines
plot(NBA.SOM1, main = "Vector Components", type = "codes", codeRendering = "lines")
# create a regular "codes" plot; it is not clear what the colored pie chart components represent
plot(NBA.SOM1, main = "Vector Components", type = "codes")

I'm not sure if this is an issue or a misinterpretation, but I wanted to bring it to your attention in case it helps with the package's development.

rwehrens commented 7 years ago

I suppose your problem is in interpreting the stars plot. Basically, the three variables are shown counterclockwise, starting from the first quadrant (so green, beige, light gray), as is indicated in the legend. The surface of the color segment is a measure of the value after range scaling, so the smallest value in the codes column is set to zero (so no surface at all) and the largest to one (maximal radius). For more info and examples, type ?stars.
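
The range scaling described above can be sketched in base R. This is only an illustration, not the package's own code: row V1 is the vector quoted in the original post, while rows V2 and V3 and the helper range_scale are made up for the example, with values chosen so that V1 holds the smallest FTA and 2PA and the largest 3PA.

```r
# Hypothetical codebook matrix: row V1 comes from the post above;
# rows V2 and V3 are invented values for illustration only.
codes <- rbind(
  V1 = c(-0.69543702, -1.45426574,  2.40033941),
  V2 = c( 0.50,        0.20,       -0.80),
  V3 = c( 1.90,        1.10,        0.30)
)
colnames(codes) <- c("FTA", "2PA", "3PA")

# Range-scale one variable: its smallest value maps to 0, its largest to 1.
# (range_scale is a hypothetical helper, not a kohonen function.)
range_scale <- function(x) (x - min(x)) / (max(x) - min(x))

# Apply the scaling to each column (variable) across all codebook vectors.
scaled <- apply(codes, 2, range_scale)
print(round(scaled, 3))
```

With these made-up values, V1 becomes 0, 0, 1 after scaling, so its first two segments would have no surface and the third would have the maximal radius, even though the raw values are -0.70, -1.45 and 2.40. That would be consistent with a cell showing one very large segment and two very small ones. Calling stars(codes, draw.segments = TRUE) on the same matrix should draw segments under the same per-column scaling, since stars scales columns to [0, 1] by default.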