rwehrens / kohonen

Supervised and unsupervised self-organising maps

Codes plot #16

Open Jeff87075 opened 2 years ago

Jeff87075 commented 2 years ago

Hi, which values do the plot.kohonen() and plot.kohcodes() functions use to create the codes plot? I first thought it would be the codes component of the list returned by the main som function, but that matrix contains negative values, and the resulting codes plot does not seem to reflect them (all the arcs start from the center, with no room to extend in a negative direction). Is the codes matrix itself being plotted, or is some transformation or scaling applied before plotting? Thanks

ugroempi commented 2 years ago

Hi Jeff,

your assessment is correct: the codes matrix is plotted. plot.kohcodes calls the stars function, which scales each variable separately from its minimum (center) to its maximum (largest possible radius). You can try this out, e.g., with the following code:

set.seed(123)
dat <- rbind(one=sample((-3):3), two=sample((-3):3), three=sample((-3):3))
colnames(dat) <- LETTERS[1:7]
stars(dat, draw.segments = TRUE, key.loc = c(5,2))

You will see that e.g. feature A has the values "one=-1", "two=+3" and "three=-3", which yields a non-existing segment for three, a small one for one and a maximally large one for two. (The key.loc may need changing, depending on your device.)
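The per-variable scaling can also be reproduced by hand. The sketch below is a minimal illustration using only the three values of feature A quoted above; `rescale01` is a hypothetical helper name, and the formula assumes the min-max scaling that the documentation of `stars` describes for `scale = TRUE` (each column mapped to the range 0 to 1).

```r
# Values of feature A as quoted in the comment above
A <- c(one = -1, two = 3, three = -3)

# Hypothetical helper: min-max scaling of one variable to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

scaled_A <- rescale01(A)
print(scaled_A)
# one = 1/3 (small segment), two = 1 (full radius), three = 0 (no segment)
```

The negative values themselves are not lost; they only determine where each object falls between the variable's minimum and maximum.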

Best, Ulrike

Jeff87075 commented 2 years ago

I see. I tried playing around with the stars function using this sample data frame, but I'm still a bit confused about how the scaling works:

data = data.frame(A = c(1,2,3),
                  B = c(-2,0,2),
                  C = c(5,0,15))
rownames(data) = c("one","two","three")
stars(data, draw.segments = TRUE, key.loc = c(5,2))

So if different groups (the nodes of the SOM grid, in terms of the kohonen SOM) have different minimums, and the data is scaled globally (so the -2 in parameter B of sample "one" is the global minimum and yields the non-existent segment), then all the arcs in sample "three" are equally large and don't help differentiate between the values of the three parameters (3 vs 2 vs 15)? Or does the plot.kohcodes() function scale each group independently and paste all the plots together into one graph?

One more thing: what does the codes matrix calculated by the som() function represent? Is it the mean or median of the parameters in the corresponding nodes of the SOM grid? Thanks.

ugroempi commented 2 years ago

Yes, indeed, the segments are not suitable for comparison across the different features. In many cases, the features are measured on different scales anyway. If they are measured on a common scale that you want to preserve, I suspect that you would have to write your own plotting function; you could e.g. replace plot.kohcodes with a function of your liking (using assignInNamespace). Or perhaps you could include some dummy objects that take the scale extremes; but if you include those already in SOM creation, they might mess with the topology of the SOM in unexpected ways ....
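As a rough illustration for plain `stars` (not for plot.kohcodes itself), the sketch below contrasts the default column-wise scaling with one common scale for all features, using the sample data frame from the question. It assumes that `stars(..., scale = FALSE)` expects values already in the range 0 to 1, as its documentation states; the names `per_column` and `global` are made up for this example.

```r
data <- data.frame(A = c(1, 2, 3),
                   B = c(-2, 0, 2),
                   C = c(5, 0, 15))
rownames(data) <- c("one", "two", "three")
m <- as.matrix(data)

# Default behaviour: every column is scaled to [0, 1] on its own,
# so row "three" holds each column's maximum and becomes 1, 1, 1
per_column <- apply(m, 2, function(x) (x - min(x)) / (max(x) - min(x)))

# Alternative: one common scale over the whole matrix (min -2, max 15),
# so row "three" becomes 5/17, 4/17, 1 and the features stay comparable
global <- (m - min(m)) / (max(m) - min(m))

# Plot the pre-scaled values without any further per-column scaling
stars(global, scale = FALSE, draw.segments = TRUE, key.loc = c(5, 2))
```

With the common scale, segment sizes can be compared across features, at the price that features with a narrow range show little visible variation.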

Regarding the other question, you can find detailed information in the JSS paper on the package (https://www.jstatsoft.org/article/view/v021i05).

Best, Ulrike

Jeff87075 commented 2 years ago

I read the JSS paper, but I'm not too familiar with the mathematical/statistical side of things. Can I just confirm that the codes matrix holds the codebook vector of each node, i.e. the "position" of each node expressed in terms of all the parameters, in the higher-dimensional space of the original inputs?
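One way to check this reading is to inspect the codes of a small fitted map. The sketch below assumes kohonen version 3.0 or later (where `getCodes` is available; in older versions `fit$codes` is the matrix itself) and uses the `wines` example data shipped with the package. The grid size and seed are arbitrary choices for illustration.

```r
library(kohonen)

data(wines)                 # example data shipped with kohonen
X <- scale(wines)           # objects in rows, variables in columns

set.seed(1)                 # arbitrary seed, for reproducibility only
fit <- som(X, grid = somgrid(4, 4, "hexagonal"))

# One codebook vector per map unit, expressed in the input variables
codes <- getCodes(fit)
dim(codes)                  # 16 rows (units), one column per variable

# For comparison: per-unit means of the objects mapped to each unit.
# The codes are learned during training, so they need not equal these means.
unit_means <- apply(X, 2, function(v) tapply(v, fit$unit.classif, mean))
used <- as.integer(rownames(unit_means))
round(codes[used, 1:3] - unit_means[, 1:3], 2)
```

Each row of `codes` lives in the same space as a row of `X`, which matches the description of a codebook vector as the node's position in the original input space.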