peterwittek / somoclu

Massively parallel self-organizing maps: accelerate training on multicore CPUs, GPUs, and clusters
https://peterwittek.github.io/somoclu/
MIT License

Update SOM after training in R (question) #55

Open apastore opened 7 years ago

apastore commented 7 years ago

Hi, I have a matrix with 14M rows and 50 variables. I'm training the SOM on a sample of the initial dataset, and I would like to map and plot the test data (the rows not used for training) in R. I have not figured out a way to do it, but I have seen that this is possible in your Python interface.

Also, I would like to exclude some variables from the training but still project and plot them on the trained map. Is this possible?

I would really appreciate your help!

Thanks!

xgdgsc commented 7 years ago

Similar to https://github.com/peterwittek/somoclu/blob/master/src/R/man/Rsomoclu.kohonen.Rd ?

apastore commented 7 years ago

Thanks! This is what I am doing for the training set. Where I am really struggling is getting the SOM plot for the test data. I am using map.kohonen, but it does not return the codes values, so it is not possible to plot.

peterwittek commented 7 years ago

The R version of Somoclu is really just a computational back-end to kohonen, so visualization is left to that library. This is a fundamental difference to the Python version, where we develop our own visualization routines, so we have plenty of space to add convenience functions.

You are right, kohonen does not fill the updated map with much meaningful information:

new_data <- matrix(0, nrow=100, ncol=50)
new_map <- map(sommap, new_data)

It is easy to see that the result is correct, e.g., all new points get the same BMU, but you will not be able to visualize the result so easily.
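To make the mapping step concrete, here is a minimal base-R sketch of what assigning a BMU amounts to: each new row is matched to its nearest codebook vector in Euclidean distance. The random `codebook` and the helper `find_bmus` are hypothetical stand-ins for illustration only; they are not part of kohonen or Rsomoclu.

```r
# Hypothetical codebook: 6 units (e.g. a 3 x 2 map), 50 variables each.
set.seed(1)
codebook <- matrix(rnorm(6 * 50), nrow = 6, ncol = 50)

# Assumed helper (not a kohonen/Rsomoclu function): index of the nearest
# codebook vector for every row of `data`, using squared Euclidean distance.
find_bmus <- function(data, codebook) {
  # ||x - w||^2 = ||x||^2 - 2 x.w + ||w||^2, computed for all pairs at once
  d2 <- outer(rowSums(data^2), rowSums(codebook^2), "+") -
    2 * data %*% t(codebook)
  # Largest negated distance per row = smallest distance
  max.col(-d2, ties.method = "first")
}

new_data <- matrix(0, nrow = 100, ncol = 50)
bmus <- find_bmus(new_data, codebook)
length(unique(bmus))  # number of distinct units hit by the identical rows
```

Because every row of `new_data` is the same vector, every row has the same distances to the codebook and therefore lands on the same unit, which is the sanity check described above.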

apastore commented 7 years ago

I was wondering how to recompute the codes of each SOM unit after mapping the test set. I have tried the simple mean of the elements mapped to each unit, but it looks to me like there is some sort of smoothing involved.

Would it be possible to have R code to recompute the SOM unit codes after updating the map with the test data?

Thanks!
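As a rough illustration of the question, the base-R sketch below computes per-unit codes from mapped test rows in two ways: a plain mean of the rows hitting each unit, and a neighborhood-weighted mean in which each unit also borrows from nearby units on the grid, which is one plausible source of the smoothing seen in trained codes. The toy data, the Gaussian kernel, and `sigma` are assumptions made for the example; this is not necessarily what kohonen or Somoclu compute internally. The last lines show how a variable left out of training could be projected by aggregating it over the same BMUs.

```r
# Toy stand-ins (assumptions): 3 x 2 rectangular map, 4 variables, 200 test rows.
set.seed(42)
n_x <- 3; n_y <- 2
grid <- as.matrix(expand.grid(x = seq_len(n_x), y = seq_len(n_y)))
codebook <- matrix(rnorm(n_x * n_y * 4), ncol = 4)
test_data <- matrix(rnorm(200 * 4), ncol = 4)
n_units <- nrow(codebook)

# BMU of every test row: nearest codebook vector in Euclidean distance.
d2 <- outer(rowSums(test_data^2), rowSums(codebook^2), "+") -
  2 * test_data %*% t(codebook)
bmus <- max.col(-d2, ties.method = "first")

# Per-unit hit counts and per-unit sums of the mapped rows.
hits <- tabulate(bmus, nbins = n_units)
sums <- matrix(0, n_units, ncol(test_data))
sums[sort(unique(bmus)), ] <- rowsum(test_data, bmus)

# Plain mean per unit; units with no hits come out as NaN (0 / 0).
plain_mean <- sums / hits

# Neighborhood-weighted mean: a Gaussian kernel on grid distance lets each
# unit borrow from its neighbors' hits, which smooths the resulting codes.
sigma <- 1
h <- exp(-as.matrix(dist(grid))^2 / (2 * sigma^2))
smooth_mean <- (h %*% sums) / as.vector(h %*% hits)

# A variable excluded from training can be projected the same way:
# aggregate it by BMU (here a made-up extra column).
extra <- rnorm(nrow(test_data))
extra_by_unit <- tapply(extra, factor(bmus, levels = seq_len(n_units)), mean)
```

With `sigma` close to zero the weighted mean approaches the plain mean for units that have hits; larger values pull neighboring units toward each other.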

peterwittek commented 7 years ago

I am sorry, I don't understand the question. If you want to continue training the map with the test examples, just pass the current codebook to Rsomoclu.train with the new data. Adjust the radius and the scaling factors to avoid abrupt changes.
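To see why the radius and scaling factor matter when resuming from an existing codebook, here is a small base-R sketch of an online SOM update. It does not call Rsomoclu.train and is not Somoclu's implementation; `continue_training`, the toy grid, and the constant `radius` and `scale` values are all assumptions chosen for illustration.

```r
# Assumed toy setup: 3 x 2 map, 4 variables. `codebook` plays the role of
# the already-trained map, `new_data` the additional examples.
set.seed(7)
grid <- as.matrix(expand.grid(x = 1:3, y = 1:2))
codebook <- matrix(rnorm(6 * 4), ncol = 4)
new_data <- matrix(rnorm(50 * 4, mean = 1), ncol = 4)

# Hypothetical helper: one online pass over `data`, starting from `codebook`.
# `radius` is the Gaussian neighborhood width on the grid and `scale` the
# learning rate; both are held constant here for simplicity.
continue_training <- function(codebook, data, grid, radius, scale) {
  grid_d2 <- as.matrix(dist(grid))^2
  for (i in seq_len(nrow(data))) {
    x <- data[i, ]
    # Best matching unit for this row
    bmu <- which.min(colSums((t(codebook) - x)^2))
    # Neighborhood weight of every unit relative to the BMU
    h <- exp(-grid_d2[bmu, ] / (2 * radius^2))
    # Move every unit toward x, weighted by its grid distance to the BMU
    target <- matrix(x, nrow(codebook), length(x), byrow = TRUE)
    codebook <- codebook + scale * h * (target - codebook)
  }
  codebook
}

gentle <- continue_training(codebook, new_data, grid, radius = 0.5, scale = 0.01)
abrupt <- continue_training(codebook, new_data, grid, radius = 3, scale = 0.5)

# Total displacement of the codebook from where it started.
displacement <- c(gentle = sqrt(sum((gentle - codebook)^2)),
                  abrupt = sqrt(sum((abrupt - codebook)^2)))
displacement
```

A small radius and learning rate nudge only the units near each BMU, so the existing map is largely preserved; a large radius and learning rate drag the whole codebook toward the new examples.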