radanalyticsio / silex

something to help you spark
Apache License 2.0

Not able to train SOMs #66

Closed ankitksharma closed 7 years ago

ankitksharma commented 7 years ago

Hi,

I am trying to create a SOM model using the following code:

val model = SOM.train(xdim = 5, ydim = 5, examples = ffRDD, iterations = 10, fdim = 3)

Unfortunately, I'm getting the following error. Could you point me to some documentation that would help me understand whether I'm doing anything wrong in building the model?

Exception in thread "main" java.io.NotSerializableException: com.redhat.et.silex.util.SampleSink
Serialization stack:

I'm using Spark v2.0.0 & Scala v2.10.4

Thanks for your time.

willb commented 7 years ago

@ankitksharma thanks for the report! It's a long, embarrassing story but it was a quick fix. I'll cut a new release ASAP.

willb commented 7 years ago

@ankitksharma v0.1.1 should solve this issue. Please try it out and let me know if you have any problems!

ankitksharma commented 7 years ago

@willb This is awesome, it's working now. Thanks a lot for resolving this so quickly. I appreciate the effort.

willb commented 7 years ago

@ankitksharma Thanks again! I'm planning some improvements to this code in the future, so please stay tuned if it turns out to be useful.

ankitksharma commented 7 years ago

@willb That would be great. Although I'm able to run the code, I haven't been able to make sense of the result yet. I am trying to cluster with a SOM and check the accuracy against a labelled dataset, similar to the example given in the R library "kohonen". If you know R, this is the code:

data(wines)
set.seed(7)

training <- sample(nrow(wines), 120)
Xtraining <- scale(wines[training, ])
Xtest <- scale(wines[-training, ],
               center = attr(Xtraining, "scaled:center"),
               scale = attr(Xtraining, "scaled:scale"))

som.wines <- som(Xtraining, grid = somgrid(5, 5, "hexagonal"))

som.prediction <- predict(som.wines, newdata = Xtest,
                          trainX = Xtraining,
                          trainY = factor(wine.classes[training]))
table(wine.classes[-training], som.prediction$prediction)
plot(som.wines)

This gives me the following confusion matrix:

[Screenshot: confusion matrix]

This is the SOM plot I get from it:

[Image: SOM plot]

Now, when I run your library on the same dataset in Spark with the following configuration:

val model = SOM.train(xdim = 7, ydim = 7, examples = xtrain, iterations = 100, fdim = data.columns.length)

I get the same response from the model.closestWithSimilarity() function every time:

(48, 1)

I am not able to understand this result. Can you shed some light on it? If you have any documentation that would help me get results close to what the R library produces, that would be exactly what I need.
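For reference, the R snippet standardizes the training rows and then reuses the training centers and scales for the test rows before fitting the SOM. Here is a minimal plain-Scala sketch of that preprocessing step. It uses neither Spark nor silex, and the names `ScaleSketch`, `colMeans`, `colSds` and `standardize` are hypothetical helpers, not part of any library. Whether the `xtrain` data passed to `SOM.train` was scaled this way is not stated in this thread.

```scala
// Sketch only: standardize rows with the training set's column statistics,
// mirroring what R's scale() does in the snippet above.
object ScaleSketch {
  type Row = Vector[Double]

  // Per-column mean of a non-empty collection of equal-length rows.
  def colMeans(rows: Seq[Row]): Row =
    rows.transpose.map(col => col.sum / col.size).toVector

  // Per-column sample standard deviation (n - 1 denominator, as in R's sd()).
  def colSds(rows: Seq[Row], means: Row): Row =
    rows.transpose.zip(means).map { case (col, m) =>
      math.sqrt(col.map(x => (x - m) * (x - m)).sum / (col.size - 1))
    }.toVector

  // Center and scale every row with the supplied statistics.
  def standardize(rows: Seq[Row], means: Row, sds: Row): Seq[Row] =
    rows.map(r => r.indices.map(i => (r(i) - means(i)) / sds(i)).toVector)

  def main(args: Array[String]): Unit = {
    val train = Seq(Vector(1.0, 10.0), Vector(2.0, 20.0), Vector(3.0, 30.0))
    val test  = Seq(Vector(2.0, 40.0))
    val means = colMeans(train)
    val sds   = colSds(train, means)
    // The test rows are scaled with the *training* statistics.
    println(standardize(test, means, sds))
  }
}
```

The point of the sketch is that the test set is never used to compute its own center and scale, which keeps the two sets on the same footing.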

Thanks a lot for the help by the way.
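One common way to check a SOM against a labelled dataset is to give each map unit the majority class of the training examples that land on it, predict test examples through their unit, and tally the outcome in a table of (actual, predicted) counts. Below is a rough plain-Scala sketch of that idea. It is not the kohonen package's method or the silex API: `SomLabelSketch`, `unitLabels` and `confusion` are hypothetical names, and the unit indices are assumed to come from whatever best-matching-unit lookup the model provides.

```scala
// Sketch only: label SOM units by majority vote and tally a confusion table.
object SomLabelSketch {
  // Majority class among the training examples assigned to each unit.
  // Ties are broken arbitrarily.
  def unitLabels(trainUnits: Seq[Int], trainLabels: Seq[Int]): Map[Int, Int] =
    trainUnits.zip(trainLabels)
      .groupBy(_._1)
      .map { case (unit, pairs) =>
        val majority = pairs.map(_._2).groupBy(identity).maxBy(_._2.size)._1
        unit -> majority
      }

  // Counts keyed by (actual, predicted), similar in spirit to R's table().
  def confusion(actual: Seq[Int], predicted: Seq[Int]): Map[(Int, Int), Int] =
    actual.zip(predicted).groupBy(identity).map { case (k, v) => k -> v.size }

  def main(args: Array[String]): Unit = {
    // Assumed inputs: unit index and class label per training example.
    val labels = unitLabels(Seq(0, 0, 0, 1, 1), Seq(1, 1, 2, 3, 3))
    // Units that saw no training example yield no prediction (None).
    val testUnits = Seq(0, 1, 4)
    println(testUnits.map(labels.get))
    println(confusion(Seq(1, 1, 2), Seq(1, 2, 2)))
  }
}
```

If every example maps to the same unit, this scheme can only ever predict one class, so a constant result from the model is worth investigating before computing any accuracy.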