seracio / kohonen

A basic implementation of a Kohonen map in JavaScript
MIT License

Convergence on single neuron with large vectors #43

Closed cbanbury closed 7 years ago

cbanbury commented 7 years ago

I've been playing with this a bit more and it works well for the canonical example of mapping colours. However, when I feed data with more variables (~40) into the SOM, all of the inputs tend to converge on a single neuron.

You seem to have had this issue before in #17; I'm wondering whether it is again related to normalisation?

Should probably have:

nmondon commented 7 years ago

Hi Carl !

Thanks for the feedback, you're right about these points,

cbanbury commented 7 years ago

I have tried commenting out the normalisation line:

this.data = this.normalize(data, scales);

but still see the same convergence. I'll have a play with different normalisation methods externally.
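For anyone wanting to try the same thing, external normalisation could be sketched as below. This is only an illustration, not the library's `normalize` method: `zScoreColumns` is a hypothetical helper that standardises each column to zero mean and unit variance, and maps constant columns to 0 so that no division by zero occurs.

```javascript
// Hypothetical helper (not part of kohonen): z-score each column of a
// row-major dataset, i.e. rows are samples and columns are variables.
function zScoreColumns(rows) {
  const n = rows.length;
  const dims = rows[0].length;

  // Per-column mean.
  const means = new Array(dims).fill(0);
  for (const row of rows) {
    for (let j = 0; j < dims; j++) means[j] += row[j] / n;
  }

  // Per-column (population) standard deviation.
  const stds = new Array(dims).fill(0);
  for (const row of rows) {
    for (let j = 0; j < dims; j++) stds[j] += (row[j] - means[j]) ** 2 / n;
  }
  for (let j = 0; j < dims; j++) stds[j] = Math.sqrt(stds[j]);

  // Constant columns carry no information; map them to 0 instead of NaN.
  return rows.map(row =>
    row.map((v, j) => (stds[j] === 0 ? 0 : (v - means[j]) / stds[j]))
  );
}

const scaled = zScoreColumns([[1, 10, 5], [3, 30, 5]]);
```

The scaled rows can then be passed to the SOM in place of the raw data.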

Regarding the timeout, I think you can set it manually for one or more tests. I've been trying to find some test data; how about using astronomical spectra:

http://cdsarc.u-strasbg.fr/viz-bin/Cat?III/92#sRM2.1

This paper did something similar to classify stellar types using SOM.

nmondon commented 7 years ago

Thanks, let me know if you find something!

I'll have a look, I'm sure that will be an interesting test case :)

nmondon commented 7 years ago

I was quite busy last week, but I will have more time for this one this week!

nmondon commented 7 years ago

Wow, 2799 dimensions in the stellar dataset!

cbanbury commented 7 years ago

Ha, yes it might be a bit overkill for a test, in theory it should still work though. Would be nice to see what the limits are for this kind of thing using JavaScript.

nmondon commented 7 years ago

Vector operations seem to be the problem (combined with normalized values)... Even after a single iteration, all the data converge on the same neuron because the dist method returns NaN... I'm not sure why yet.

nmondon commented 7 years ago

I got it, it's a BIG mistake in the eigenvector generation!! Basically, I generate vectors of dimension N where N is the number of input samples, not the number of dimensions of each sample...
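To make the mix-up concrete, here is a minimal sketch (not the actual kohonen code). The helpers `shapeOf`, `initNeurons` and `assertSameDimension` are hypothetical names; the only point is that each neuron's weight vector must have as many entries as one sample has dimensions (`data[0].length`), not as many as there are samples (`data.length`).

```javascript
// Hypothetical helpers, for illustration only.

// Number of samples and number of dimensions of a row-major dataset.
function shapeOf(data) {
  return { samples: data.length, dims: data[0].length };
}

// Random initialisation with the correct weight-vector length.
function initNeurons(count, data, random = Math.random) {
  const { dims } = shapeOf(data);
  return Array.from({ length: count }, () =>
    Array.from({ length: dims }, () => random())
  );
}

// Fail loudly when a weight vector does not match the data dimension.
function assertSameDimension(neurons, data) {
  const { dims } = shapeOf(data);
  neurons.forEach((weights, i) => {
    if (weights.length !== dims) {
      throw new Error(
        `neuron ${i} has ${weights.length} weights, expected ${dims}`
      );
    }
  });
}

// 3 samples with 5 dimensions each.
const data = [
  [0.1, 0.2, 0.3, 0.4, 0.5],
  [0.5, 0.4, 0.3, 0.2, 0.1],
  [0.9, 0.9, 0.9, 0.9, 0.9],
];
const neurons = initNeurons(4, data);
assertSameDimension(neurons, data); // passes: every neuron has 5 weights
```

A check like `assertSameDimension` would have flagged the bug immediately, since vectors built from the sample count here would have length 3 instead of 5.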

nmondon commented 7 years ago

It was working because :

Basically, I could have randomly initialized my neurons' vectors, it would have been the same...

The convergence on a single neuron occurs as soon as the number of dimensions is larger than the number of input samples, which makes the dist method return NaN.
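The mechanism can be reproduced with a few lines of plain JavaScript. This is a sketch under assumptions, not the library's implementation: `dist` is a naive Euclidean distance without a length check and `findBmu` is a hypothetical best-matching-unit search. When the neuron vectors are shorter than the samples, `b[i]` is `undefined`, the sum becomes NaN, and since every comparison with NaN is false, the first neuron always wins.

```javascript
// Naive Euclidean distance, deliberately without a length check.
function dist(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i]; // b[i] is undefined when b is shorter than a
    sum += d * d;          // number - undefined = NaN, which poisons the sum
  }
  return Math.sqrt(sum);
}

// Hypothetical best-matching-unit search: index of the closest neuron.
function findBmu(neurons, sample) {
  let best = 0;
  let bestDist = Infinity;
  neurons.forEach((weights, i) => {
    const d = dist(sample, weights);
    if (d < bestDist) { // NaN < x is always false, so `best` stays at 0
      best = i;
      bestDist = d;
    }
  });
  return best;
}

// 2 samples with 4 dimensions each (more dimensions than samples).
const samples = [
  [0.1, 0.2, 0.3, 0.4],
  [0.9, 0.8, 0.7, 0.6],
];

// Bug: weight vectors of length 2 (the number of samples).
const badNeurons = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]];
// Fix: weight vectors of length 4 (the number of dimensions).
const goodNeurons = [
  [0.1, 0.2, 0.3, 0.4],
  [0.9, 0.8, 0.7, 0.6],
  [0.5, 0.5, 0.5, 0.5],
];

const badBmus = samples.map(s => findBmu(badNeurons, s));   // every sample maps to neuron 0
const goodBmus = samples.map(s => findBmu(goodNeurons, s)); // samples map to different neurons
```

With fewer dimensions than samples, the loop never reads past the end of the neuron vector, which would explain why the bug stayed hidden on low-dimensional data such as colours.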

I'm gonna add a decent test coverage on that!

cbanbury commented 7 years ago

Oops! At least it's a fairly easy fix. 😸

nmondon commented 7 years ago

@cbanbury I ended up opening an issue on the ml-pca repo: https://github.com/mljs/pca/issues/9 because I was not sure about the behavior of their eigenvectors... but it was actually my mistake.

After fixing this, I ran the stars example and the results are not that bad for a first attempt. I've started a visualisation in a dedicated repo: https://github.com/seracio/kohonen-stars (beware, the vis is working, but the SOM calculation is based on a not-yet-released version of kohonen - https://github.com/seracio/kohonen/tree/45-api-redesign)

cbanbury commented 7 years ago

Awesome stuff! I have a feeling that I've run into a similar issue with the ml-pca package, so perhaps their docs need more clarity.

The visualisation looks great, and nice to have as an example for using the package.

nmondon commented 7 years ago

v0.7.0 is out. In the end it only fixes this bug; the API redesign will be for v1!