Convergence on single neuron with large vectors

cbanbury commented 7 years ago

I've been playing with this a bit more and it works well for the canonical example of mapping colours. However, when I feed data with more variables (~40) into the SOM, all of the inputs tend to converge on a single neuron.

You seem to have had this issue before in #17; I'm wondering if it is again related to normalisation?

Should probably have:

- options for normalisation, and a re-evaluation of the normalisation strategy
- more extensive tests for data with larger vectors

nmondon commented 7 years ago

Hi Carl!

Thanks for the feedback, you're right on both points:

- I will make the normalization optional, and maybe other algorithmic steps too; that will allow us to figure out the root of your issue.
- Yes, we should improve the test coverage. One of the issues I encountered back then was the limited duration of a test under mocha (2 seconds max, if I recall correctly).

cbanbury commented 7 years ago

I have tried commenting out the normalisation line:

this.data = this.normalize(data, scales);

but still see the same convergence. I'll have a play with different normalisation methods externally.
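For trying a different normalisation outside the library, a per-feature z-score is one option. This is only a sketch: `standardize` is a hypothetical helper, not part of kohonen, and it assumes `data` is a non-empty array of equal-length numeric arrays.

```javascript
// Hypothetical helper (not part of kohonen): z-score each feature (column).
function standardize(data) {
  const n = data.length;    // number of samples
  const d = data[0].length; // number of features per sample
  const means = new Array(d).fill(0);
  const variances = new Array(d).fill(0);

  // Per-feature mean.
  for (const row of data) {
    for (let j = 0; j < d; j++) means[j] += row[j] / n;
  }
  // Per-feature (population) variance.
  for (const row of data) {
    for (let j = 0; j < d; j++) variances[j] += (row[j] - means[j]) ** 2 / n;
  }

  // Constant features have zero variance; map them to 0 instead of dividing by 0.
  return data.map((row) =>
    row.map((value, j) => {
      const std = Math.sqrt(variances[j]);
      return std === 0 ? 0 : (value - means[j]) / std;
    })
  );
}

console.log(standardize([[1, 10], [3, 10]])); // [ [ -1, 0 ], [ 1, 0 ] ]
```

The zero-variance branch matters here because a division by zero would itself produce NaN or Infinity values in the input.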

Regarding the timeout, I think you can set the timeout for one or more tests manually. I've been trying to find some test data. How about using astronomical spectra:

http://cdsarc.u-strasbg.fr/viz-bin/Cat?III/92#sRM2.1

This paper did something similar to classify stellar types using SOM.

nmondon commented 7 years ago

Thanks, let me know if you find something!

I'll have a look, I'm sure that will be an interesting test case :)

nmondon commented 7 years ago

I was quite busy this past week, but I will have more time for this in the coming week!

nmondon commented 7 years ago

Wow, 2799 dimensions in the stellar dataset!

cbanbury commented 7 years ago

Ha, yes it might be a bit overkill for a test, in theory it should still work though. Would be nice to see what the limits are for this kind of thing using JavaScript.

nmondon commented 7 years ago

Vector operations seem to be the problem (combined with normalized values)... Even with a single iteration, all data converge on the same neuron because the dist method returns NaN... I'm not sure yet.

nmondon commented 7 years ago

I got it: it's a BIG mistake in the eigenvector generation!! Basically, I generate vectors of dimension N, where N is the number of input samples, not the number of their dimensions... :ashamed
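A guard along these lines would surface that kind of shape mismatch early. It is only a sketch: `checkNeuronDimensions` is a hypothetical helper, not part of kohonen or ml-pca, and the sample values are made up for illustration.

```javascript
// Hypothetical guard (not part of kohonen or ml-pca).
// `data` is an array of input vectors, `neurons` an array of weight vectors.
function checkNeuronDimensions(data, neurons) {
  const expected = data[0].length; // number of features per input vector
  neurons.forEach((weights, index) => {
    if (weights.length !== expected) {
      throw new RangeError(
        `neuron ${index} has ${weights.length} components, expected ${expected}`
      );
    }
  });
}

// 3 samples with 5 features each.
const data = [
  [1, 2, 3, 4, 5],
  [2, 3, 4, 5, 6],
  [3, 4, 5, 6, 7],
];
// Weight vectors sized by the number of samples (3) instead of features (5).
const wrongNeurons = [[0, 0, 0], [1, 1, 1]];

try {
  checkNeuronDimensions(data, wrongNeurons);
} catch (err) {
  console.log(err.message); // neuron 0 has 3 components, expected 5
}
```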

nmondon commented 7 years ago

It was working before because:

- of the parameter order of the dist method,
- and because the input vectors had a lower dimension than the neurons' vectors...

Basically, I could have initialized my neurons' vectors randomly and it would have been the same...

The convergence on a single neuron occurs as soon as the number of dimensions is greater than the number of input samples, which makes the dist method return NaN.

I'm going to add decent test coverage for that!
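The mechanism described above can be reproduced in a few lines. This is a sketch under assumptions: `dist` and `bestMatchingUnit` here are hypothetical stand-ins that loop over the first argument's length, not the library's actual code.

```javascript
// Hypothetical Euclidean distance that loops over the first argument's length.
// This is an assumption about the shape of the bug, not the library's code.
function dist(input, neuron) {
  let sum = 0;
  for (let i = 0; i < input.length; i++) {
    const diff = input[i] - neuron[i]; // neuron[i] is undefined past its end
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}

// Picks the index of the closest neuron; a NaN distance always fails `d < bestDist`.
function bestMatchingUnit(input, neurons) {
  let best = 0;
  let bestDist = Infinity;
  neurons.forEach((neuron, index) => {
    const d = dist(input, neuron);
    if (d < bestDist) {
      bestDist = d;
      best = index;
    }
  });
  return best;
}

const neurons = [[0, 0, 0], [1, 1, 1]]; // 3 components each

// Input shorter than the neurons: extra neuron components are silently ignored.
console.log(bestMatchingUnit([1, 1], neurons)); // 1

// Input longer than the neurons: every distance is NaN, so index 0 always wins.
console.log(dist([1, 1, 1, 1], neurons[1])); // NaN
console.log(bestMatchingUnit([1, 1, 1, 1], neurons)); // 0
console.log(bestMatchingUnit([0, 0, 0, 9], neurons)); // 0
```

Because `number - undefined` is NaN and every comparison with NaN is false, no error is thrown: the search simply never updates its initial candidate.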

cbanbury commented 7 years ago

Oops! At least it's a fairly easy fix. 😸

nmondon commented 7 years ago

@cbanbury In the end I opened an issue on the ml-pca repo: https://github.com/mljs/pca/issues/9 because I was not sure about the behavior of their eigenvectors... but it was actually my mistake.

After fixing this, I ran the stars example and the results are not that bad for a first attempt. I've begun a visualisation in a dedicated repo: https://github.com/seracio/kohonen-stars (beware: the vis is working, but the SOM calculation is based on a not-yet-released version of kohonen - https://github.com/seracio/kohonen/tree/45-api-redesign)

cbanbury commented 7 years ago

Awesome stuff! I have a feeling that I've run into a similar issue with the ml-pca package, so perhaps their docs need more clarity.

The visualisation looks great, and nice to have as an example for using the package.

nmondon commented 7 years ago

v0.7.0 is out. In the end it only fixes this bug; the API redesign will be for v1!

seracio / kohonen

Convergence on single neuron with large vectors #43