sevamoo / SOMPY

A Python Library for Self Organizing Map (SOM)
Apache License 2.0

coordinates for the maps #84

Closed: hainguyenct closed this issue 5 years ago

hainguyenct commented 6 years ago

Dear All, Does anyone know how to get the coordinates of all data points on the map? Thank you so much for your help. Sincerely yours,

beckrob commented 6 years ago

Hello,

With a trained SOM object named somModel, somModel.codebook.matrix gives you the coordinates of each SOM cell in the data space, one row per cell, in order of cell index.

If you use a normalizer (which is the default in SOMFactory().build), these cell coordinates will be normalized coordinates. The original training data can be normalized with somModel._normalizer.normalize(trainingDataMatrix) to bring it into the same reference frame; to normalize another matrix, use somModel._normalizer.normalize_by(trainingDataMatrix, matrixToNormalize).

Alternatively, the cell coordinates can be denormalized to match your original coordinates in this way: somModel._normalizer.denormalize_by(trainingDataMatrix, somModel.codebook.matrix)

As the model stores the original training data, trainingDataMatrix can be replaced by somModel.data_raw to ensure consistency.
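For intuition, here is a minimal NumPy sketch of what this normalize/denormalize round trip amounts to, assuming the default normalizer is a per-feature z-score whose mean and standard deviation are taken from the training data. The two functions only mirror the names of the calls above; they are standalone helpers written for illustration, not SOMPY's own code, and the random arrays merely stand in for data_raw and codebook.matrix.

```python
import numpy as np


def normalize_by(reference, data):
    """Z-score `data` using the per-column mean/std of `reference` (assumed scheme)."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    std[std == 0] = 1  # avoid division by zero for constant columns
    return (data - mean) / std


def denormalize_by(reference, normalized):
    """Invert normalize_by: map normalized vectors back to the original units."""
    mean = reference.mean(axis=0)
    std = reference.std(axis=0)
    std[std == 0] = 1  # keep consistent with normalize_by
    return normalized * std + mean


rng = np.random.default_rng(0)
training = rng.normal(loc=5.0, scale=2.0, size=(100, 3))  # stand-in for data_raw
codebook_normalized = rng.normal(size=(12, 3))  # stand-in for codebook.matrix

# Cell coordinates expressed in the original data units
codebook_original = denormalize_by(training, codebook_normalized)

# Normalizing again with the same reference recovers the normalized codebook
assert np.allclose(normalize_by(training, codebook_original), codebook_normalized)
```

Because both directions use statistics of the same reference matrix, passing data_raw every time keeps the two frames consistent.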

I see no way of avoiding this normalizer, as many functions of the SOM class will fail without it. Adding an IdentityNormalizator subclass to https://github.com/sevamoo/SOMPY/blob/master/sompy/normalization.py that does nothing would be the easiest fix.
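A rough sketch of such a do-nothing normalizer, assuming the SOM class only calls normalize, normalize_by and denormalize_by on it (the methods used in this thread). To keep it self-contained it is written as a standalone class; in SOMPY itself it would presumably subclass the Normalizator base class in normalization.py, and the class name and the `name` value are hypothetical.

```python
import numpy as np


class IdentityNormalizator:
    """Hypothetical no-op normalizer exposing the same method names the
    SOM class uses on its normalizer. In SOMPY this would subclass the
    base class in sompy/normalization.py rather than stand alone."""

    name = 'none'  # hypothetical identifier for selecting this normalizer

    def normalize(self, data):
        # Identity mapping: return the data unchanged
        return np.asarray(data)

    def normalize_by(self, raw_data, data):
        # raw_data is ignored: an identity mapping has nothing to learn
        return np.asarray(data)

    def denormalize_by(self, raw_data, n_vect):
        # Inverse of the identity is the identity
        return np.asarray(n_vect)
```

How SOMFactory().build looks up a normalizer may differ between SOMPY versions, so check normalization.py and sompy.py before relying on a name-based selection.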

I hope this helps.

Sincerely, Robert

hainguyenct commented 6 years ago

Dear Robert, Thank you for your help. I would like to show the names of the samples on the map. Say we have 100 samples of dimension 50: could you please show me how to place these samples on the map and which neurons/cells on the map correspond to them? Thank you. Best,

beckrob commented 6 years ago

Hello,

The find_bmu() function of the SOM class will get you the best-matching unit/cell for your 100x50 sampleMatrix, provided you normalize the data properly (see also my previous comment):

normalizedSampleMatrix=somModel._normalizer.normalize_by(somModel.data_raw, sampleMatrix)

sampleLinearCellIndexes=somModel.find_bmu(normalizedSampleMatrix)[0,:]

The [0,:] selects the cell index. (I haven't checked what the [1,:] component is; it is probably some kind of distance metric.) This is a linear index, so e.g. in a 30x40 map it runs from 0 to 1199, and it corresponds to the row in somModel.codebook.matrix. So the coordinates of the corresponding cells in the 50-dim normalized data space are:

sampleClosestCellCoordinates=somModel.codebook.matrix[sampleLinearCellIndexes.astype(int)]
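To make the find_bmu step concrete, here is a brute-force NumPy sketch of a best-matching-unit search: for each normalized sample, pick the codebook row with the smallest squared Euclidean distance, then look up that row exactly as above. It illustrates the idea only and is not SOMPY's implementation; in particular, the distance it returns is the plain squared distance, which is not guaranteed to equal what find_bmu stores in its [1,:] row. The function name and toy arrays are made up for illustration.

```python
import numpy as np


def brute_force_bmu(codebook, samples):
    """Return (indexes, sq_dists): for each sample row, the index of the
    nearest codebook row and the squared Euclidean distance to it."""
    # Pairwise differences, shape (n_samples, n_cells, dim)
    diff = samples[:, None, :] - codebook[None, :, :]
    # Squared distances, shape (n_samples, n_cells)
    sq_dists = np.einsum('ijk,ijk->ij', diff, diff)
    indexes = sq_dists.argmin(axis=1)
    return indexes, sq_dists[np.arange(len(samples)), indexes]


# Toy example: 4 cells in 2-D, 3 samples
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
samples = np.array([[0.1, 0.1], [0.9, 0.2], [0.8, 0.9]])

indexes, sq_dists = brute_force_bmu(codebook, samples)
closest_cells = codebook[indexes]  # same row lookup as codebook.matrix[...]
print(indexes)  # [0 1 3]
```

The broadcast materializes an (n_samples, n_cells, dim) array, which is fine for 100 samples on a 30x40 map in 50 dimensions but would need chunking for much larger inputs.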

You can also get a 2D index from this if you want, e.g. 0-29, 0-39 for a 30x40 map:

sample2DCellIndexes=somModel.bmu_ind_to_xy(sampleLinearCellIndexes)[:,0:2]

Here [:,0:2] selects the 2D indexes, while the third column, [:,2], is the original 1D index.
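The linear-to-2D conversion is plain row-major arithmetic: row = index // n_cols and col = index % n_cols. This sketch builds the same [row, col, index] layout with NumPy, assuming cells are numbered row by row from the top-left cell; that ordering is an assumption here, so compare it with bmu_ind_to_xy on your own map before relying on it. The helper name is hypothetical.

```python
import numpy as np


def linear_to_xy(linear_indexes, mapsize):
    """Convert linear cell indexes to [row, col, index] rows for a map of
    shape (n_rows, n_cols), assuming row-major cell numbering."""
    idx = np.asarray(linear_indexes).astype(int)  # find_bmu yields floats
    n_rows, n_cols = mapsize
    if idx.size and (idx.min() < 0 or idx.max() >= n_rows * n_cols):
        raise ValueError("cell index outside the map")
    rows, cols = np.divmod(idx, n_cols)
    return np.column_stack([rows, cols, idx])


# 30x40 map: index 0 is the top-left cell, 1199 the bottom-right one
print(linear_to_xy([0, 39, 40, 1199], (30, 40)).tolist())
# [[0, 0, 0], [0, 39, 39], [1, 0, 40], [29, 39, 1199]]
```

np.unravel_index(idx, mapsize) gives the same rows and columns, since its default order is row-major.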

From sample2DCellIndexes, you can then, for example, create and plot a 2D histogram to visualize where the samples are in the map.
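A sketch of that last step, which also covers the original question of showing sample names on the map: count how many samples land in each cell (the 2D histogram) and collect the sample names per cell. It takes [row, col] pairs like sample2DCellIndexes plus a list of names; the helper names, sample_names and the toy values are made up for illustration. Plotting is left out: the counts array could be passed to matplotlib's imshow, and each name list placed at its cell with ax.text.

```python
from collections import defaultdict

import numpy as np


def hit_counts(cell_xy, mapsize):
    """2D histogram: number of samples whose best-matching cell is (row, col)."""
    counts = np.zeros(mapsize, dtype=int)
    xy = np.asarray(cell_xy, dtype=int)
    # np.add.at accumulates correctly when the same cell appears several times
    np.add.at(counts, (xy[:, 0], xy[:, 1]), 1)
    return counts


def names_per_cell(cell_xy, sample_names):
    """Map each (row, col) cell to the names of the samples it represents."""
    cells = defaultdict(list)
    for (row, col), name in zip(np.asarray(cell_xy, dtype=int), sample_names):
        cells[(int(row), int(col))].append(name)
    return dict(cells)


# Toy data: 4 samples on a 2x3 map
cell_xy = [[0, 1], [1, 2], [0, 1], [1, 0]]
sample_names = ['s1', 's2', 's3', 's4']  # hypothetical sample labels

print(hit_counts(cell_xy, (2, 3)).tolist())  # [[0, 2, 0], [1, 0, 1]]
print(names_per_cell(cell_xy, sample_names))
```

A plain counts[xy[:, 0], xy[:, 1]] += 1 would undercount cells hit by more than one sample, which is why np.add.at is used.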

Sincerely, Robert

businessglitch commented 6 years ago

h = HitMapView(50, 10, 'hitmap', text_size=10, show_text=True)
coordinates = h.show(sm)

They are in the form [y, x, nodeid].