sevamoo / SOMPY

A Python Library for Self Organizing Map (SOM)
Apache License 2.0

Map cells with 0 hits - reshape error #94

Open sho-87 opened 5 years ago

sho-87 commented 5 years ago

I'm following the AirFlights example with my own dataset, but I run into a problem when I try to plot the real-value component planes.

I get a reshape error when I reach the following code:

plot_hex_map(empirical_codebook.reshape(sm.codebook.mapsize + [empirical_codebook.shape[-1]]), 
             titles=df.columns[:-1], shape=[4, 5], colormap=None)

ValueError: cannot reshape array of size 476 into shape (10,10,7)

I checked the values in df['bmus']. The max bmu value is 99, but there are only 68 unique bmu values. So this means I have too little data for my mapsize because some map cells are not being chosen as the bmu, and those cells are therefore not represented in df['bmus']. This is confirmed by plotting the hitsmap, which shows a number of cells with 0 hits
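The size mismatch can be reproduced in a few lines. This is only a sketch with made-up data; it assumes the empirical codebook is built by averaging the data rows per BMU (as a groupby on `bmus` would do), and every name below is a stand-in rather than part of SOMPY.

```python
import numpy as np

# Hypothetical stand-ins for the numbers quoted above: a 10x10 map,
# 7 variables, and data that only ever hits 68 of the 100 cells.
mapsize = [10, 10]
n_features = 7
rng = np.random.default_rng(0)
occupied = rng.choice(100, size=68, replace=False)  # cells that win at least once
bmus = rng.choice(occupied, size=500)               # BMU index for each data row
bmus[:68] = occupied                                # guarantee every occupied cell appears
values = rng.normal(size=(500, n_features))

# Per-BMU means: one row per *occupied* cell only.
unique_bmus = np.unique(bmus)
empirical_codebook = np.vstack(
    [values[bmus == b].mean(axis=0) for b in unique_bmus]
)

needed = mapsize[0] * mapsize[1] * n_features
print(empirical_codebook.size, needed)  # 68 * 7 elements versus 10 * 10 * 7
```

The array has 68 rows instead of 100, so it holds 476 elements while the reshape target needs 700.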

I can fix this by reducing my map size, but it only works if I go down to 3x3, which is pretty pointless

Is there any other way to get around this problem? View2D allows me to plot the prototype planes with no problem using the same model, despite some cells not being chosen as the best unit. Any way to do this with the real value planes as well? Maybe fill in the missing bmu values somehow?
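One possible way to "fill in" the missing cells is to pad the per-BMU means with NaN rows so that every node has a row before reshaping. The sketch below uses plain NumPy on toy data; `pad_empirical_codebook` is a hypothetical helper, not a SOMPY function, and whether `plot_hex_map` renders NaN cells sensibly is not something this thread confirms.

```python
import numpy as np

def pad_empirical_codebook(bmus, values, n_nodes):
    """Mean of `values` per BMU, with NaN rows for nodes that have no hits."""
    bmus = np.asarray(bmus, dtype=int)
    values = np.asarray(values, dtype=float)
    sums = np.zeros((n_nodes, values.shape[1]))
    np.add.at(sums, bmus, values)                # accumulate data rows per node
    hits = np.bincount(bmus, minlength=n_nodes)  # hit count per node
    padded = np.full_like(sums, np.nan)
    occupied = hits > 0
    padded[occupied] = sums[occupied] / hits[occupied, None]
    return padded

# Toy data: a 3x2 map (6 nodes), 2 variables; nodes 1 and 4 never win.
bmus = [0, 0, 2, 3, 5]
values = [[1, 10], [3, 30], [5, 50], [7, 70], [9, 90]]
padded = pad_empirical_codebook(bmus, values, n_nodes=6)
planes = padded.reshape([3, 2] + [padded.shape[-1]])  # reshapes without error
```

With pandas, reindexing the grouped means over `range(n_nodes)` should give the same NaN-padded table.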

sevamoo commented 5 years ago

Your question is not really clear to me. All the nodes have their own weight vectors, whether or not they are BMUs. That is why the component planes can always be visualized.

If you want to visualize the values of the data points, you need to write your own plot. Just convert the bmus to xy values and then use some sort of scatter plots for the training data.
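A minimal sketch of the BMU-to-xy conversion, assuming the flat BMU index is laid out row-major over the map grid (an assumption worth checking against your SOMPY version). `bmu_to_xy` is a hypothetical helper written with NumPy only.

```python
import numpy as np

def bmu_to_xy(bmus, mapsize):
    """Map flat BMU indices to (x, y) grid positions, assuming row-major order."""
    rows, cols = np.unravel_index(np.asarray(bmus, dtype=int), tuple(mapsize))
    return np.column_stack([cols, rows])  # x = column, y = row

xy = bmu_to_xy([0, 9, 10, 99], mapsize=[10, 10])

# Small random jitter so points sharing a BMU do not overlap exactly.
rng = np.random.default_rng(0)
xy_jittered = xy + rng.uniform(-0.3, 0.3, size=xy.shape)
```

The jittered coordinates can then be passed to any scatter plot, coloured by the real value of the variable of interest.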

When there are nodes between bmu nodes, this means there is a clear cluster border and those nodes are in those borders. Therefore, reducing the som size to 3x3 is not a good idea.

germayneng commented 5 years ago

I have the same error for the real-value component heatmaps in the example notebook: with my own data, the reshape fails. The error comes from this part of the example:

image

It would be good if someone could unblock us so we can plot the real-value component heatmaps as well.

sho-87 commented 5 years ago

@sevamoo taking a step back... the general problem is that the reshaping used in the notebook (as posted by @germayneng) doesn't work on all data, even though it looks like a technique that should generalize to any data.

What do you think might be causing the reshape errors, if it isn't the unique BMU count problem I described in the original question?

ricardomourarpm commented 5 years ago

To see selected components:

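import numpy as np
import pandas as pd
import matplotlib
from sompy.visualization.plot_tools import plot_hex_map

# To visualize the component maps of some selected variables.
# som_chosen is the trained SOM and Labels its list of column names (both defined elsewhere).
vars = ['Temp', 'Consumo', 'WorkingDays']

# Denormalize the codebook so the node values are in the original units
Nodes = pd.DataFrame(
    som_chosen._normalizer.denormalize_by(som_chosen.data_raw, som_chosen.codebook.matrix),
    columns=Labels)
Nodes_with_selected_variables = Nodes[vars].copy()  # copy so columns can be added later

matplotlib.rcParams.update({'font.size': 8})

# Reshape to (rows, cols, n_variables) and flip the row axis before plotting
plot_hex_map(
    np.flip(Nodes_with_selected_variables.values.reshape(
        som_chosen.codebook.mapsize + [Nodes_with_selected_variables.values.shape[-1]]), axis=0),
    titles=Nodes_with_selected_variables.columns, shape=[1, 3], colormap=None)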

Since my data has nodes with 0 hits

image

I had to write different code:

# Using the normalized data and the normalized node locations, find the nearest
# training rows for each node and use them to empirically estimate the values
# of the exogenous variables.
from sklearn.neighbors import NearestNeighbors

Nodes_normalized = pd.DataFrame(som_chosen.codebook.matrix, columns=Labels)  # normalized node locations
Data_normalized = pd.DataFrame(som_chosen._data, columns=Labels)  # normalized data for nearest neighbours

Knearmodel = NearestNeighbors(n_neighbors=5)  # 5 nearest neighbours, Minkowski metric with p=2 (Euclidean)
Knearmodel.fit(Data_normalized.values)

TotalMinutesDay_regression = []

for i in range(len(Nodes)):  # impute an empirical TotalMinutesDay for every node
    distances, indices = Knearmodel.kneighbors([Nodes_normalized.loc[i].values])
    # .loc assumes Cargadata_relevant (the raw data) has the default 0..n-1 index; otherwise use .iloc
    closest_vectors_to_node = Cargadata_relevant.loc[indices[0]]
    regression = np.mean(closest_vectors_to_node.TotalMinutesDay)
    TotalMinutesDay_regression.append(regression)

Nodes_with_selected_variables["TotalMinutesDay"] = TotalMinutesDay_regression

# Plot hex_map with all variables needed
plot_hex_map(
    np.flip(Nodes_with_selected_variables.values.reshape(
        som_chosen.codebook.mapsize + [Nodes_with_selected_variables.values.shape[-1]]), axis=0),
    titles=Nodes_with_selected_variables.columns, shape=[1, 4], colormap=None)

image

With k nearest neighbours I can assign every node a value for a variable that was not used to train the SOM.
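For readers who want something runnable without a trained SOM, the same idea can be sketched on synthetic data. This version swaps scikit-learn's `NearestNeighbors` for a brute-force NumPy distance computation; `knn_impute_per_node` and all the arrays are hypothetical stand-ins for the codebook, the normalized training data, and the exogenous variable.

```python
import numpy as np

def knn_impute_per_node(codebook, data, target, k=5):
    """For each node, average `target` over the k training rows nearest to it."""
    # Pairwise Euclidean distances, shape (n_nodes, n_samples).
    dists = np.linalg.norm(codebook[:, None, :] - data[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]  # positional indices of the k nearest rows
    return target[nearest].mean(axis=1)

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))     # stand-in for the normalized training data
codebook = rng.normal(size=(12, 3))  # stand-in for a 4x3 map of node weight vectors
target = data[:, 0] * 2.0 + 1.0      # stand-in exogenous variable, one value per data row

node_values = knn_impute_per_node(codebook, data, target, k=5)
plane = node_values.reshape(4, 3)    # one value per node, including nodes with 0 hits
```

Because every node gets a value regardless of its hit count, the result always has one row per node and reshapes to the map size.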

ricardomourarpm commented 3 years ago

NameError: name 'som_chosen' is not defined

Dear akol67,

The posted code is meant only to be read, not run as-is; out of context it doesn't work. som_chosen is the SOM I trained for my example.

ricardomourarpm commented 3 years ago

NameError: name 'som_chosen' is not defined

In my context, the variables Temp, Consumo, WorkingDays and TotalMinutesDay are exogenous, and the only way I can see to construct the exogenous component maps is to take the mean over the nearest neighbours of each neuron. Even if you set value=1 where the BMU hit count is zero, some neurons (nodes) would still not be the BMU of any data point, so you cannot use the SOMPY code that retrieves the exogenous values from the data assigned to each neuron.

akol67 commented 3 years ago

Dear Ricardo,

It worked!


vars = ['Temp','Consumo','WorkingDays']
Nodes = pd.DataFrame(som_chosen._normalizer.denormalize_by(
    som_chosen.data_raw, som_chosen.codebook.matrix), columns=Labels)
Nodes_with_selected_variables = Nodes[vars]


Regards, Alexandre Kolisnyk