Does the CPU kernel give the same result? Can you give a full reproduction script that runs without any error?
CPU and GPU it seems. I have updated the original post to include a full reproduction script. Be careful with some of the parameters: I did not include code to ensure that there is only one BMU per example, which can happen when more than one unit is equally close to a data instance. That usually happens if the grid is too small or too few epochs are used. If it does happen, though, you get a numpy broadcast error rather than a silent mismatch, so it is irrelevant to this particular issue.
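For reference, a minimal self-contained sketch of the kind of uniqueness check that is missing; the distance matrix here is just a random stand-in for the unit-to-example distances used in the real script:

import numpy as np

# Random stand-in for a unit-to-example squared-distance matrix of shape (n_units, n_examples).
rng = np.random.RandomState(0)
D = rng.rand(20 * 30, 100).astype("float32")

# Count how many units attain the minimum distance for each example;
# any count above 1 means that example has more than one BMU.
ties = (D == D.min(axis=0)[None, :]).sum(axis=0)
print("examples with a non-unique BMU:", int((ties > 1).sum()))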
Actually, the code also gives a large difference for rectangular topology -- the issue is not restricted to the hexagonal grid. I found a small bug that could possibly affect this: very large distances between codebook vectors and data instances yielded false BMUs (commit 56003f39a73af41c6fab72a6bd1d8b39815d7446).
Recall that by default, the codebook is initialized with unit-length random vectors. Because of this, it is recommended to normalize your data instances, unless you supply your own initial codebook. So I inserted two lines before training:
for vector in train_data:
    vector /= np.linalg.norm(vector)
As you point out, the BMUs are never guaranteed to be unique. Somoclu will return the BMU with the lowest column index, whereas your manual BMU search returns the one with the lowest row index. This could be a source of the difference. If you plot the absolute value of the difference of the hit maps, you will see that the BMUs are not that far apart on average.
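Something along these lines would do it; this is only a rough sketch with random stand-ins for the two BMU arrays, and the (column, row) ordering is an assumption you should check against your own arrays:

import numpy as np

def hit_map(bmus, n_columns, n_rows):
    # Count how many data instances map to each unit of the grid.
    hits = np.zeros((n_rows, n_columns))
    for col, row in bmus:
        hits[row, col] += 1
    return hits

# Random stand-ins for som.bmus and a manual BMU search, as (column, row) pairs.
rng = np.random.RandomState(0)
n_columns, n_rows, n_samples = 30, 20, 1000
bmus_a = np.column_stack([rng.randint(0, n_columns, n_samples),
                          rng.randint(0, n_rows, n_samples)])
bmus_b = np.column_stack([rng.randint(0, n_columns, n_samples),
                          rng.randint(0, n_rows, n_samples)])

diff = np.abs(hit_map(bmus_a, n_columns, n_rows) - hit_map(bmus_b, n_columns, n_rows))
print("mean absolute hit-map difference per unit:", diff.mean())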
I don't see how that could be a problem here, because my code will crash if more than one BMU is returned.
The problem is that either I'm doing something wrong or somoclu is computing BMUs in some peculiar manner.
D = -2*np.dot(W, X.T) + (W**2).sum(1)[:, None] + (X**2).sum(1)[:, None].T
This computes the Gram matrix and converts it into a squared Euclidean distance matrix. The shape of D is basically the data instances by the SOM width and height.
BMU = (D==D.min(0)[None,:]).astype("float32").T
This just sets every value in D that is not the minimum over the somX, somY plane to 0, and every value that equals the minimum to 1.
The rest just converts this into a more usable and less wasteful form, but it does not deal with the possibility of having more than one 1 per plane, which will later cause a crash at:
print(np.absolute(new_bmus - som.bmus).sum())
If there is indeed more than one 1 per plane.
So you can see there isn't much room for a mistake, and I still have no clue why the BMUs my code returns differ from Somoclu's, because the operations ought to be equivalent.
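For what it's worth, here is a self-contained argmin version of the same computation on arbitrary stand-in data; argmin always picks the lowest flat unit index on ties, so every example gets exactly one BMU:

import numpy as np

# Arbitrary stand-ins: a 30x20 grid of 2-D codebook vectors and 1000 data instances.
rng = np.random.RandomState(0)
n_rows, n_columns, dim, n_samples = 20, 30, 2, 1000
W = rng.rand(n_rows * n_columns, dim).astype("float32")   # flattened codebook
X = rng.rand(n_samples, dim).astype("float32")

# Squared Euclidean distances assembled from the same Gram-matrix terms as above.
D = -2 * np.dot(W, X.T) + (W ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :]

# argmin returns the lowest flat index on ties, so every sample gets exactly one BMU.
bmu_flat = D.argmin(axis=0)                                # shape (n_samples,)
bmu_row, bmu_col = np.unravel_index(bmu_flat, (n_rows, n_columns))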
Actually, I get that broadcast error once in a while with your code:
ValueError: operands could not be broadcast together with shapes (50001,2) (50000,2)
There is nothing esoteric about the way the BMUs are calculated. The GPU kernel is basically identical to your formalism -- the D matrix is assembled from the same components, and the temporary structures are very similar. The CPU kernel is just a series of boring for loops doing the same thing, but avoiding the temporary structures.
I do recall that multiple BMUs are not unusual. I think the 32-bit precision plays a role in it, and since we are talking about floating-point numbers, the same two numbers may or may not evaluate as equal; this is something I see a lot in numpy. The other factor could be the smoothness of the map: you train it for many epochs with a Gaussian neighborhood function that does not have compact support, which is the setting that gives me the smoothest maps. You can try fewer epochs or flipping on the compact-support option when you instantiate the Somoclu object.
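If it helps, this is roughly what I mean; the keyword arguments below are from the Python wrapper as I remember them, so double-check them against the version you have installed:

import numpy as np
import somoclu

data = np.random.rand(1000, 2).astype(np.float32)

# Compact support truncates the Gaussian neighborhood, so far-away units stop
# being pulled and the map ends up less smooth; fewer epochs have a similar effect.
som = somoclu.Somoclu(30, 20, maptype="toroid", gridtype="hexagonal",
                      compactsupport=True)
som.train(data, epochs=10)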
If the problem persists, I can take a closer look at it in August.
I gave it some more thought. Now I am more inclined to think that it is a bug, in fact, a conceptual mistake. The BMUs are calculated in every training epoch, then the codebook is updated. This means that the final codebook and the last returned BMUs do not match. The BMU calculation has to be run again after the final update.
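To illustrate the loop order with a toy example (this is not Somoclu's actual code, just a sketch of the concept with a made-up update rule):

import numpy as np

def find_bmus(codebook, data):
    # Index of the closest codebook vector for every data instance.
    D = ((codebook[:, None, :] - data[None, :, :]) ** 2).sum(-1)
    return D.argmin(axis=0)

def toy_update(codebook, data, bmus, lr=0.1):
    # Toy update rule: pull each BMU a little towards its data instance.
    new = codebook.copy()
    for i, b in enumerate(bmus):
        new[b] += lr * (data[i] - new[b])
    return new

rng = np.random.RandomState(0)
codebook = rng.rand(50, 2)
data = rng.rand(200, 2)

for epoch in range(10):
    bmus = find_bmus(codebook, data)        # BMUs of the codebook before this update
    codebook = toy_update(codebook, data, bmus)

bmus = find_bmus(codebook, data)            # the missing pass: recompute after the last update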
I actually thought that might be the case and stated as much (back when my code was incomplete, prior to the other guy's request). But I was also doubtful that one final epoch at the lowest training level could shift so many BMUs. I guess I will just test it again after the conceptual bug is fixed.
Also, may I know why the codebook adjustments are done on the CPU even when the GPU is selected? I would have thought that this adds overhead and is also slower. Couldn't you construct lattice masks (for the updates) that would effectively turn the whole update process into matrix operations?
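Roughly what I have in mind, as a batch-style sketch on a planar lattice with a Gaussian neighborhood; this is only an illustration of the masking idea, not a drop-in replacement for the current update:

import numpy as np

rng = np.random.RandomState(0)
n_rows, n_columns, dim, n_samples = 20, 30, 2, 1000
grid = np.array(np.unravel_index(np.arange(n_rows * n_columns), (n_rows, n_columns))).T
W = rng.rand(n_rows * n_columns, dim)       # flattened codebook
X = rng.rand(n_samples, dim)
sigma = 2.0

# BMU of every data instance from the usual squared-distance matrix.
D = -2 * np.dot(W, X.T) + (W ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :]
bmus = D.argmin(axis=0)

# The "lattice mask": squared grid distance from every unit to every instance's BMU,
# turned into Gaussian neighborhood weights of shape (n_units, n_samples).
lattice_d2 = ((grid[:, None, :] - grid[None, bmus, :]) ** 2).sum(-1)
H = np.exp(-lattice_d2 / (2 * sigma ** 2))

# Batch-style update: every unit becomes a neighborhood-weighted average of the data.
W_new = np.dot(H, X) / H.sum(axis=1, keepdims=True)

On a toroid the grid distances would need wraparound, and I assume an online-style update would instead scale the (X - W) corrections by H and a learning rate, but the masking idea is the same.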
It can shift so many BMUs if the support of your neighborhood function is not compact.
I opened a new issue on the GPU update. It was implemented two years ago, but it was never merged back.
Commit eb15da970a80c6c513d555d28c0e84e5669ae48a fixes it for the dense CPU kernel in Python. Please take a look while I am looking into the other kernels and interfaces.
Commit 5aacea58b299fc396a5e7879d2f79942ad6aee4b solves the problem for all kernels and interfaces.
It seems to be working with both the CPU and GPU kernels, with Python 2.7 and 3.5.
[Using GPU, hexagonal lattice, toroid config]
I'm not sure what causes it, but there appear to be some BMU inconsistencies when comparing the BMUs returned by training with the BMUs you compute yourself from the codebook and the data.
To reproduce: