Closed lihenryhfl closed 3 years ago
So, what you're getting there is the properly normalized P matrix. The normalization happens here.
However, the normalization doesn't happen on a per-row basis, but on the entire matrix as a whole. So, for instance, using your first snippet of code, running P.sum() gives 1, as expected. It's not a simple matter of scaling either. The steps to calculate the P matrix are as follows:

1. Compute the conditional probabilities p(j|i) for each point i, tuning each point's bandwidth to match the target perplexity — at this stage, every row sums to 1.
2. Symmetrize: P = (P + P.T) / 2.
3. Normalize the entire matrix so that it sums to 1.
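These steps can be sketched in plain NumPy (illustrative only — the per-point bandwidth search that matches the target perplexity is replaced here with a fixed Gaussian bandwidth):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(50, 4))

# Pairwise squared Euclidean distances.
d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)

# Step 1: per-row conditional probabilities p(j|i).
# (openTSNE tunes a per-point bandwidth to hit the target perplexity;
# a fixed bandwidth stands in for that search here.)
p = np.exp(-d2 / 2.0)
np.fill_diagonal(p, 0.0)
p /= p.sum(axis=1, keepdims=True)  # each row sums to 1

# Step 2: symmetrize.
P = (p + p.T) / 2

# Step 3: normalize the entire matrix to sum to 1.
P /= P.sum()

print(P.sum())            # ~1.0
print(P.sum(axis=1)[:3])  # rows no longer sum to 1
```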
You would observe your expected behaviour when adding new data points into an existing embedding, where we calculate perplexities from new data points to existing data points. In this case, symmetrization is skipped, and all the data points are treated independently. So, for instance, continuing your code snippet:
tmp = affinities.to_new(x)
tmp.sum(axis=1) # array of ones
For a simple wrapper class, I'd recommend something like this. It isn't included in openTSNE currently, but I might add it later on.
import numpy as np
import scipy.sparse as sp
import openTSNE


class ExistingAffinities(openTSNE.affinity.Affinities):
    """Wrap a precomputed affinity matrix for use with openTSNE."""

    def __init__(self, P, verbose=False):
        super().__init__(verbose=verbose)
        self.P = self.normalize(P)

    def to_new(self, data, return_distances=False):
        raise NotImplementedError(
            "`ExistingAffinities.to_new` is not implemented!"
        )

    def normalize(self, P, symmetrize=True, normalization="pair-wise"):
        if symmetrize:
            P = (P + P.T) / 2
        if normalization == "pair-wise":
            # Scale the entire matrix so it sums to 1.
            P /= np.sum(P)
        elif normalization == "point-wise":
            # Scale each row so it sums to 1.
            P = sp.diags(np.asarray(1 / P.sum(axis=1)).ravel()) @ P
        return P
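As a standalone sanity check of the two normalization modes used above (SciPy only, with an arbitrary toy matrix rather than real affinities):

```python
import numpy as np
import scipy.sparse as sp

# An arbitrary asymmetric, non-negative sparse "affinity" matrix.
P0 = sp.csr_matrix(np.array([[0, 2, 1],
                             [1, 0, 3],
                             [4, 1, 0]], dtype=float))

# "pair-wise" mode: symmetrize, then scale the whole matrix to sum to 1.
P_pair = (P0 + P0.T) / 2
P_pair = P_pair / P_pair.sum()
print(P_pair.sum())  # ~1.0

# "point-wise" mode: scale each row to sum to 1 (no symmetrization).
row_sums = np.asarray(P0.sum(axis=1)).ravel()
P_point = sp.diags(1 / row_sums) @ P0
print(np.asarray(P_point.sum(axis=1)).ravel())  # rows of ones
```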
Aside: I noticed a potential bug in PerplexityBasedNN and MultiscaleMixture --- in both cases, the set_perplexity(...) / set_perplexities(...) functions compute the P matrix with symmetrize=True, irrespective of how the symmetrize flag was set during initialization.
That's a good catch, it should respect the parameter. Thanks!
Ah! That is interesting! I certainly did not realize that there was a matrix-wise normalization step at the end. This makes sense, as Q is also normalized to sum to 1. It seems like this final matrix-wise normalization is very crucial for good convergence of tSNE. Thank you very much for the clarification :)
I shall close the issue.
I want to use openTSNE with a custom P matrix. The documentation suggests that my custom P should be straightforward to substitute in. However, following the code example here, I noticed that the P matrix returned by openTSNE.affinity.PerplexityBasedNN (e.g. the P defined in the code snippet below) does not have rows that sum to 1, nor does it have the correct row perplexities.
Therefore, I assume that P is normalized at a later point --- so I also need to replace it with my custom P at this later point. However, I couldn't find the code anywhere in openTSNE.TSNEEmbedding that performs the row-sum and perplexity normalization. Is the current standard to not normalize P? Or am I missing something here?
Aside: I noticed a potential bug in PerplexityBasedNN and MultiscaleMixture --- in both cases, the set_perplexity(...) / set_perplexities(...) functions compute the P matrix with symmetrize=True, irrespective of how the symmetrize flag was set during initialization.