laming-chen / fast-map-dpp

Fast Greedy MAP Inference for DPP
Apache License 2.0

Some questions about the code in Algorithm 1 #1

Closed neil-yc closed 4 years ago

neil-yc commented 5 years ago

Hi, I have some questions about the code in Algorithm 1. Since d_i^2 = L_ii, is the value on the diagonal of the kernel L something like a CTR? If so, d_i^2 should always be less than 1, which matches the stopping criterion d_j^2 < 1.
Also, in my experiments d_i^2 can become negative, and then fewer than N items (e.g. 20) are returned. How should I address this problem?
Thanks :)

bduclaux commented 5 years ago

Same issue here: d_i^2 can become negative. Also, the stopping criterion d_i^2 < 1 does not make sense: if all input vectors are normalized, then d_i^2 < 1 is always the case. It seems like Algorithm 1 is broken. Could you post your code in the repository, so that we can find the issue?

laming-chen commented 5 years ago

@Nicholasyc @bduclaux Sorry about the late reply.

On page 4, right below equation (9), we mentioned that

The stopping criterion is d_i^2 < 1 for unconstrained MAP inference, or #Y_g > N when the cardinality constraint is imposed. For the latter case, we introduce a small number epsilon > 0 and add d_i^2 < epsilon to the stopping criteria for numerical stability.

If the input vectors are normalized, then d_i^2 <= 1 is always the case. But in some cases, for example in equation (11), if we run Algorithm 1 directly on L, then d_i^2 can be larger than 1.

In theory, d_i^2 cannot become negative if L is positive semidefinite; it happens in practice because of numerical instability. That's why we add d_i^2 < epsilon to the stopping criteria, as mentioned above. Please refer to Supplementary Material Section B for further discussion of numerical stability.
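
To make the two criteria concrete, here is a minimal NumPy sketch of the greedy loop (the function name and the default epsilon are my own choices):

```python
import numpy as np

def greedy_map_dpp(L, max_length, epsilon=1e-10):
    """Minimal sketch of the greedy loop with both stopping criteria.

    L: (n, n) PSD kernel matrix; max_length: cardinality constraint N.
    Stops early when the best remaining gain d_j^2 falls below epsilon
    (for unconstrained inference, the threshold would be 1 instead).
    """
    n = L.shape[0]
    cis = np.zeros((max_length, n))  # rows of the incremental Cholesky factor
    di2 = np.diag(L).copy()          # d_i^2 = L_ii initially
    j = int(np.argmax(di2))
    Yg = [j]
    while len(Yg) < max_length:
        k = len(Yg) - 1
        # incremental update: e_i = (L_ji - <c_j, c_i>) / d_j
        eis = (L[j, :] - cis[:k, j] @ cis[:k, :]) / np.sqrt(di2[j])
        cis[k, :] = eis
        di2 -= eis ** 2
        di2[j] = -np.inf             # never reselect a chosen item
        j = int(np.argmax(di2))
        if di2[j] < epsilon:         # numerical-stability criterion
            break
        Yg.append(j)
    return Yg
```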

bduclaux commented 5 years ago

@laming-chen Thanks a lot! Very clear!

The supplementary material also states a very important theorem: when the kernel L is a low-rank matrix, Algorithm 1 of the paper returns at most rank(L) items. This means that numerical instability can show up early in the computation.
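
A quick toy check of that bound, reusing the greedy_map_dpp sketch above (the sizes are arbitrary): with items embedded in 5 dimensions, the selection stops after at most 5 picks no matter how many items are requested.

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((50, 5))  # 50 items, 5-dimensional embeddings
L = B @ B.T                       # PSD kernel with rank(L) <= 5

selected = greedy_map_dpp(L, max_length=20)
print(len(selected))              # at most 5, despite max_length=20
```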

laming-chen commented 5 years ago

@bduclaux Thanks for the comments. I have uploaded the code of Algorithm 1.

neil-yc commented 5 years ago

@laming-chen Thanks for your post. However, I have run into the same situation as @bduclaux: numerical instability comes early in the calculation, and fewer items than max_length are returned in a single run, so I execute more than one run, which may not make sense.
I think the reason may be differences between our similarity matrices, so is there any specific guidance on the construction of the similarity matrix and score matrix (values, scales, ...)? Thanks for your reply.

laming-chen commented 5 years ago

@Nicholasyc I think there are some things we can do:

  1. If we want to return k items, then the rank of the similarity matrix should be no less than k. In other words, the dimension of the item vectors should be no less than k. This requirement applies to all MAP inference algorithms of DPP, since the probability of an item set whose size exceeds the rank is zero.
  2. Score values should not matter too much. But if all of them are near zero, I think it is a good idea to scale them. Otherwise, when we encounter numerical issues, we cannot really tell whether they are caused by the low rank of the similarity matrix or by the score values.
  3. If k is a large number, it is a good time to try Algorithm 2, where diversity is only required within a sliding window. In this case, k can be larger than the rank of the similarity matrix.
  4. If the rank of the similarity matrix is less than k and you still wish to run Algorithm 1, you can change the kernel matrix to L + alpha * I, where I is the identity matrix and alpha is a small positive number. The new kernel has full rank, but the risk is that we are then optimizing a surrogate problem, so the solution is less accurate (see the sketch after this list).
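
A rough illustration of points 1 and 4 (the helper name and the default alpha are mine; alpha should be tuned to the scale of L):

```python
import numpy as np

def regularize_if_low_rank(L, k, alpha=1e-3):
    """Return L unchanged if its rank supports selecting k items;
    otherwise fall back to the full-rank surrogate L + alpha * I."""
    if np.linalg.matrix_rank(L) >= k:
        return L
    # Full rank now, but we are optimizing a surrogate problem, so the
    # selected set only approximates the MAP solution of the original L.
    return L + alpha * np.eye(L.shape[0])
```
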
laming-chen commented 5 years ago

@Nicholasyc I think this is a good topic to discuss. Would you please translate the title and your original post into English? I hope the discussion can be helpful to others.

neil-yc commented 5 years ago

@laming-chen Thanks for your reply. Do you mean that the reason I cannot get k items as expected is that the rank of the similarity matrix is less than k? If so, I am a little confused about the construction of the similarity matrix, because we cannot always build it from feature vectors (e.g. it may come from Pearson correlation or some other method), in which case the rank of the similarity matrix is not guaranteed (although its dimension can be guaranteed to be N >> k).
I have translated the title and original post into English, thanks!

laming-chen commented 5 years ago

@Nicholasyc The rank of the similarity matrix could be one reason. I think constructing the similarity matrix from feature vectors is probably the easiest way to guarantee that it is positive semidefinite (PSD). If the similarity matrix is not PSD, then it's no longer a DPP.
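
For example, one common construction along these lines (the normalization and the CTR-like quality scores are illustrative assumptions): a Gram matrix of L2-normalized feature vectors is PSD by construction, and so is the resulting kernel.

```python
import numpy as np

rng = np.random.default_rng(0)
F = rng.standard_normal((100, 16))                # 100 items, 16-dim features
F = F / np.linalg.norm(F, axis=1, keepdims=True)  # L2-normalize rows
S = F @ F.T                                       # cosine similarity: a PSD Gram matrix
q = rng.random(100)                               # quality scores (e.g. CTR-like)
L = q[:, None] * S * q[None, :]                   # kernel L_ij = q_i * S_ij * q_j, also PSD
```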

neil-yc commented 5 years ago

@laming-chen Thanks for your reply. According to the paper, to be precise, shouldn't it be the kernel matrix constructed from quality and similarity that must be PSD, rather than the similarity matrix? And if the rank of the similarity matrix is indeed the reason, do we need to pay the cost of checking it before the follow-up steps? I wonder if there is an easy way to get a PSD matrix when the similarity and quality are computed by a black box.

laming-chen commented 5 years ago

@Nicholasyc It is easy to prove that the kernel matrix is PSD if and only if the similarity matrix is PSD: the kernel is L = Diag(q) S Diag(q) with positive quality scores q, so L and S are congruent, and congruence preserves positive semidefiniteness.

To verify whether a matrix is PSD, we can follow two steps (see the sketch below):

  1. The matrix should be symmetric.
  2. All of its eigenvalues should be non-negative.
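
A minimal check along these lines (the tolerance is my own choice, to absorb floating-point noise):

```python
import numpy as np

def is_psd(M, tol=1e-8):
    """Symmetry plus non-negative eigenvalues, up to a small tolerance."""
    if not np.allclose(M, M.T):
        return False
    # eigvalsh exploits symmetry and returns real eigenvalues in ascending order
    return np.linalg.eigvalsh(M)[0] >= -tol
```
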
neil-yc commented 5 years ago

@laming-chen I've tried verifying the eigenvalues with np.linalg.eig(matrix) to test whether they are all non-negative; if an eigenvalue is negative, I replace it with 0 and finally construct a new PSD matrix (following Section 4.2 of "Practical Diversified Recommendations on YouTube with Determinantal Point Processes"). However, the time cost is somewhat unacceptable.
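
One way to speed that step up is to use np.linalg.eigh instead of np.linalg.eig: for symmetric matrices it is substantially faster and returns real eigenvalues directly. A sketch of the clipping step:

```python
import numpy as np

def project_to_psd(M):
    """Clip negative eigenvalues to 0 and reconstruct, as in Section 4.2
    of the YouTube DPP paper; eigh assumes (and exploits) symmetry."""
    w, V = np.linalg.eigh((M + M.T) / 2)  # symmetrize, then decompose
    return (V * np.clip(w, 0.0, None)) @ V.T
```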