guilgautier / DPPy

Python toolbox for sampling Determinantal Point Processes
https://dppy.readthedocs.io
MIT License
219 stars 52 forks source link

probabilities do not sum to 1 in exact_sampling.py #61

Closed boyu-zhang-25 closed 3 years ago

boyu-zhang-25 commented 4 years ago

Hi,

I am trying to perform exact sampling in k-DPP with Gram-Schmidt orthogonalization and likelihood kernels. However, I encountered the following error:

  File "/anaconda3/lib/python3.7/site-packages/dppy/finite_dpps.py", line 621, in sample_exact_k_dpp
    random_state=rng)
  File "/anaconda3/lib/python3.7/site-packages/dppy/finite_dpps.py", line 586, in sample_exact_k_dpp
    random_state=rng)
  File "/anaconda3/lib/python3.7/site-packages/dppy/exact_sampling.py", line 466, in proj_dpp_sampler_eig
    sampl = proj_dpp_sampler_eig_GS(eig_vecs, size, rng)
  File "/anaconda3/lib/python3.7/site-packages/dppy/exact_sampling.py", line 536, in proj_dpp_sampler_eig_GS
    p=np.abs(norms_2[avail]) / (rank - it))
  File "mtrand.pyx", line 1148, in mtrand.RandomState.choice
ValueError: probabilities do not sum to 1

It is in the proj_dpp_sampler_eig_GS method at line 488 of exact_sampling.py.

I tried to print out the norms_2[avail] after each iteration of for it in range(size), and it seems like it is extremely small after several iterations. The rank = 392 and N = 784. The error happens during the second or third iteration, after performing norms_2[avail] -= c[avail, it]**2 # update residual norm^2

I am using dppy==0.3.0 installed from pip on Ubuntu. Any idea why this is happening? I suspect that the square norm of the selected eigenvectors is too small. So, could this be a problem with my kernel or any process such as the elementary symmetric polynomials or the eigenvector selector? Any guess would be appreciated.

Thank you!

guilgautier commented 4 years ago

Hi @cristianoBY

Could you please add a complete code example to reproduce your setup?

Your problem must be related to a rounding issue, the normalizing constant of the conditional must be k-it and np.sum(norms_2[avail]) should be numerically close to that integer. Can you check that?

It is also important to check that rank(L) <= k and to plot the histogram of your eigenvalues to see the eigenvalue decay. As you mentioned, this may impact the stability of the evaluation of the elementary symmetric polynomials.

By the way, I am curious to know the context in which you wish to select half of your ground set (k=N/2) with a k-DPP.

guilgautier commented 4 years ago

Hi @cristianoBY, Any news about your issue ?