ziyin-dl / word-embedding-dimensionality-selection

On the Dimensionality of Word Embedding
https://nips.cc/Conferences/2018/Schedule?showEvent=12567
MIT License

Spectral Estimation #16

Closed asadullah797 closed 5 years ago

asadullah797 commented 5 years ago

Great work!! I am following this paper and its code, and I have a question about the spectral estimation. How did you derive this formula? You cite [Chatterjee, 2015], but I could not find it in that paper. Also, I want to use NMF (non-negative matrix factorization) instead of SVD, but when I apply the spectral estimation formula from the paper, all the elements become zero. How can I overcome this issue? Thanks!

ziyin-dl commented 5 years ago
  1. This is equivalent to the "symmetric model" in [Chatterjee, 2015] under three conditions:

    • X (in [Chatterjee, 2015]) = M + N, where N is an i.i.d. noise matrix. Note that E[X_ij] = M_ij.
    • p = 1, meaning every entry of X is observed.
    • η → 0.

    Under these three conditions we recover the 2σ√n threshold for singular values (see the numerical check after this list). Note the following remark in [Chatterjee, 2015]: "One limitation of USVT is the requirement that the entries should lie in a bounded interval. One may relax this requirement by assuming, for example, that the errors x_ij − m_ij are distributed as normal random variables with mean zero and variance σ². If σ² is known, then I believe that one can modify the USVT algorithm by thresholding at (2 + η)σ√n and obtain the same theorems."
  2. I believe you might have applied the NMF in the wrong order. I think the following procedure is reasonable (a sketch follows after this list):

    • Do the spectral estimation first, on the original matrix (say, compute SVD(\tilde M) = UDV^T).
    • Apply the soft thresholding operator t to D to get the "noise filtered" matrix \hat M = U t(D) V^T.
    • Run NMF on \hat M. This way the spectral estimation part stays identical.
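A quick numerical check of that threshold (a sketch in numpy with arbitrary sizes and seed, not code from this repo): for an n × n matrix of i.i.d. N(0, σ²) noise, the largest singular value concentrates near 2σ√n, so singular values at or below that level carry no signal.

```python
import numpy as np

# Sketch: the top singular value of an n x n i.i.d. Gaussian noise matrix
# concentrates near 2 * sigma * sqrt(n), matching the USVT-style
# threshold (with eta -> 0) from point 1 above.
rng = np.random.default_rng(0)
n, sigma = 2000, 0.5
N = rng.normal(loc=0.0, scale=sigma, size=(n, n))
top_singular_value = np.linalg.svd(N, compute_uv=False)[0]
print(f"top singular value of N: {top_singular_value:.2f}")
print(f"2 * sigma * sqrt(n):     {2 * sigma * np.sqrt(n):.2f}")  # should be close
```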
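And a minimal sketch of the procedure in point 2, assuming numpy and scikit-learn (the function name, the known noise level `sigma`, and the clipping step are my own illustration, not code from this repo):

```python
import numpy as np
from sklearn.decomposition import NMF

def denoise_then_nmf(M_tilde, sigma, n_components, eta=0.0):
    """Soft-threshold the singular values of M_tilde at (2 + eta) * sigma * sqrt(n),
    then factorize the denoised matrix with NMF."""
    n = max(M_tilde.shape)                   # use the larger dimension if not square
    tau = (2.0 + eta) * sigma * np.sqrt(n)   # singular-value threshold
    U, D, Vt = np.linalg.svd(M_tilde, full_matrices=False)
    D_soft = np.maximum(D - tau, 0.0)        # soft thresholding t(D)
    M_hat = (U * D_soft) @ Vt                # "noise filtered" matrix \hat M
    # NMF needs a non-negative input; clip the small negative entries that
    # the thresholded SVD reconstruction can leave behind.
    model = NMF(n_components=n_components, init="nndsvda", max_iter=500)
    W = model.fit_transform(np.clip(M_hat, 0.0, None))
    H = model.components_
    return M_hat, W, H
```

Note that if every singular value of \tilde M falls below the threshold, \hat M comes out identically zero; an overestimated σ would produce exactly the all-zero result described above.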
asadullah797 commented 5 years ago

Really appreciate your response. Could you please suggest what future work could build on this paper as a baseline? I want to extend your work, but I don't know which direction to take. Could you give me some suggestions/feedback? Thank you!

ziyin-dl commented 5 years ago

I think one direction is to look at other representation learning schemes and see whether similar principles apply there. For example, the intermediate representations obtained by a VAE have a fixed, lower dimensionality, and in practice people just pick an arbitrary value, e.g. 300. It would be of great value if we could say something principled about that choice.
