meereeum / lda2vec-tf

tensorflow port of the lda2vec model for unsupervised learning of document + topic + word embeddings
437 stars, 100 forks

get the embedding of a new document #11

Open ali3assi opened 6 years ago

ali3assi commented 6 years ago

After we run the code and get the document embeddings, how can we use the model to predict the embedding of a new, unobserved document?

nateraw commented 6 years ago

I've been thinking about the same thing. Technically you can't (to my limited understanding). However, you could train a separate doc2vec model and use something like gensim to infer the vector for the new document. Perhaps you could additionally make sure it is on the same scale as the lda2vec doc vectors. Either way, this is a tricky problem that I hope someone can figure out a good solution for!

I'll keep you updated if I figure something out that works in code.
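
The "same scale" idea above could look roughly like the sketch below. It assumes a vector for the new document has already been inferred elsewhere (for example with gensim's `Doc2Vec.infer_vector`) and only matches its L2 norm to the mean norm of the trained lda2vec document vectors. The function names and toy vectors are hypothetical, not part of lda2vec-tf. Note that this matches magnitude only: a separately trained doc2vec model has its own embedding space, so the directions are still not comparable.

```python
import math


def l2_norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def rescale_to_corpus(new_vec, trained_doc_vecs):
    """Scale new_vec so its L2 norm equals the mean norm of trained_doc_vecs.

    Hypothetical helper: only the magnitude is adjusted, not the direction.
    """
    target = sum(l2_norm(v) for v in trained_doc_vecs) / len(trained_doc_vecs)
    norm = l2_norm(new_vec)
    if norm == 0.0:
        return list(new_vec)  # a zero vector has no direction to rescale
    return [x * target / norm for x in new_vec]


# Toy stand-ins: two trained document vectors and one externally inferred vector.
trained = [[3.0, 4.0], [6.0, 8.0]]  # norms 5 and 10, so the mean norm is 7.5
inferred = [0.6, 0.8]               # norm 1
rescaled = rescale_to_corpus(inferred, trained)
```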

ali3assi commented 6 years ago

@nateraw thank you. By the way, you can check https://github.com/vijeth8/lda2vec-featurizer. That version throws an exception when the test document has only a small vocabulary.

nateraw commented 6 years ago

@TamouzeAssi No problem! Thanks for the link, I'll check it out. I actually have my own version adapted from this one working in tf 1.5+. Still looking to add more to it though.

nateraw commented 6 years ago

@TamouzeAssi Did you see how the repository you linked to handles out-of-vocabulary documents with lda2vec? It seems like a fair way to do it! It is described in the README.

ali3assi commented 6 years ago

@nateraw I'm trying to run their code, but unfortunately I haven't been able to get it to execute yet, and there is no support from them.

nateraw commented 6 years ago

@TamouzeAssi Let me upload my code for you tomorrow. If possible, I will try to replicate what they have going on before I upload it. I have my last finals for school today, so tomorrow I will be free.

I will offer support as much as possible, and will be continually updating that repository. :smile:

ali3assi commented 6 years ago

@nateraw Good luck with your exams. I would appreciate your help very much.

ali3assi commented 6 years ago

@nateraw Did you get time to check, please?

nateraw commented 6 years ago

@TamouzeAssi I didn't, unfortunately. I'm trying to make some user friendly changes as well as run some experiments. It needs documentation as well, so that people will understand how to interface with it.

I thought my experiment was going to take less time, but it proved to be a little tricky. I'll try to upload it within 48 hours. Sorry for the delay!!!

ali3assi commented 6 years ago

@nateraw thanks for your cooperation!

nateraw commented 6 years ago

@TamouzeAssi I uploaded my code, check it out. I wasn't able to implement the functionality we talked about yet. :frowning_face: I'll try to get to it. Post any issues you have or functionality you want and I'll try to add it. Right now, I know there are a couple of issues with reloading the model. For now they have to be worked around by adjusting the logdir variables in the lda2vec.py file. I'll fix it ASAP.

Hope this helps, and doesn't just add additional confusion!

ali3assi commented 6 years ago

@nateraw Now I am confused. Which code are you talking about: yours, or https://github.com/vijeth8/lda2vec-featurizer? Can you send me your email?

nateraw commented 6 years ago

@TamouzeAssi nxr9266@rit.edu, email me any time.

I mean my code! Also, I fixed the restore feature.

ali3assi commented 6 years ago

I'm a little bit confused. Your code cannot generate the topic modeling for a test document. Please correct me if I'm wrong: are you trying to add this feature?

Which restore feature do you mean?

nateraw commented 6 years ago

I was talking about model saving/restoring (the weights/etc), it was broken before but now it works.

Topic modeling for out-of-corpus documents does not work yet. We need to add it, following the way the other repository does it. I did not get time to implement it, unfortunately.

ali3assi commented 6 years ago

There is only one other repository that talks about this feature.

nateraw commented 6 years ago

Yes! I will try to implement this feature in my version ASAP. I'm not exactly sure how they are doing it in that repository (lda2vec-featurizer), but I will try to figure it out and add it. You might be able to figure it out on your own by using the get_k_closest function in my version, but it would probably be extremely confusing.

If you have any issues, post them on my repository!
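
One way the "closest topics" idea could be approximated is sketched below. It is not the actual `get_k_closest` signature and not how lda2vec-featurizer does it; it assumes only that lda2vec places word and topic vectors in a shared space. The new document is represented by the average of its in-vocabulary word vectors, compared to each topic vector by cosine similarity, and the similarities are softmax-normalized into rough topic weights. All names and toy vectors are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def topic_mixture(tokens, word_vecs, topic_vecs, temperature=1.0):
    """Approximate topic weights for an unseen document.

    Returns None when no token is in the vocabulary, instead of raising.
    """
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    if not known:
        return None
    dim = len(known[0])
    # Average the word vectors of all in-vocabulary tokens.
    doc = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    sims = [cosine(doc, t) / temperature for t in topic_vecs]
    # Numerically stable softmax over the similarities.
    top = max(sims)
    exps = [math.exp(s - top) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Toy stand-ins for trained word and topic embeddings.
word_vecs = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "stock": [0.0, 1.0]}
topic_vecs = [[1.0, 0.0], [0.0, 1.0]]
weights = topic_mixture(["cat", "dog", "unseen"], word_vecs, topic_vecs)
```

Returning `None` for a document with no known tokens is a deliberate choice here, so that short or out-of-vocabulary test documents can be handled by the caller rather than causing an exception.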

MovGP0 commented 6 years ago

duplicate of #1