hugochan / KATE

Code & data accompanying the KDD 2017 paper "KATE: K-Competitive Autoencoder for Text"

BSD 3-Clause "New" or "Revised" License

running other model #12

Closed un-lock-me closed 6 years ago

un-lock-me commented 6 years ago

Hi again:)

I would like to compare the output of the KATE model with the output of other models.

Would you mind letting me know how to run the other models, mainly DocNADE, VAE, and Word2vec? For DocNADE, I do not know what "train_doc_codes" is or how to create it. I think that if I first run lib2svm it will produce the corpus data in this format, but I got an error while running it. Should I take any other steps?

Word2vec: there are two scripts, run_w2vec and run_doc_w2vec. Which one should I run? Also, what are these parameters: train flag, docname, path to the trained model (if it's in the training phase), and path to the word2vec mod file?

hugochan commented 6 years ago

Hello @saria85,

For running DocNADE, please check out www.dmi.usherb.ca/~larocheh/code/DocNADE.zip. Run run_w2v.py to train the word2vec model and get document representations. As for the parameters in run_w2v.py, you do not need docnames.

un-lock-me commented 6 years ago

Hi,

It is taking a lot of time to run. Did you experience the same thing? Also, it only created two files, new_test.libsvm and new_train.libsvm, which look like vectors. Where can I find the clusters of the topics? Do you have any idea? Thanks, :)

un-lock-me commented 6 years ago

Also, I'm going to use word clouds to visualize my topics. You said that I need to change the list of words in pred.py according to my output. My question is: can I choose any words I like, or should they be chosen according to their strength, for example the five highest-weighted words in each topic?

Thanks, ;)

hugochan commented 6 years ago

@saria85 DocNADE takes quite a while to train. Based on the paper, a topic i is visualized by picking the 10 words w with the strongest connection W_iw. You might need to dig deeper into the code to fetch the weight matrix.
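
For illustration, here is a rough sketch of that top-word selection once a weight matrix is available. It is not taken from the DocNADE or KATE code: the function name `top_words_per_topic`, the toy `vocab`, and the assumption that `W` has one row per topic and one column per vocabulary word are all hypothetical. If the matrix you extract is stored the other way round, transpose it first.

```python
import numpy as np

def top_words_per_topic(W, vocab, k=10):
    """Return the k words with the largest weight W[i, w] for each topic i.

    Assumes W has shape (n_topics, vocab_size) and vocab[j] is the word
    for column j. Both are illustrative assumptions, not the DocNADE API.
    """
    W = np.asarray(W)
    # Sort each row ascending, keep the last k column indices,
    # then reverse so the strongest connection comes first.
    top_idx = np.argsort(W, axis=1)[:, -k:][:, ::-1]
    return [[vocab[j] for j in row] for row in top_idx]

# Toy data standing in for a real weight matrix and vocabulary.
vocab = ["apple", "banana", "car", "engine", "fruit"]
W = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.7],   # topic 0
    [0.0, 0.1, 0.9, 0.8, 0.2],   # topic 1
])

topics = top_words_per_topic(W, vocab, k=3)
for i, words in enumerate(topics):
    print(i, words)
```

With a real model you would pass `k=10` to match the paper's visualization.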

hugochan commented 6 years ago

You can visualize any word in your vocabulary.
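
If you prefer to hand-pick words rather than take the strongest ones, a small lookup like the one below could collect their weights for one topic. This is only a sketch: `word_weights_for_topic`, the toy `vocab`, and the (topics x vocabulary) layout of `W` are assumptions for illustration, not part of pred.py.

```python
import numpy as np

def word_weights_for_topic(W, vocab, topic, words):
    """Map each chosen word to its weight in one topic.

    Words that are not in the vocabulary are skipped. Assumes W has
    shape (n_topics, vocab_size); this layout is an assumption.
    """
    index = {w: j for j, w in enumerate(vocab)}
    return {w: float(W[topic, index[w]]) for w in words if w in index}

# Toy data standing in for a real weight matrix and vocabulary.
vocab = ["apple", "banana", "car", "engine", "fruit"]
W = np.array([
    [0.9, 0.8, 0.1, 0.0, 0.7],   # topic 0
    [0.0, 0.1, 0.9, 0.8, 0.2],   # topic 1
])

# "train" is not in the vocabulary, so it is dropped.
chosen = word_weights_for_topic(W, vocab, topic=1, words=["car", "fruit", "train"])
print(chosen)
```

A word-to-weight dict like this is the kind of input that word-cloud tools typically accept as frequencies. Negative or zero weights may need to be shifted or filtered before plotting.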