microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License

How can i use the result to train a topic model #61

Closed ruskie95 closed 6 years ago

ruskie95 commented 6 years ago

I am a student and have just started learning about machine learning. How can I use the doc_topic, server_0_table_0 (word_topic table), and server_0_table_1 files to train and test a model? The model is for text classification: I want to use it to automatically recognize the topic of newspaper articles. Thank you!

1234clam commented 6 years ago

There is an infer executable in the bin directory. You can run it with the following command:

$bin/infer -num_vocabs 111400 -num_topics 1000 -num_iterations 100 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 300000 -input_dir $dir -data_capacity 800

The inference result is written to the doc_topic.0 file. Before running infer, put server_0_table_1.model, server_0_table_0.model, block.0, vocab.0, and vocab.0.txt in the input directory. I hope this helps answer your question.
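To turn the inference output into features for a classifier, something like the Python sketch below might help. It assumes each line of doc_topic.0 looks like `doc_id topic:count topic:count ...`; please check that against your own file, since the layout is an assumption here. The function names and the `alpha` smoothing value are only illustrative and are not part of LightLDA.

```python
from typing import Dict, List, Tuple


def parse_doc_topic_line(line: str) -> Tuple[int, Dict[int, int]]:
    """Parse one line, ASSUMED to be 'doc_id topic:count topic:count ...'."""
    fields = line.split()
    doc_id = int(fields[0])
    counts: Dict[int, int] = {}
    for field in fields[1:]:
        topic, count = field.split(":")
        counts[int(topic)] = int(count)
    return doc_id, counts


def topic_proportions(counts: Dict[int, int], num_topics: int,
                      alpha: float = 0.1) -> List[float]:
    """Smoothed topic distribution: (n_k + alpha) / (n + K * alpha)."""
    total = sum(counts.values()) + num_topics * alpha
    return [(counts.get(k, 0) + alpha) / total for k in range(num_topics)]


if __name__ == "__main__":
    # Hypothetical line; real lines come from doc_topic.0.
    doc_id, counts = parse_doc_topic_line("7  2:5 0:1")
    print(doc_id, topic_proportions(counts, num_topics=4, alpha=0.5))
```

Each document then becomes a fixed-length vector of topic proportions. LDA itself is unsupervised, so the category labels for training a text classifier have to come from your own labeled newspaper data.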

ruskie95 commented 6 years ago

Do you have any idea how I can use those files with WEKA to build a model? They seem to be incompatible with each other.

1234clam commented 6 years ago

@ruskie95

https://github.com/Microsoft/LightLDA/issues/60
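One possible route into WEKA is to write the per-document topic counts out as a sparse ARFF file, which WEKA can load directly. The sketch below is only an illustration under assumptions: the rows are `(topic_counts, label)` pairs that you build yourself, the labels are simple tokens without spaces taken from your own labeled data, and the function and relation names are made up for this example.

```python
from typing import Dict, Iterable, List, Tuple


def to_sparse_arff(rows: Iterable[Tuple[Dict[int, int], str]],
                   num_topics: int,
                   class_labels: List[str],
                   relation: str = "lightlda_doc_topic") -> str:
    """Build a sparse ARFF document: one numeric attribute per topic plus a class."""
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute topic_{k} numeric" for k in range(num_topics)]
    lines.append("@attribute class {" + ",".join(class_labels) + "}")
    lines += ["", "@data"]
    class_index = num_topics  # the class is the last attribute (0-based index)
    for counts, label in rows:
        # Sparse rows list only non-zero cells as 'index value', in index order.
        cells = [f"{k} {counts[k]}" for k in sorted(counts) if counts[k] != 0]
        # Always write the class explicitly so it is never read as a default.
        cells.append(f"{class_index} {label}")
        lines.append("{" + ", ".join(cells) + "}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical data: one document with topic counts and a made-up label.
    example = [({2: 5, 0: 1}, "sports")]
    print(to_sparse_arff(example, num_topics=3,
                         class_labels=["politics", "sports"]))
```

The sparse form keeps the file small when there are many topics (for example 1000) and most counts per document are zero.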