Open yanbo68 opened 12 years ago
Yes that is correct.
lda.docToTop and lda.worToTop are local to each machine. Essentially, they hold the topic assignments for the documents in the chunk assigned to that machine.
lda.topToWor is expected to be similar across the 3 machines. To interpret the topic model, you can use any one of them.
However, only one global model is built. It is stored in the global folder along with the global dictionary, and it is the one used during testing.
--Shravan
-----Original Message----- From: yanbo68 [mailto:reply@reply.github.com] Sent: Thursday, April 12, 2012 1:06 PM To: Shravan Narayanamurthy Subject: [Yahoo_LDA] the results of Y!LDA with multi machines (#10)
Hi,
I am using Y!LDA in Hadoop with 3 computers.
I got the results of "train mode" and found them a little confusing. I ran the script with --topics=20 and found that the files "lda.docToTop.txt, lda.topToWor.txt, lda.worToTop.txt" exist in 3 different directories. Each directory has 20 topics. Is that correct?
How am I supposed to get the "test" result from the "trained model"? Will there still be 3 different directories?
Hope somebody can help me. Thanks a lot!
Yanbo
Reply to this email directly or view it on GitHub: https://github.com/shravanmn/Yahoo_LDA/issues/10
Thanks a lot!
I checked the lda.topToWor files. In the "train mode" results, each topic has almost 4 words that differ between machines. "Test mode" is much better: only 1 word differs per topic. I think I can interpret the topic model using the "test mode" result.
Btw, regarding the topic counts table: although there are 3 tables after "train mode", it seems the system merges the 3 tables together during "test mode"? The log says: "Initializing Word-Topic counts table from 3 dumps with topic_counts/lda.ttc.dump as prefix ......" So is each machine using the same big table?
In line...
-----Original Message----- From: yanbo68 [mailto:reply@reply.github.com] Sent: Friday, April 13, 2012 8:29 AM To: Shravan Narayanamurthy Subject: Re: [Yahoo_LDA] the results of Y!LDA with multi machines (#10)
Thanks a lot!
I checked the lda.topToWor files. In the "train mode" results, each topic has almost 4 words that differ between machines.
[shrav] How many iterations did you run?
"Test mode" is much better: only 1 word differs per topic. I think I can interpret the topic model using the "test mode" result.
Btw, regarding the topic counts table: although there are 3 tables after "train mode", it seems the system merges the 3 tables together during "test mode"? The log says: "Initializing Word-Topic counts table from 3 dumps with topic_counts/lda.ttc.dump as prefix ......" So is each machine using the same big table?
[shrav] Yes. A global table is created and a local table per machine is induced using the global table.
--Shravan
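As a rough illustration of the merge described above (one global word-topic table built from the per-machine dumps, with a local table induced from it for each machine), here is a minimal Python sketch. It is not Y!LDA's actual code or dump format: the nested-dict layout, the function names `merge_count_tables` and `induce_local_table`, and the sample counts are all assumptions made up for this example.

```python
from collections import Counter, defaultdict


def merge_count_tables(dumps):
    """Sum per-machine {word: {topic: count}} tables into one global table.

    The nested-dict layout is a hypothetical stand-in for the real dumps.
    """
    merged = defaultdict(Counter)
    for table in dumps:
        for word, topic_counts in table.items():
            # Counter.update adds counts rather than overwriting them.
            merged[word].update(topic_counts)
    return merged


def induce_local_table(global_table, local_vocab):
    """Keep only the rows of the global table for words a machine actually sees."""
    return {w: dict(global_table[w]) for w in local_vocab if w in global_table}


# Three hypothetical per-machine dumps (made-up words and counts).
dumps = [
    {"apple": {0: 3, 1: 1}, "bank": {1: 4}},
    {"apple": {0: 2}, "river": {1: 5}},
    {"bank": {1: 1, 2: 2}},
]

global_table = merge_count_tables(dumps)
local_table = induce_local_table(global_table, {"apple", "river"})

for word in sorted(local_table):
    print(word, local_table[word])
```

Each machine then works with counts that reflect the whole corpus, but only for the words that occur in its own chunk.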
I ran 200 iterations
If you run about 500 to 600 iterations, the words will look similar in the different topToWor files. This is what we have observed. --Shravan
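One way to check whether the per-machine topToWor files have converged is to measure how much the top words of each topic overlap between two machines. The sketch below is only an illustration under stated assumptions: it does not parse the real lda.topToWor.txt format, the top words are assumed to be already loaded into dicts keyed by topic id, topic ids are assumed to line up across machines, and the helper names and sample words are hypothetical.

```python
def jaccard(words_a, words_b):
    """Jaccard overlap of two top-word lists: |A & B| / |A | B|."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def per_topic_overlap(model_a, model_b):
    """Overlap for every topic id present in both {topic: [words]} dicts."""
    shared = sorted(set(model_a) & set(model_b))
    return {t: jaccard(model_a[t], model_b[t]) for t in shared}


# Hypothetical top words from two machines (made-up data).
machine_1 = {
    0: ["game", "team", "season", "player"],
    1: ["market", "stock", "price", "trade"],
}
machine_2 = {
    0: ["game", "team", "season", "coach"],
    1: ["market", "stock", "price", "trade"],
}

overlaps = per_topic_overlap(machine_1, machine_2)
for topic, score in overlaps.items():
    print(f"topic {topic}: overlap {score:.2f}")
```

Scores close to 1.0 for every topic would suggest that the machines agree and that any one of the files can be used for interpretation.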
Hi,
Thanks a lot! I will try more iterations.
Yanbo