sudar / Yahoo_LDA

Yahoo!'s topic modelling framework using Latent Dirichlet Allocation
Apache License 2.0

the results of Y!LDA with multi machines #10

Open yanbo68 opened 12 years ago

yanbo68 commented 12 years ago

Hi,

 I am using Y!LDA on Hadoop with 3 machines.
 I got the results of "train mode" and found them a little confusing. I ran the script with --topics=20 and found that the files "lda.docToTop.txt", "lda.topToWor.txt" and "lda.worToTop.txt" exist in 3 different directories. Each directory has 20 topics. Is that correct?
How am I supposed to get the "test" result from the trained model? Will it still be in 3 different directories?

Hope somebody can help me. Thanks a lot!

Yanbo

shravanmn commented 12 years ago

Yes that is correct.

lda.docToTop and lda.worToTop are local to each machine. They essentially hold the topic assignments for the documents in the chunk assigned to that machine.

lda.topToWor is expected to be similar across the 3 machines. To interpret the topic model, you can use any one of them.

However, only one global model is built. It is stored in the global folder along with the global dictionary, and it is the one used while testing.

--Shravan
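
To check how similar the per-machine topic listings really are, one option is to compare the top words of each topic across machines. The following is only a minimal sketch, not part of Y!LDA: it assumes you have already extracted the top words of each topic from each machine's lda.topToWor.txt into plain Python lists (the file format is not parsed here), and that topic i on one machine corresponds to topic i on another. The function names and the sample words are hypothetical.

```python
from typing import Dict, List


def differing_words(a: List[str], b: List[str]) -> List[str]:
    """Return the words that appear in exactly one of the two lists."""
    return sorted(set(a) ^ set(b))


def jaccard(a: List[str], b: List[str]) -> float:
    """Jaccard overlap of two word lists: |intersection| / |union|."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def compare_machines(
    m1: Dict[int, List[str]], m2: Dict[int, List[str]]
) -> Dict[int, float]:
    """Per-topic overlap for the topic ids present on both machines."""
    return {t: jaccard(m1[t], m2[t]) for t in sorted(m1.keys() & m2.keys())}


# Hypothetical top words for two topics on two machines.
machine_1 = {
    0: ["game", "team", "season", "player"],
    1: ["stock", "market", "bank", "price"],
}
machine_2 = {
    0: ["game", "team", "season", "coach"],
    1: ["stock", "market", "bank", "price"],
}

overlap = compare_machines(machine_1, machine_2)
for topic, score in overlap.items():
    diff = differing_words(machine_1[topic], machine_2[topic])
    print(f"topic {topic}: overlap={score:.2f}, differing words={diff}")
```

An overlap close to 1.0 for every topic suggests the machines agree on the model; a low overlap for many topics may mean training has not yet converged.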

yanbo68 commented 12 years ago

Thanks a lot!

I checked the lda.topToWor files. In the "train mode" results, each topic differs by about 4 words from one machine to another. The "test mode" results are much better, with only 1 differing word per topic. I think I can interpret the topic model using the "test mode" results.

Btw, about the topic counts table: although there are 3 tables after "train mode", it seems the system merges the 3 tables during "test mode". The LOG says: "Initializing Word-Topic counts table from 3 dumps with topic_counts/lda.ttc.dump as prefix ......" So is each machine using the same big table?

shravanmn commented 12 years ago

In line...

I checked the lda.topToWor files. In the "train mode" results, each topic differs by about 4 words from one machine to another.

[shrav] How many iterations did you run?

The "test mode" results are much better, with only 1 differing word per topic. I think I can interpret the topic model using the "test mode" results.

Btw, about the topic counts table: although there are 3 tables after "train mode", it seems the system merges the 3 tables during "test mode". The LOG says: "Initializing Word-Topic counts table from 3 dumps with topic_counts/lda.ttc.dump as prefix ......" So is each machine using the same big table?

[shrav] Yes. A global table is created and a local table per machine is induced using the global table.

--Shravan
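
The idea of building one global word-topic table from several dumps and then inducing a local table from it can be illustrated roughly as follows. This is a hedged sketch only and not Y!LDA's actual code or dump format: it assumes each dump has already been loaded as a mapping from a word to its per-topic counts, it adds counts when a word appears in more than one dump (an assumption), and it takes "inducing" a local table to mean keeping the rows for the words a machine actually sees. All names here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical in-memory form of a dump: word -> count per topic.
CountTable = Dict[str, List[int]]


def merge_tables(dumps: Iterable[CountTable], num_topics: int) -> CountTable:
    """Combine several word-topic count tables into one global table."""
    merged = defaultdict(lambda: [0] * num_topics)
    for table in dumps:
        for word, counts in table.items():
            row = merged[word]
            for topic, count in enumerate(counts):
                row[topic] += count
    return dict(merged)


def induce_local(global_table: CountTable, local_vocab: Set[str]) -> CountTable:
    """Copy the global rows for the words in one machine's local vocabulary."""
    return {w: list(global_table[w]) for w in local_vocab if w in global_table}


# Three toy dumps with 2 topics each.
dumps = [
    {"apple": [2, 0]},
    {"bank": [0, 3]},
    {"apple": [1, 1], "cat": [0, 1]},
]

global_table = merge_tables(dumps, num_topics=2)
local_table = induce_local(global_table, {"apple", "dog"})
print(global_table)
print(local_table)
```

Words missing from the global table (here "dog") are simply skipped in the local table; how Y!LDA itself treats unseen words is not covered by this sketch.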


yanbo68 commented 12 years ago

I ran 200 iterations

shravanmn commented 12 years ago

If you run about 500 to 600 iterations, the words in the different topToWor files will look similar. This is what we have observed.

--Shravan

yanbo68 commented 12 years ago

Thanks a lot! I will try more iterations.