microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License
842 stars, 234 forks

Invalid topic assignment from word proposal #10

Closed boche closed 8 years ago

boche commented 8 years ago

Hi, I can run LightLDA on a single machine, but I ran into a problem when I tried to run it in distributed mode (MPI).

Here are my steps:

block.0  block.1  vocab.0  vocab.0.txt  vocab.1  vocab.1.txt

and distributed them across two machines.

mpiexec -f machine $bin/lightlda -num_vocabs 111400 -num_topics 1000 -num_servers 2 -num_iterations 10 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 2 -max_num_document 300000 -input_dir $dir -data_capacity 800
[FATAL] [2015-11-18 15:51:28] Invalid topic assignment 280904469 from word proposal

and it eventually failed with the following output:

===============================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 31489 RUNNING AT XXXXXXXXXXX
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===============================================================================

Any idea why it happens?

feiga commented 8 years ago

Hi @boche, have you solved the problem?

When you split your data into two parts across two machines, I assume you want to train on them in parallel. In that case each machine should contain only one block, and it should have the same name on both machines: block.0.

The argument -num_blocks needs to be 1.

I'm not sure whether what I described matches your setup.
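As a rough illustration of that layout, the sketch below stages one directory per machine so that every shard is named block.0. The function name stage_blocks and the node0/node1 directory names are hypothetical, and renaming the vocab files alongside the block is an assumption based on the file list in the question, not something confirmed in this thread.

```shell
# Sketch only: stage one input directory per machine, each holding a single
# shard renamed to block.0. stage_blocks and the nodeN names are hypothetical.
stage_blocks() {
  src=$1   # directory holding block.0, block.1, vocab.0, vocab.0.txt, ...
  out=$2   # staging directory; one nodeN subdirectory is created per machine
  n=$3     # number of machines (equal to the number of shards)
  i=0
  while [ "$i" -lt "$n" ]; do
    mkdir -p "$out/node$i"
    # Every machine sees its own shard as block 0.
    cp "$src/block.$i" "$out/node$i/block.0"
    # Assumption: the vocab files follow the same per-block numbering.
    cp "$src/vocab.$i" "$out/node$i/vocab.0"
    cp "$src/vocab.$i.txt" "$out/node$i/vocab.0.txt"
    i=$((i + 1))
  done
}

# After copying each nodeN directory to $dir on the matching machine, the
# command from the question would change only in -num_blocks:
# mpiexec -f machine $bin/lightlda -num_vocabs 111400 -num_topics 1000 \
#   -num_servers 2 -num_iterations 10 -alpha 0.1 -beta 0.01 -mh_steps 2 \
#   -num_local_workers 4 -num_blocks 1 -max_num_document 300000 \
#   -input_dir $dir -data_capacity 800
```

For two machines this would be called as stage_blocks ./split ./staged 2, where ./split is a placeholder for wherever the split files live.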

boche commented 8 years ago

Indeed, you are right. I noticed it after reading the code.