Closed: yangs16 closed this issue 8 years ago
@yangs16
The data is in binary format. The error seems to be caused by an array index going out of range. Can you check your data preprocessing? This is usually caused by an invalid data format.
Thanks, Fei. The problem is solved: there were some terms with a TF (term frequency) of 0 in the word_id.dict file.
@yangs16 Thanks! I will add this boundary check to avoid such unexpected crashes.
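For anyone hitting the same crash, a quick pre-flight check on the dictionary file may help before launching training. The sketch below is not part of LightLDA. It assumes word_id.dict is a tab-separated text file with three fields per line (word id, word, term frequency), and the function name `find_zero_tf_terms` is made up for illustration. Adjust the parsing if your preprocessing writes a different layout.

```python
from typing import Iterable, List, Tuple


def find_zero_tf_terms(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for entries that look invalid.

    ASSUMPTION: each line is "<word_id>\t<word>\t<tf>". This layout is
    assumed for illustration; verify it against your own dict file.
    A line is flagged when it does not have exactly three fields, when
    the TF field is not an integer, or when the TF is zero or negative.
    """
    suspicious = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            suspicious.append((lineno, line))
            continue
        try:
            tf = int(fields[2])
        except ValueError:
            suspicious.append((lineno, line))
            continue
        if tf <= 0:
            suspicious.append((lineno, line))
    return suspicious
```

To use it, open the file and pass the handle in, e.g. `with open("word_id.dict", encoding="utf-8") as f: print(find_zero_tf_terms(f))`. An empty list means no zero-TF or malformed entries were found under the assumed layout. Any flagged terms would then need to be dropped or fixed in preprocessing before regenerating the binary data.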
I can run the nytimes example successfully, but on my own dataset it fails with the following messages:
[INFO] [2015-11-20 16:26:13] INFO: block = 0, the number of slice = 1
[INFO] [2015-11-20 16:26:14] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2015-11-20 16:26:14] Server 0: Worker registratrion completed: workers=1 trainers=4 servers=1
[INFO] [2015-11-20 16:26:14] Rank 0/1: Multiverso initialized successfully.
[INFO] [2015-11-20 16:26:14] Rank 0/1: Begin of configuration and initialization.
foot.sh: line 13: 26600 Segmentation fault (core dumped) $bin/lightlda -num_vocabs 99948 -num_topics 50 -num_iterations 50 -alpha 0.1 -beta 0.01 -mh_steps 2 -num_local_workers 4 -num_blocks 1 -max_num_document 382578 -input_dir $dir -data_capacity 800
The program exited while processing the docs in the data blocks.
Any thoughts? Thanks a lot.