microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License

[Inference] Infer: Program received signal SIGSEGV, Segmentation fault. But training the model is OK #25

Closed heavendai closed 7 years ago

heavendai commented 8 years ago

Hi, @feiga. I cannot run inference on new documents with the lightlda infer tool. Can you give me a hand? My problem is:

  1. I ran the lightlda tool to train a model with the following command:

     ```
     $bin/lightlda -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 111000 -input_dir $dir -data_capacity 500
     ```

     This produced server_0_table_0.model, server_0_table_1.model, and doc_topic.0.
  2. I then ran the infer tool on the new documents, from the same directory as the lightlda run (containing block.0, vocab.0, and vocab.0.txt):

     ```
     mv doc_topic.0 doc_topic.0.tr
     $bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir $dir -data_capacity 500
     ```

     But I got this error output:

     ```
     [INFO] [2016-02-19 14:44:38] Actual Alias capacity: 5 MB
     [INFO] [2016-02-19 14:44:38] loading model
     [INFO] [2016-02-19 14:44:38] loading word topic table[server_0_table_0.model]
     [INFO] [2016-02-19 14:44:38] loading summary table[server_0_table_1.model]
     [INFO] [2016-02-19 14:44:38] block=0, Alias Time used: 0.11 s
     [INFO] [2016-02-19 14:44:38] iter=0
     Segmentation fault (core dumped)
     ```

Running the program under GDB gives the following:

```
(gdb) r -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/example/data/20151001_65w_200k -data_capacity 500
Starting program: /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/bin/infer -num_vocabs 129505 -num_topics 10 -num_iterations 10 -alpha 0.5 -beta 0.01 -mh_steps 2 -num_local_workers 1 -num_blocks 1 -max_num_document 110629 -input_dir /home/disk4/daimingyang/tools/DMTK/lightlda_feiga/example/data/20151001_65w_200k -data_capacity 500
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
[INFO] [2016-02-19 14:45:47] Actual Alias capacity: 5 MB
[INFO] [2016-02-19 14:45:47] loading model
[INFO] [2016-02-19 14:45:47] loading word topic table[server_0_table_0.model]
[INFO] [2016-02-19 14:45:47] loading summary table[server_0_table_1.model]
[INFO] [2016-02-19 14:45:47] block=0, Alias Time used: 0.11 s
[INFO] [2016-02-19 14:45:47] iter=0
[New Thread 0x40a00960 (LWP 10091)]
```

```
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x40a00960 (LWP 10091)]
0x000000000041bdde in multiverso::Row::At(int) ()
```
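For context on why an accessor like this can segfault: the sketch below (a hypothetical illustration, not the actual multiverso::Row implementation) contrasts an unchecked row read, which can dereference past the buffer when given an out-of-range index, with a bounds-checked variant that fails loudly instead.

```cpp
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical sketch of a dense topic-count row. Not the real
// multiverso::Row; it only illustrates the failure mode seen above.
class Row {
public:
    explicit Row(int num_topics) : counts_(num_topics, 0) {}

    // Unchecked access, like a raw pointer read: an out-of-range index
    // (e.g. a word/topic id beyond what the model covers) reads past the
    // buffer, which is undefined behavior and can raise SIGSEGV.
    int32_t AtUnchecked(int idx) const { return counts_[idx]; }

    // Bounds-checked variant: reports the bad index instead of crashing
    // somewhere deep inside the sampler.
    int32_t At(int idx) const {
        if (idx < 0 || idx >= static_cast<int>(counts_.size()))
            throw std::out_of_range("Row::At: index out of range");
        return counts_[idx];
    }

private:
    std::vector<int32_t> counts_;
};
```

A crash inside an unchecked accessor usually means the caller passed an id the loaded model does not cover, which is consistent with inference data containing entries absent from training.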

Can you help me? Thank you.

hiyijian commented 8 years ago

That is odd. Your program crashed somewhere that calls into multiverso, yet multiverso should not really be involved: during inference, all data, including the model, is stored in a local buffer. Could you show us the exact line number where the core dump happened? I will try to help.

heavendai commented 8 years ago

Hi, @hiyijian. Thanks for your time. The core debug information is as follows:

```
Run till exit from #0 multiverso::lightlda::AliasTable::Build (this=0xe7da190, word=6, model=<optimized out>)
    at /home/tools/DMTK/lightlda_inf/src/alias_table.cpp:82
multiverso::lightlda::Inferer::BeforeIteration (this=this@entry=0xe7da530, block=block@entry=0)
    at /home/tools/DMTK/lightlda_inf/inference/inferer.cpp:53
Value returned is $2 = 0
```

Looking forward to your reply.

hiyijian commented 8 years ago

I am sorry, but could you share the trained model and the new/unseen docs with me? I will try to reproduce it. It is difficult to figure out the problem from the core information alone.

heavendai commented 8 years ago

OK. How do I share the model with you? Can you give me your email?

hiyijian commented 8 years ago

@heavendai hiyijian@qq.com

hiyijian commented 8 years ago

Apart from data validity, the problem was partly caused by a missing boundary-condition check. @feiga has already fixed it in commit 733a06c. Thanks for your report.