microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License

Program received segmentation fault during configuration and initialization #35

Open bitdjg opened 8 years ago

bitdjg commented 8 years ago

When I run LightLDA on a single machine, I get the following output:

[INFO] [2016-06-30 11:49:39] INFO: block = 0, the number of slice = 1
[INFO] [2016-06-30 11:49:39] Server 0 starts: num_workers=1 endpoint=inproc://server
[INFO] [2016-06-30 11:49:39] Server 0: Worker registratrion completed: workers=1 trainers=1 servers=1
[INFO] [2016-06-30 11:49:39] Rank 0/1: Multiverso initialized successfully.
[INFO] [2016-06-30 11:49:43] Rank 0/1: Begin of configuration and initialization.
Segmentation fault (core dumped)

In other issues, such as issue #15, the segmentation fault is usually caused by a wrong TF count, and it occurs after "End of configuration and initialization". In my case the crash occurs before "End of configuration and initialization" is printed, so I wonder what could be causing it.

BTW, when I debug with gdb, I get the following backtrace:

#0 __memset_sse2 () at ../sysdeps/x86_64/memset.S:65
#1 0x000000000041b023 in multiverso::RowFactory::CreateRow(int, multiverso::Format, int, void*) ()
#2 0x000000000041f07d in multiverso::Table::GetRow(int) ()
#3 0x000000000042c4ca in multiverso::Server::StartThread() ()
#4 0x0000003e1feb6470 in std::(anonymous namespace)::execute_native_thread_routine (__p=) at ../../../../libstdc++-v3/src/thread.cc:44
#5 0x00000035248079d1 in start_thread (arg=0x7f054cfc2700) at pthread_create.c:301
#6 0x00000035244e88fd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Can anyone tell what happened?

bitdjg commented 8 years ago

I think this problem may be caused by a total TF count that is too large. If I reduce the TF count of some words to less than 1 billion, the segmentation fault no longer occurs.
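
A quick way to test this hypothesis before training is to scan the word TF file and flag suspicious counts. The sketch below is not part of LightLDA. It assumes each line of the file is tab-separated as `word_id`, `word`, `tf`, and it assumes that a total above the signed 32-bit maximum is what triggers the crash, which this thread does not confirm. The function name `check_tf_counts` and the 1 billion per-word limit are illustrative choices only.

```python
# Hypothetical pre-flight check; names and thresholds are assumptions.
INT32_MAX = 2**31 - 1


def check_tf_counts(lines, per_word_limit=1_000_000_000):
    """Return (total_tf, offenders) for tab-separated id/word/tf lines.

    offenders lists every (word_id, word, tf) whose tf is at or above
    per_word_limit.
    """
    total = 0
    offenders = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        word_id, word, tf = line.split("\t")
        tf = int(tf)
        total += tf  # Python ints do not overflow, so the sum is exact
        if tf >= per_word_limit:
            offenders.append((int(word_id), word, tf))
    return total, offenders


# Made-up sample data standing in for a real word TF file.
sample = ["0\tthe\t1500000000", "1\ttopic\t900000000", "2\tmodel\t42"]
total, offenders = check_tf_counts(sample)
print("total TF:", total)
print("exceeds 32-bit signed range:", total > INT32_MAX)
print("words at or above the per-word limit:", offenders)
```

To run it on a real file, pass an open file handle instead of `sample`, for example `check_tf_counts(open(path, encoding="utf-8"))`, where `path` points to your own word TF file.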