microsoft / LightLDA

Scalable, fast, and lightweight system for large-scale topic modeling
http://www.dmtk.io
MIT License
842 stars 235 forks

corpus_size_ > memory_block_size when reading file /data/block.0 #78

Closed: SilentCC closed this issue 5 years ago

SilentCC commented 5 years ago

Does anyone know what causes this error?

root@ml-server:/data# mpiexec -f machinefile lightlda -num_vocabs 1300000 -num_topics 10000 -num_iterations 100 -alpha 0.005 -beta 0.01 -mh_steps 2 -num_local_workers 11 -num_blocks 1 -max_num_document 1400000 -input_dir /data -data_capactiy 7000
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 274 MB, Alias capacity: 507 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 292 MB, Alias capacity: 507 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 319 MB, Alias capacity: 511 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 330 MB, Alias capacity: 512 MB, Delta capacity: 250MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 337 MB, Alias capacity: 512 MB, Delta capacity: 242MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 344 MB, Alias capacity: 512 MB, Delta capacity: 239MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 350 MB, Alias capacity: 512 MB, Delta capacity: 228MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 345 MB, Alias capacity: 512 MB, Delta capacity: 229MB
[INFO] [2018-11-06 15:32:48] INFO: block = 0, the number of slice = 9
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 275 MB, Alias capacity: 508 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 293 MB, Alias capacity: 508 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 320 MB, Alias capacity: 512 MB, Delta capacity: 255MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 330 MB, Alias capacity: 512 MB, Delta capacity: 248MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 337 MB, Alias capacity: 512 MB, Delta capacity: 242MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 344 MB, Alias capacity: 512 MB, Delta capacity: 239MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 350 MB, Alias capacity: 512 MB, Delta capacity: 227MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 345 MB, Alias capacity: 512 MB, Delta capacity: 229MB
[INFO] [2018-11-06 15:32:48] INFO: block = 0, the number of slice = 9
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 274 MB, Alias capacity: 507 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 293 MB, Alias capacity: 507 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 320 MB, Alias capacity: 511 MB, Delta capacity: 256MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 330 MB, Alias capacity: 512 MB, Delta capacity: 250MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 337 MB, Alias capacity: 512 MB, Delta capacity: 242MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 344 MB, Alias capacity: 512 MB, Delta capacity: 238MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 350 MB, Alias capacity: 512 MB, Delta capacity: 227MB
[INFO] [2018-11-06 15:32:48] Actual Model capacity: 345 MB, Alias capacity: 512 MB, Delta capacity: 229MB
[INFO] [2018-11-06 15:32:48] INFO: block = 0, the number of slice = 9
[INFO] [2018-11-06 15:32:48] Server 0 starts: num_workers=3 endpoint=inproc://server
[INFO] [2018-11-06 15:32:48] Server 2 starts: num_workers=3 endpoint=inproc://server
[INFO] [2018-11-06 15:32:48] Server 1 starts: num_workers=3 endpoint=inproc://server
[INFO] [2018-11-06 15:32:48] Server 0: Worker registratrion completed: workers=3 trainers=33 servers=3
[INFO] [2018-11-06 15:32:48] Rank 1/3: Multiverso initialized successfully.
[INFO] [2018-11-06 15:32:48] Rank 0/3: Multiverso initialized successfully.
[INFO] [2018-11-06 15:32:48] Rank 2/3: Multiverso initialized successfully.
[FATAL] [2018-11-06 15:32:48] Rank 1: corpus_size_ > memory_block_size when reading file /data/block.0
[FATAL] [2018-11-06 15:32:48] Rank 2: corpus_size_ > memory_block_size when reading file /data/block.0
[FATAL] [2018-11-06 15:32:48] Rank 0: corpus_size_ > memory_block_size when reading file /data/block.0
[INFO] [2018-11-06 15:32:49] Rank 2/3: Begin of configuration and initialization.
[INFO] [2018-11-06 15:32:49] Rank 0/3: Begin of configuration and initialization.
[INFO] [2018-11-06 15:32:49] Rank 1/3: Begin of configuration and initialization.

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   EXIT CODE: 11
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
[proxy:0:0@ml-server] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:0@ml-server] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@ml-server] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[proxy:0:2@ml-server002] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:886): assert (!closed) failed
[proxy:0:2@ml-server002] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:2@ml-server002] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@ml-server] HYDT_bscu_wait_for_completion (./tools/bootstrap/utils/bscu_wait.c:76): one of the processes terminated badly; aborting
[mpiexec@ml-server] HYDT_bsci_wait_for_completion (./tools/bootstrap/src/bsci_wait.c:23): launcher returned error waiting for completion
[mpiexec@ml-server] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:217): launcher returned error waiting for completion
[mpiexec@ml-server] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
SilentCC commented 5 years ago

This happens because the value of data_capacity is too small.

See the code in data_block.cpp:

 memory_block_size_ = Config::data_capacity / sizeof(int32_t);