thu-pacman / GeminiGraph

A computation-centric distributed graph processing system.
Apache License 2.0

Application gets stuck #8

Open kartiklakhotia opened 6 years ago

kartiklakhotia commented 6 years ago

I am trying to use Gemini for graph processing on a single server. Open MPI prints this warning:

"A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces"

After this warning the program usually gets stuck. About 1 run in 10 it proceeds past the warning and finishes. I want to run multiple experiments from scripts, which is difficult when the application gets stuck more often than not. Please let me know how to resolve this.
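Until the root cause is found, one possible stopgap for scripted experiments is to run each job under a timeout and retry when it hangs. The sketch below is only an illustration of that idea, not a fix for the hang itself and not part of Gemini: `run_with_timeout`, its parameters, and the retry count are hypothetical names chosen here, and it assumes a POSIX system where the launcher and the processes it starts stay in one process group.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, max_attempts=3):
    """Run cmd, killing and retrying it if it exceeds timeout_s seconds.

    Returns (returncode, output) for the first attempt that finishes in
    time, or None if every attempt timed out. Hypothetical helper.
    """
    for attempt in range(1, max_attempts + 1):
        # start_new_session=True puts the child in its own session and
        # process group, so the whole group can be signalled at once.
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            start_new_session=True,
        )
        try:
            output, _ = proc.communicate(timeout=timeout_s)
            return proc.returncode, output
        except subprocess.TimeoutExpired:
            try:
                # The group id equals the child's pid because of the new session.
                os.killpg(proc.pid, signal.SIGKILL)
            except ProcessLookupError:
                pass  # the process exited just before we tried to kill it
            proc.communicate()  # reap the child and drain its pipe
            print(f"attempt {attempt} timed out after {timeout_s}s",
                  file=sys.stderr)
    return None
```

A script could pass its own `mpirun` command line as the `cmd` list and choose a `timeout_s` comfortably above the expected run time. A return value of `None` means every attempt hung. A non-zero return code (for example from a crash) is returned as-is rather than retried.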

coolerzxw commented 6 years ago

Hi. Can you run other, simpler MPI programs correctly (e.g. do some arithmetic and AllReduce the sum)?

xuyinghai commented 6 years ago

My simple MPI hello world program with AllReduce works, but the following run crashes with a segmentation fault. Is it because of MPI_THREAD_SERIALIZED?

$ mpirun -n 2 ./pagerank ../../inputs/soc-LiveJournal1.txt.bsnap 4847571 1
thread support level provided by MPI: MPI_THREAD_SERIALIZED
|V| = 4847571, |E| = 68475391
|V'_0| = 1261568 |E^dense_0| = 43518651
|V'_1| = 3586003 |E^dense_1| = 24956740
|V'_0_0| = 1261568 |E^dense_0_0| = 43518651
|V'_1_0| = 3586003 |E^dense_1_0| = 24956740
[yuxing-desk:01757] *** Process received signal ***
[yuxing-desk:01757] Signal: Segmentation fault (11)
[yuxing-desk:01757] Signal code: Address not mapped (1)
[yuxing-desk:01757] Failing at address: 0x14ec650
[yuxing-desk:01757] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ff406b0d390]
[yuxing-desk:01757] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x59)[0x7ff3f98dbc19]
[yuxing-desk:01757] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x93)[0x7ff3fa344813]
[yuxing-desk:01757] [ 3] /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so(+0x3abe)[0x7ff3fa344abe]
[yuxing-desk:01757] [ 4] /usr/lib/libopen-pal.so.13(opal_progress+0x4a)[0x7ff40602d1ea]
[yuxing-desk:01757] [ 5] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4c5)[0x7ff3f98d5745]
[yuxing-desk:01757] [ 6] /usr/lib/libmpi.so.12(PMPI_Send+0x14b)[0x7ff40753ccdb]
[yuxing-desk:01757] [ 7] ./pagerank[0x4151dd]
[yuxing-desk:01757] [ 8] ./pagerank[0x409db3]
[yuxing-desk:01757] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff406752830]
[yuxing-desk:01757] [10] ./pagerank[0x409f79]
[yuxing-desk:01757] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1757 on node yuxing-desk exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------