Open kartiklakhotia opened 6 years ago
Hi. Can you run other, simpler MPI programs correctly (e.g. do some arithmetic and AllReduce the sum)?
My simple MPI hello world program with AllReduce works, but the following run doesn't. Is it because of MPI_THREAD_SERIALIZED?
$ mpirun -n 2 ./pagerank ../../inputs/soc-LiveJournal1.txt.bsnap 4847571 1
thread support level provided by MPI: MPI_THREAD_SERIALIZED
|V| = 4847571, |E| = 68475391
|V'_0| = 1261568 |E^dense_0| = 43518651
|V'_1| = 3586003 |E^dense_1| = 24956740
|V'_0_0| = 1261568 |E^dense_0_0| = 43518651
|V'_1_0| = 3586003 |E^dense_1_0| = 24956740
[yuxing-desk:01757] *** Process received signal ***
[yuxing-desk:01757] Signal: Segmentation fault (11)
[yuxing-desk:01757] Signal code: Address not mapped (1)
[yuxing-desk:01757] Failing at address: 0x14ec650
[yuxing-desk:01757] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ff406b0d390]
[yuxing-desk:01757] [ 1] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_request_progress_frag+0x59)[0x7ff3f98dbc19]
[yuxing-desk:01757] [ 2] /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x93)[0x7ff3fa344813]
[yuxing-desk:01757] [ 3] /usr/lib/openmpi/lib/openmpi/mca_btl_vader.so(+0x3abe)[0x7ff3fa344abe]
[yuxing-desk:01757] [ 4] /usr/lib/libopen-pal.so.13(opal_progress+0x4a)[0x7ff40602d1ea]
[yuxing-desk:01757] [ 5] /usr/lib/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send+0x4c5)[0x7ff3f98d5745]
[yuxing-desk:01757] [ 6] /usr/lib/libmpi.so.12(PMPI_Send+0x14b)[0x7ff40753ccdb]
[yuxing-desk:01757] [ 7] ./pagerank[0x4151dd]
[yuxing-desk:01757] [ 8] ./pagerank[0x409db3]
[yuxing-desk:01757] [ 9] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff406752830]
[yuxing-desk:01757] [10] ./pagerank[0x409f79]
[yuxing-desk:01757] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 1 with PID 1757 on node yuxing-desk exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
I am trying to use Gemini for graph processing on a single server. It gives this warning:
"A high-performance Open MPI point-to-point messaging module was unable to find any relevant network interfaces"
and then the program usually gets stuck. About 1 run in 10 proceeds past the warning and finishes. I want to run multiple experiments from scripts, which is difficult when the application hangs more often than not. Please let me know how to resolve this.
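Until the cause of the hang is found, one possible stopgap for scripted experiments is to run each job under a timeout and retry it when it gets stuck. The following is only a sketch of that idea, not a fix for the underlying problem: the helper name `run_with_timeout`, the timeout value, and the retry count are all hypothetical, and it assumes a POSIX system where the launched processes stay in the launcher's process group.

```python
import os
import signal
import subprocess
import sys


def run_with_timeout(cmd, timeout_s, max_attempts=3):
    """Run cmd, killing and retrying it if it exceeds timeout_s seconds.

    Returns the captured stdout of the first attempt that exits with
    status 0, or None if every attempt hangs or fails.
    (Hypothetical helper; POSIX only because of os.killpg.)
    """
    for attempt in range(1, max_attempts + 1):
        # start_new_session=True puts the child in its own process group,
        # so a hung launcher and its children can be killed together.
        proc = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
            start_new_session=True,
        )
        try:
            out, err = proc.communicate(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the whole group
            proc.communicate()                   # reap the killed process
            print(f"attempt {attempt}: timed out after {timeout_s}s",
                  file=sys.stderr)
            continue
        if proc.returncode == 0:
            return out
        print(f"attempt {attempt}: exit status {proc.returncode}",
              file=sys.stderr)
    return None


# Example call, using the command line quoted earlier in this thread;
# the 600-second limit is an arbitrary assumption:
# output = run_with_timeout(
#     ["mpirun", "-n", "2", "./pagerank",
#      "../../inputs/soc-LiveJournal1.txt.bsnap", "4847571", "1"],
#     timeout_s=600,
# )
```

A wrapper like this only keeps a batch of experiments moving. Runs that succeed after a retry should still be treated with some caution until the warning itself is understood.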