thu-pacman / GeminiGraph

A computation-centric distributed graph processing system.
Apache License 2.0

Segmentation Fault when running graph algorithms #15

Open vaastav opened 5 years ago

vaastav commented 5 years ago

Hi, I am trying to run the graph algorithms in the toolkits on cit-Patents, but I keep getting the following error:

[hennessy:05177] Process received signal
[hennessy:05177] Signal: Segmentation fault (11)
[hennessy:05177] Signal code: Invalid permissions (2)
[hennessy:05177] Failing at address: 0x7f1ad8050d04
[hennessy:05177] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f1af0ee9390]
[hennessy:05177] [ 1] ./cc[0x415077]
[hennessy:05177] [ 2] ./cc[0x40a332]
[hennessy:05177] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f1af0b2e830]
[hennessy:05177] [ 4] ./cc[0x4099b9]
[hennessy:05177] End of error message
Segmentation fault (core dumped)

The command I am running is ./pagerank /path/to/cit-patents.binedgelist 3774768

Any help would be appreciated!

coolerzxw commented 5 years ago

Hi. Can you give more detailed information (e.g. where the error took place)?

vaastav commented 5 years ago

According to gdb, the segfaulting line is core/graph.hpp:819, on __sync_fetch_and_add(&out_degree[src],1), but the address it's segfaulting on doesn't seem to map to any variable.

coolerzxw commented 5 years ago

This may happen when the max vertex id >= |V| (3774768 in your case). You can solve this by either giving the max vertex id + 1 as |V| or remapping the input graph so that vertex ids fit into [0, |V|).

vaastav commented 5 years ago

I tried using the max vertex id + 1 and I don't get a segfault anymore. But when I tried it with pagerank for 20 iterations, the execution did not finish even after I let it run for nearly 40 minutes, and it did not print any output whatsoever. I am not sure what is going on. The load_directed method never seems to finish, and I am not sure whether the preprocessing should take this long.

coolerzxw commented 5 years ago

That is strange. Can you locate where the program gets stuck?

vaastav commented 5 years ago

Yes, it is getting stuck in the while loop at https://github.com/thu-pacman/GeminiGraph/blob/master/core/graph.hpp#L1083

Sometimes it stops after 1 iteration, sometimes after 2 iterations.

vaastav commented 5 years ago

This seems to be caused by using the openmpi library instead of the mpich library. Everything works with the mpich library as far as I can tell.

coolerzxw commented 5 years ago

Well, I cannot tell why it got stuck here... What are the versions of OpenMPI and compiler you used?