thu-pacman / GeminiGraph

A computation-centric distributed graph processing system.
Apache License 2.0

What files does it need? #6

Closed: bibrakc closed this issue 6 years ago

bibrakc commented 6 years ago

I have downloaded the graph files from here

I used the cnr-2000 file, which has 325557 nodes (vertices), and I get the following output (error) for pagerank:

./pagerank ./cnr-2000.graph 325557 10
thread support level provided by MPI: MPI_THREAD_SERIALIZED
edge_data_size = 0, unit_size = 4, edge_unit_size = 8
threads=1*4
interleave on 0
|V| = 325557, |E| = 145605
Before SEEK <-- I put these in the core/graph.hpp
AFTER SEEK <-- I put these in the core/graph.hpp
BEFORE sync src = 1112356484 <-- I put these in the core/graph.hpp
BEFORE sync src = 188557346 <-- I put these in the core/graph.hpp
[my_pc:16578] *** Process received signal ***
[my_pc:16578] Signal: Segmentation fault (11)
[my_pc:16578] Signal code: Address not mapped (1)
[my_pc:16578] Failing at address: 0x7f18466f9a10
Segmentation fault (core dumped)

Seems like there are at least two problems:

1) It estimates the number of edges as 145605, whereas the dataset website says there are 3216152 arcs (edges).

2) src = 1112356484 is certainly out of bounds for a graph with 325557 vertices, which would explain the segmentation fault.

Any idea what is going on and how to fix it?

Or perhaps GeminiGraph needs the graph stored in some other format. Could you please specify what that format is?
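Both symptoms above are what one would expect if the file is read as a flat array of 8-byte records (the log prints edge_unit_size = 8): 145605 edges × 8 bytes is 1164840 bytes, and arbitrary compressed bytes read as 32-bit ids give values such as 1112356484. The sketch below checks a file the same way. It assumes the loader derives |E| by dividing the file size by the record size, which this thread does not state. The function names and the little-endian default are illustrative.

```python
import os
import struct

EDGE_UNIT_SIZE = 8  # two 32-bit ids, matching "edge_unit_size = 8" in the log


def implied_edge_count(path, edge_unit_size=EDGE_UNIT_SIZE):
    """Edge count obtained by dividing the file size by the record size.

    Assumption: this mirrors how the loader arrives at |E|.
    """
    return os.path.getsize(path) // edge_unit_size


def peek_edges(path, count=2, byte_order="<"):
    """Decode the first `count` records as (src, dst) unsigned 32-bit pairs."""
    record = struct.Struct(byte_order + "II")
    with open(path, "rb") as f:
        data = f.read(record.size * count)
    usable = len(data) - len(data) % record.size  # ignore a trailing partial record
    return [record.unpack_from(data, off) for off in range(0, usable, record.size)]
```

If `implied_edge_count` disagrees with the published edge count, or `peek_edges` returns ids at or above the vertex count, the file is probably not in the raw binary edge-list layout.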

coolerzxw commented 6 years ago

Hi. Since the file you downloaded is in WebGraph format, you need to write a program using the WebGraph library to decode the data and write out the edges in binary (i.e. each edge consists of two 32-bit integers representing source and destination, plus an optional weight if you need to run algorithms on a weighted graph). Note that the byte order written by a Java program and the byte order expected by a C++ program on your machine may differ (big-endian vs. little-endian), so you might need to pay attention to that.
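As a minimal sketch of the layout described above, the following writes each edge as two unsigned 32-bit integers with an explicit byte order. The function name, the little-endian default (the native order on x86), and the 32-bit float weight are assumptions made for illustration. Check the weight type against the EdgeData type your application is built with.

```python
import struct


def write_edges(path, edges, weighted=False, byte_order="<"):
    """Write edges as packed binary records and return the record size in bytes.

    Unweighted: (src, dst) as two uint32 values, 8 bytes per edge.
    Weighted:   (src, dst, weight) with a float32 weight, 12 bytes per edge
                (assumed weight type).
    byte_order: "<" for little-endian, ">" for big-endian.
    """
    record = struct.Struct(byte_order + ("IIf" if weighted else "II"))
    with open(path, "wb") as out:
        for edge in edges:
            out.write(record.pack(*edge))
    return record.size
```

For example, `write_edges("graph.bin", [(0, 1), (1, 2)])` produces a 16-byte file. Stating the byte order explicitly in the format string avoids depending on the platform default.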

bibrakc commented 6 years ago

Thanks for that! So I wrote a simple C program that reads the graph from an ASCII file and writes it to a binary file, as required by Gemini.

Here is that code: here

The file src/ASCII2Bin.c contains the converter.
The file data/scale16_s.mm is its input; it has 65535 vertices and 1818848 edges.

When I run: mpirun -n 1 ./sssp ./scale16_s.bin 65535 39

It seems to estimate the number of edges correctly. Further it gives the following output and halts with this error message:

thread support level provided by MPI: MPI_THREAD_SERIALIZED
thread-0 bound to socket-0
thread-3 bound to socket-0
thread-2 bound to socket-0
thread-1 bound to socket-0
threads=1*4
interleave on 0
|V| = 65535, |E| = 1818848
|V'_0| = 65535 |E^dense_0| = 1818761
|V'_0_0| = 65535 |E^dense_0_0| = 1818761
sssp: /home/user/codes/GeminiGraph-master/core/graph.hpp:336: int Graph::get_partition_id(VertexId) [with EdgeData = float; VertexId = unsigned int]: Assertion `false' failed.
[mypc:22558] *** Process received signal ***
[mypc:22558] Signal: Aborted (6)
[mypc:22558] Signal code: (-6)
[mypc:22558] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f3e91e51330]
[mypc:22558] [ 1] /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7f3e91ab2c37]
[mypc:22558] [ 2] /lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7f3e91ab6028]
[mypc:22558] [ 3] /lib/x86_64-linux-gnu/libc.so.6(+0x2fbf6)[0x7f3e91aabbf6]
[mypc:22558] [ 4] /lib/x86_64-linux-gnu/libc.so.6(+0x2fca2)[0x7f3e91aabca2]
[mypc:22558] [ 5] ./sssp[0x4125bc]
[mypc:22558] [ 6] ./sssp[0x409b5d]
[mypc:22558] [ 7] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f3e91a9df45]
[mypc:22558] [ 8] ./sssp[0x409cef]
[mypc:22558] *** End of error message ***

mpirun noticed that process rank 0 with PID 22558 on node tulip exited on signal 6 (Aborted).

coolerzxw commented 6 years ago

Hi, we assume the vertices are numbered from 0. So you need to replace 65535 with 65536 as the number of vertices and then it should work.
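With 0-based numbering, the vertex count passed on the command line has to be the largest id plus one, so a file whose largest id is 65535 needs 65536. A small sketch of that check, with hypothetical helper names and operating on in-memory (src, dst) pairs, could look like this:

```python
def vertex_count(edges):
    """Smallest vertex count that covers every id when ids start at 0."""
    return max((max(src, dst) for src, dst in edges), default=-1) + 1


def out_of_range_ids(edges, num_vertices):
    """Sorted ids that fall outside the valid range 0 .. num_vertices - 1."""
    bad = {v for edge in edges for v in edge if not 0 <= v < num_vertices}
    return sorted(bad)
```

Running `out_of_range_ids` before launching a job turns an assertion failure deep in the loader into an explicit list of offending ids.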

bibrakc commented 6 years ago

That helps!

I changed the converter to decrement each vertex id as it is read from the file, because the file I am using numbers its vertices from 1, not 0.
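The decrement described above can be sketched as follows. This is not the code from src/ASCII2Bin.c, which is not shown in this thread. The function name is hypothetical, and the sketch assumes the edges have already been parsed into (src, dst) integer pairs.

```python
def to_zero_based(edges):
    """Shift 1-based (src, dst) pairs to the 0-based ids the loader expects."""
    shifted = []
    for src, dst in edges:
        if src < 1 or dst < 1:
            # An id below 1 means the input was not 1-based after all.
            raise ValueError(f"edge ({src}, {dst}) is not 1-based")
        shifted.append((src - 1, dst - 1))
    return shifted
```

Rejecting ids below 1 guards against applying the shift to a file that is already 0-based, which would otherwise wrap an id of 0 to a huge value once it is stored as an unsigned 32-bit integer.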