stingergraph / stinger

The STINGER in-memory graph store and dynamic graph analysis platform. Millions to billions of vertices and edges at thousands to millions of updates per second.
http://www.stingergraph.com
Other
208 stars 63 forks source link

Deadlock when creating large graph #252

Open ldhulipala opened 5 years ago

ldhulipala commented 5 years ago

Hi all,

I was trying to create a large graph (twitter, ~41M vertices and ~2B undirected edges), but ran into some trouble. The input file is stored in the DIMACS format. I'm using the dimacs loader which I specify as follows:

env STINGER_MAX_MEMSIZE=300G ./bin/stinger_server -i twitter_sym.gr -t d

After adding print statements, I noticed that the code is stalling in the parallel loop that calls insert_edge(..) over all edges read from the input file. A graph about 1/10th the size could be loaded in about 2 mins, so I would expect the load to take about 20-30 mins, but even after letting it run for several hours it still seems to be stuck in this loop. Twitter has a pretty skewed degree distribution, so many threads could be trying to insert into the same vertex's list simultaneously. Could it be that this is causing the insertion code to deadlock somehow?

I'm currently running the loop without the parallel-for to see if it will complete successfully when run sequentially. That would hint that there is some deadlock issue with insert_edge(..) when loading this graph. I will report back with the result tomorrow.

Thanks, Laxman

ehein6 commented 5 years ago

It's probably not deadlocking. It's just using an inefficient algorithm to construct the graph. stinger_insert_edge is O(N), where N is the degree of the inserted edge's source vertex. The edge list is scanned each time to check for duplicates, but we don't need this functionality for a bulk insert.

This bit of code should be changed to use stinger_batch_insert_edges or stinger_set_initial_edges: https://github.com/stingergraph/stinger/blob/4561008338627e08e296d9634a1f8d2a0facca34/lib/stinger_utils/src/dimacs_support.c#L211-L214

The server was recently upgraded to use the new functions (see https://github.com/stingergraph/stinger/commit/b42f752276f6f10a137622141d901c6d67038a3c), but all the graph loaders need to be upgraded too.

ldhulipala commented 5 years ago

Ahh, that makes sense. Thanks for the advice! I will update the loader code to use the bulk-update and report back.

(As you might have guessed running it sequentially was very slow and did not finish.)

ehein6 commented 5 years ago

If you do get it working, please contribute a pull request.

ldhulipala commented 5 years ago

Is there a simple way to change STINGER to compile with g++ (c++11)? I included

#include "stinger_core/stinger_batch_insert.h"

This file uses c++11 features, and the makefile seems to default to compile using C.

Also I'd be happy to add a pull-request once I figure this out.

Thanks again for all of the help, Laxman

ehein6 commented 5 years ago

@ldhulipala Change dimacs_support.c to .cc, also in the CMakeLists.txt.

ldhulipala commented 5 years ago

I gotstinger_set_initial_edges to work eventually. There are weird interactions with this function and stinger_consistency_check, however. I ended up having to comment out the consistency check (I think the underlying problem is that the stinger_set_initial_edges call only creates an out-edge for each edge, but I did not dig to verify this).

Running a BFS/BC after loading the graph works, however, and produces correct output, so I think the loading is working correctly. I will see if I can clean this up and send a pull request over the weekend if it's in a good state.

ehein6 commented 5 years ago

Getting the directions right is hard, that function was written before edges were stored in both directions. I have a working implementation here that starts with an edge list: https://github.com/DynoGraph/stinger-dynograph/blob/master/stinger_graph.cpp#L149

As long your algorithms never try to iterate over the in-edges, you can probably ignore the consistency check failure.