Open ldhulipala opened 6 years ago
It's probably not deadlocking. It's just using an inefficient algorithm to construct the graph. stinger_insert_edge
is O(N), where N is the degree of the inserted edge's source vertex. The edge list is scanned each time to check for duplicates, but we don't need this functionality for a bulk insert.
This bit of code should be changed to use stinger_batch_insert_edges
or stinger_set_initial_edges
: https://github.com/stingergraph/stinger/blob/4561008338627e08e296d9634a1f8d2a0facca34/lib/stinger_utils/src/dimacs_support.c#L211-L214
The server was recently upgraded to use the new functions (see https://github.com/stingergraph/stinger/commit/b42f752276f6f10a137622141d901c6d67038a3c), but all the graph loaders need to be upgraded too.
Ahh, that makes sense. Thanks for the advice! I will update the loader code to use the bulk-update and report back.
(As you might have guessed running it sequentially was very slow and did not finish.)
If you do get it working, please contribute a pull request.
Is there a simple way to change STINGER to compile with g++ (c++11)? I included
#include "stinger_core/stinger_batch_insert.h"
This file uses c++11 features, and the makefile seems to default to compile using C.
Also I'd be happy to add a pull-request once I figure this out.
Thanks again for all of the help, Laxman
@ldhulipala Change dimacs_support.c
to .cc
, also in the CMakeLists.txt.
I gotstinger_set_initial_edges
to work eventually. There are weird interactions with this function and stinger_consistency_check
, however. I ended up having to comment out the consistency check (I think the underlying problem is that the stinger_set_initial_edges call only creates an out-edge for each edge, but I did not dig to verify this).
Running a BFS/BC after loading the graph works, however, and produces correct output, so I think the loading is working correctly. I will see if I can clean this up and send a pull request over the weekend if it's in a good state.
Getting the directions right is hard, that function was written before edges were stored in both directions. I have a working implementation here that starts with an edge list: https://github.com/DynoGraph/stinger-dynograph/blob/master/stinger_graph.cpp#L149
As long your algorithms never try to iterate over the in-edges, you can probably ignore the consistency check failure.
Hi all,
I was trying to create a large graph (twitter, ~41M vertices and ~2B undirected edges), but ran into some trouble. The input file is stored in the DIMACS format. I'm using the dimacs loader which I specify as follows:
env STINGER_MAX_MEMSIZE=300G ./bin/stinger_server -i twitter_sym.gr -t d
After adding print statements, I noticed that the code is stalling in the parallel loop that calls insert_edge(..) over all edges read from the input file. A graph about 1/10th the size could be loaded in about 2 mins, so I would expect the load to take about 20-30 mins, but even after letting it run for several hours it still seems to be stuck in this loop. Twitter has a pretty skewed degree distribution, so many threads could be trying to insert into the same vertex's list simultaneously. Could it be that this is causing the insertion code to deadlock somehow?
I'm currently running the loop without the parallel-for to see if it will complete successfully when run sequentially. That would hint that there is some deadlock issue with insert_edge(..) when loading this graph. I will report back with the result tomorrow.
Thanks, Laxman