stingergraph / StingerGraphs.jl

Julialang bindings to the STINGER graph database
http://www.stingergraph.com

STINGER graph creation is slow #26

Open ehein6 opened 7 years ago

ehein6 commented 7 years ago

When StingerWrapper loads a STINGER graph, it calls insert_edge! in a loop. This can be very slow, especially for high-degree vertices. It is faster to construct the graph all at once with stinger_set_initial_edges(). An example implementation that generates the array parameters for this function from an edge list is here: https://github.com/DynoGraph/stinger-dynograph/blob/1c7f8295d4b7d514a4074f0fe7c3d177421099bf/stinger_graph.cpp#L142
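To illustrate the kind of preprocessing involved, here is a minimal sketch of turning an edge list into compressed sparse row (CSR) style arrays. It assumes stinger_set_initial_edges() expects an offsets array, an adjacency array, and a weights array in that layout. The function name `edge_list_to_csr` and the `(src, dst, weight)` tuple format are hypothetical, and the linked C++ implementation may differ (for example in how it handles timestamps or duplicate edges).

```python
from typing import List, Sequence, Tuple

Edge = Tuple[int, int, int]  # hypothetical layout: (src, dst, weight)


def edge_list_to_csr(
    num_vertices: int, edges: Sequence[Edge]
) -> Tuple[List[int], List[int], List[int]]:
    """Build CSR-style (offsets, adjacency, weights) arrays from an edge list.

    Assumption: vertex ids are integers in [0, num_vertices).
    Duplicate edges are kept as-is; no deduplication is attempted.
    """
    # Count the out-degree of each source vertex, shifted by one slot.
    off = [0] * (num_vertices + 1)
    for src, _dst, _w in edges:
        off[src + 1] += 1

    # Prefix sum: off[v] becomes the start of vertex v's neighbor range.
    for v in range(num_vertices):
        off[v + 1] += off[v]

    adj = [0] * len(edges)
    wgt = [0] * len(edges)

    # cursor[v] is the next free slot inside vertex v's neighbor range.
    cursor = off[:-1].copy()
    for src, dst, w in edges:
        slot = cursor[src]
        adj[slot] = dst
        wgt[slot] = w
        cursor[src] += 1

    return off, adj, wgt


edges = [(0, 1, 1), (2, 0, 5), (0, 2, 3), (1, 2, 2)]
off, adj, wgt = edge_list_to_csr(3, edges)
```

The neighbors of vertex `v` end up in `adj[off[v]:off[v + 1]]`, so the whole graph can be handed over in one call instead of one insert per edge.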

ehein6 commented 7 years ago

@rohitvarkey Take a look at https://github.com/DynoGraph/stinger-dynograph/tree/expose-stinger-init. You should be able to include dynograph_stinger_utils.h, link against stinger_graph.o, and call dynograph_init_stinger_from_edge_list from Julia. Let me know if this doesn't work.

rohitvarkey commented 7 years ago

Awesome! Thanks @ehein6. I'll try it out and let you know!

jpfairbanks commented 6 years ago

Did this get fixed @rohitvarkey?

rohitvarkey commented 6 years ago

We have a workaround for this on the https://github.com/stingergraph/StingerGraphs.jl/commits/set_initial_edges branch, but it requires @ehein6's version of STINGER (https://github.com/DynoGraph/stinger-dynograph) as the linked library. We currently distribute the original STINGER library instead of stinger-dynograph.

ehein6 commented 6 years ago

The right way to fix this would be to file an issue against mainline STINGER to provide a more friendly interface to set_initial_edges (probably using the code I wrote) that could then be called from Julia.

Separately, using the stinger_batch_insert functions would also speed up loading large batches.

jpfairbanks commented 6 years ago

@ehein6 What do you think about including a simpler API to set_initial_edges in mainline STINGER? Would people use it?

ehein6 commented 6 years ago

It seems like a lot of new STINGER users aren't looking to do large-scale streaming analytics right away; they just want to load up a graph and run an algorithm. So I think it would be useful, especially in a Julia REPL.

jpfairbanks commented 6 years ago

Yeah, I think that is a more common use case. Streaming analytics is so much harder to get started with, and most people will run a SQL or Mongo query to get the data into a CSV or JSON file before building all the infrastructure necessary for streaming. Let's get the necessary patch upstreamed.