Few Clarifications - Githubissues

abasak24 commented 6 years ago

Hi, I had a few questions regarding some parts of the code. I would appreciate it if you could clarifying my understanding!

1) stinger_insert_edge() versus stinger_incr_edge() -> New batches of edges fed into the server are handled by the process_batch() function in batch.pp. I notice that process_batch() calls only stinger_incr_edge. What is the difference between stinger_insert_edge and stinger_incr_edge? Does stinger_incr_edge make sure that a new edge being read from my CSV file is actually inserted into the stinger data structure? I am struggling a bit to understand this part.

2) Stinger edge block factor and edgeblocksize -> I would like to confirm my understanding regarding this issue. When launching the server, the stinger data structure is allocated/initiated with the CMAKE parameters where currently I have edgeBlockSize=14 and edgeblockfactor = 4 and also a number for vertices. However, during the runtime, if the 4 edge blocks per vertex per edge type are not sufficient, stinger will allocate a new edge block for the vertex when inserting a new edge of that type. Could you kindly confirm if this understanding is correct?

3) Edge deletions -> Currently, I am streaming in edges with CSV by increasing timestamps. Could you provide some suggestions as to how the CSV file could be modified to simulate the scenario where I delete some edges?

4) Computation pattern -> From observing the code, my understanding is that when the server dequeues a new batch of edges and updates the stinger data structure, the algorithm (PageRank, Static_components, kcore, etc.) now runs from scratch on the ENTIRE new graph (old graph + newly added edge batch). Could you kindly confirm if this understanding is correct?

Thank you!

ehein6 commented 6 years ago

There's only a difference if the edge you want to insert is already in the graph. stinger_incr_edge will add the two weights together, stinger_insert_edge will overwrite the weight with the new one.
You're almost right. Initially there is a pool of edge blocks that can be assigned to any vertex. The number of blocks in the pool is (number of vertices) * (edgeblockfactor). At run time, blocks are claimed by any vertex as needed during edge insertion. So hypothetically all edge blocks could end up being assigned to the same vertex at runtime.
This is a research question. You could use a sliding window like I do in DynoGraph, deleting edges with a timestamp older than a given threshold.
The algorithm is given a list of inserts and deletes that happened since the last batch. It's up to the algorithm to decide whether to do an incremental update based on this information or run from scratch on the entire graph.

abasak24 commented 6 years ago

thanks!

stingergraph / stinger

Few Clarifications #250